Friday, August 10, 2012
Less traffic and lost packets … I'm stumped
Part of my job at The Corporation is load testing. And generally, when I do load testing, I pretty much write code that sends requests as quickly as possible to the server (and I might run multiple programs that spam the server with requests). But recently, Smirk brought the Poisson distribution to my attention, claiming that it presents a more realistic load.
I was skeptical. I really didn't see how randomly sending requests using a Poisson distribution would behave any differently than a (large) constant load, but Smirk insisted.
So I wrote a simple UDP service and ran that on one machine. I then coded up two client programs. They were actually identical except for one line of code—the “constant” client:
sleep(0)
and the “Poisson distribution” client:
sleep(-log(1.0 - random()) / 5000.0)
(sleep(0) doesn't actually sleep, but it does cause the calling process to be rescheduled. This limits the process to about 10,000 messages a second, so it's a good baseline, and we can always run more processes. random() returns a value between 0 and 1 (includes 0, excludes 1) and we subtract that from 1.0 to prevent taking the log of 0 (which is undefined). Logarithms of numbers less than 1 are negative, so we negate to get a positive value, and divide by 5,000, which means we average 5,000 messages per second. Yes, it's half the rate of the constant client, but there are two reasons for this: one, we can always run more clients, and two, there's a limit to how short we can sleep. A value of 0 just reschedules the process; under Solaris, you can't sleep less than 1/100 of a second (values less than .01 are rounded up to .01); Linux is around 1/500 or 1/1000 of a second (depending upon configuration), so 5,000 is kind of a “eh, it's good enough” value.)
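If you want to convince yourself of the math, here's a quick standalone sanity check (not part of the test code; it uses nothing but stock Lua math.random() and math.log(), the one-million sample count is arbitrary, and the 5000 is just the rate from the client above):

```lua
-- Standalone sanity check: the average of -log(1 - U)/5000, with U uniform
-- on [0,1), should come out to 1/5000 of a second per message, i.e. an
-- average of about 5,000 messages per second.
math.randomseed(os.time())

local N   = 1000000
local sum = 0

for i = 1 , N do
  sum = sum + (-math.log(1.0 - math.random()) / 5000.0)
end

print(string.format("average delay between messages: %f seconds",sum / N))
print(string.format("implied message rate:           %.0f messages/second",N / sum))
```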
(I should also mention that the version of sleep() I'm using can take a fractional number of seconds, like sleep(0.125), since all the code I'm talking about is in Lua, because I was writing a “proof-of-concept” server, not a “high performance” server.)
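For what it's worth, the client loop itself is nothing fancy. Here's a minimal sketch of the idea, not the actual test code: it assumes LuaSocket for the UDP side, uses socket.sleep() (which also accepts fractional seconds) as a stand-in for the sleep() mentioned above, and the host and port are placeholders:

```lua
-- Minimal sketch of a test client, assuming LuaSocket is available.
-- HOST and PORT are placeholders for wherever the UDP service is running.
local socket = require "socket"

local HOST = "192.168.1.10"
local PORT = 9999

local udp = socket.udp()
udp:setpeername(HOST,PORT)

math.randomseed(os.time())

while true do
  udp:send("test")  -- the payload doesn't matter for this test
  -- the one line that differs between the two clients:
  --   “constant” client:             socket.sleep(0)
  --   “Poisson distribution” client:
  socket.sleep(-math.log(1.0 - math.random()) / 5000.0)
end
```

Everything else is identical between the two clients; only that one sleep() call changes.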
So, I run 64 “constant” clients and get:
| packets sent | packets received | packets dropped |
|--------------|------------------|-----------------|
| 636002       | 636002           | 0               |
| 621036       | 621036           | 0               |
| 631890       | 631890           | 0               |
| 631051       | 631051           | 0               |
| 613912       | 613912           | 0               |
Pretty much around 10,000 messages per second with no dropped data. And now, for 128 “Poisson distribution” clients:
| packets sent | packets received | packets dropped |
|--------------|------------------|-----------------|
| 348620       | 348555           | 65              |
| 439038       | 438988           | 50              |
| 375482       | 375436           | 46              |
| 382650       | 382600           | 50              |
| 396886       | 396828           | 58              |
Um … what?
Half the number of packets, and I'm losing some as well? What weirdness is this? No matter how many times I run the tests, or for how long, I get similar results. The “Poisson distribution” client gets horrible results.
And as Smirk said, that's exactly the point.
And the odd thing is, I can't explain this behavior. I can't comprehend what could be happening that causes this behavior from a one-line change.
Disturbing.