Sunday, June 30, 2024
Four lines of code … it was four lines of code
I checked last night, and found the following:
| | CPU | requests |
|---|---|---|
| gopher | 28:37 | 8360 |
| gemini | 6:02 | 24345 |
How can a TCP service take more CPU time than a TLS service? Especially when TLS runs over TCP? And when TLS generates six times the packets that plain TCP does?
There's some difference in how I handle TCP and TLS that's causing this. Profiling the code didn't reveal much, as my attempts kept showing that gopher should be using less CPU than gemini, so it was just a matter of staring at the code and thinking hard.
It wasn't easy. Yes, there are some differences between the TCP interface and the TLS interface, so it was a matter of going through the code, bringing them closer in implementation and seeing if that fixed things. I would fix the code, run a test, see it not make a difference, and try again. Eventually, I came across this bit of code:
```lua
-- ---------------------------------------------------------------------
-- It might be a bug *somewhere*, but on Linux, this is required to get
-- Unix domain sockets to work with the NFL driver.  There's a race
-- condition where writing data then calling close() may cause the
-- other side to receive no data.  This does NOT appear to happen with
-- TCP sockets, but this doesn't hurt the TCP side in any case.
-- ---------------------------------------------------------------------

while ios.__socket.sendqueue and ios.__socket.sendqueue > 0 do
  nfl.SOCKETS:update(ios.__socket,'w')
  coroutine.yield()
end
```
This bit of code wasn't in the TLS implementation, and as the comment says, it “shouldn't” hurt the TCP path, but hey, it was only needed for local (or Unix) domain sockets to begin with, so let me try removing that bit of code and run a test.
With the code in place:

| | CPU | real | user | sys |
|---|---|---|---|---|
| gopher | 18 | 46.726 | 13.971 | 4.059 |
| gemini | 4 | 107.928 | 3.951 | 0.322 |
And with the code removed:

| | CPU | real | user | sys |
|---|---|---|---|---|
| gopher | 0 | 52.403 | 0.635 | 0.185 |
| gemini | 4 | 103.290 | 3.957 | 0.285 |
THAT'S more like it!
Now, about that code … this is my best guess as to what was going on.
The `sendqueue` field is a Linux-only field, so I'm checking to see if it exists, and then checking the value (if only to avoid the “attempt to compare number with nil” Lua error on non-Linux systems). This returns the number of bytes the kernel has yet to send, so we switch the socket to trigger an event when we can write to it, and yield. Even worse, the check isn't just a simple variable fetch, but ends up being an `ioctl()` system call. Twice!
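Even if the loop had to stay, the two `ioctl()` calls per pass could have been cut to one by reading the field once per iteration. A minimal sketch, assuming only the same `nfl` calls shown above:

```lua
-- Sketch only: read sendqueue once per pass, so each iteration costs
-- one ioctl() instead of two.
local pending = ios.__socket.sendqueue    -- nil on non-Linux systems
while pending and pending > 0 do
  nfl.SOCKETS:update(ios.__socket,'w')
  coroutine.yield()
  pending = ios.__socket.sendqueue        -- re-check after being resumed
end
```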
Now the “write” event code:
```lua
if event.write then
  if #ios.__output > 0 then
    -- try to flush the buffered output
    local bytes,err = ios.__socket:send(nil,ios.__output)
    if err == 0 then
      ios.__wbytes = ios.__wbytes + bytes
      ios.__output = ios.__output:sub(bytes + 1,-1)
    else
      syslog('error',"socket:send() = %s",errno[err])
      nfl.SOCKETS:remove(ios.__socket)
      nfl.schedule(ios.__co,false,errno[err],err)
    end
  end

  if #ios.__output == 0 then
    -- nothing left to send, so go back to read events and resume the coroutine
    nfl.SOCKETS:update(ios.__socket,'r')
    nfl.schedule(ios.__co,true)
  end
end
```
There's probably no data to be sent (`ios.__output` is empty), so we immediately switch the socket back to trigger on readability, then resume the coroutine handling the socket. The coroutine goes back to the `while` loop, sets the socket back to trigger on writability, and the two keep bouncing like that until the kernel buffer has finally been sent.
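To make that ping-pong concrete, here's a small self-contained toy (plain Lua, none of the actual `nfl` code) that models it: the coroutine asks for writability, the “write” handler finds nothing to send, flips back to readability and resumes the coroutine, and the pair spin until the pretend kernel queue drains.

```lua
-- Toy model of the read/write ping-pong; no real sockets involved.
local sendqueue = 4096        -- pretend bytes still queued in the kernel
local mode      = 'r'         -- what the socket is currently registered for
local passes    = 0

local co = coroutine.create(function()
  while sendqueue > 0 do
    mode = 'w'                -- "wake me when the socket is writable"
    coroutine.yield()
  end
end)

coroutine.resume(co)          -- start it; it yields almost immediately
while coroutine.status(co) ~= "dead" do
  -- the "write" event fires: nothing buffered, so flip back to reads
  -- and resume the coroutine right away
  mode      = 'r'
  passes    = passes + 1
  sendqueue = math.max(0,sendqueue - 512)  -- the kernel drains a little
  coroutine.resume(co)
end

print(passes)                 --> 8 round trips even for this tiny queue
```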
As I wrote, this fixed an issue I had with local domain sockets—it wasn't an issue with an IP socket, and in fact, was the wrong thing to do for an IP socket.
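Had the race still mattered for local connections, one option (and this is purely a sketch; the `family` field here is made up for illustration and isn't part of the real socket API) would have been to keep the drain loop but only run it for Unix domain sockets:

```lua
-- Hypothetical: only wait for the kernel send queue to drain on Unix
-- domain sockets, where the write()-then-close() race was actually seen.
-- The 'family' field is assumed for illustration, not the real API.
if ios.__socket.family == 'unix' then
  while ios.__socket.sendqueue and ios.__socket.sendqueue > 0 do
    nfl.SOCKETS:update(ios.__socket,'w')
    coroutine.yield()
  end
end
```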
And as it is with these types of bugs, finding the root cause is not trivial, but the fix almost always is.