Thursday, May 30, 2024
Profile results are as expected as the Spanish Inquisition
I'm not upset at rewriting the code that handles the network buffering as it needed work, but I'm still seeing a disporportionate amount of CPU time accumluate on the supposedly simpler protocol gopher. The most popular requests of both my gopher server and Gemini server are entries from my blog, so I take a look at the code that handles such requests for both servers. Yes, the gohper server has a bit more code dealing with links than the Gemini server (because gopher URLs are almost, but not entirely, like http URLs—and the small differences are annoying), but I'm not seeing anything that stands out. Yes, the code is not quite twice as much, but the CPU utilization is more than three times as much (as of writing this).
I have no other choice at this point and I constantly relearn this lession over and over again: if I'm looking into a performance issue, profile the code under question! Profile profile profile!
The code is in Lua and as it happens, I've profiled Lua code before. First, I want to answer this question: how much code does it take to serve a request? And I figure measuring the lines of code run is a good answer to that. I can get a baseline from that. And the code to answer that is a very easy four line change to each server:
local function main(iostream) local count = 0 debug.sethook(function() count = count + 1 end,'line') -- The rest of the main code debug.sethook() syslog('notice',"count=%d",count) end
I fire up the servers locally, make a decently sized request to each, and I get my results:
gopher | 457035 |
gemini | 22661 |
WHAT THE LITERAL XXXX‽
[Well, there's your problem! —Editor] [Just … gaaaaaaah! —Sean]
I'm constantly surprised at the results of profiling—it's almost never what I think the problem is. And here, it's clear that I messed up pretty bad somewhere in the gopher code.
Now off to more profiling to see where it all goes pear shaped.
Unicode. Why did it have to be Unicode?
Well, I have my answer. I first found a smaller request that exhibits the behavior as to not generate half a million lines of output:
gopher | 2549 |
gemini | 564 |
Good. Two and a half thousand lines of code is tractable. Now, just to show how easy it is to profile Lua code, here's the code I'm using for my gopher server:
local profile = {} local function doprofile() local info = debug.getinfo(2) local name = info.name or "..." local file = info.source or "@" local key = string.format("%s$%s(%d)",file,name,info.currentline) if not profile[key] then profile[key] = 1 else profile[key] = profile[key] + 1 end end
For each line of code executed, we get the filename, the function name and the line of code that's executing, turn that into a key, and use that to count the number of times that line of code is executed. Easy. And then some code to dump the results:
local function cleanup() local results = {} for name,value in pairs(profile) do results[#results + 1] = { file = name , count = value } end table.sort(results,function(a,b) if a.count > b.count then return true elseif a.count < b.count then return false else return a.file < b.file end end) local f = io.open("/tmp/dump.txt","w") for i = 1 , #results do f:write(string.format("%6d %s\n",results[i].count,results[i].file)) end f:close() end
We sort the results based on line count, then alphabetically by key. And like before:
local function main(iostream) debug.sethook(doprofile,'line') -- The rest of the main code debug.sethook() cleanup() end
I make the request and get some results:
215 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(202) 211 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(203) 211 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(204) 211 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(268) 210 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(282) 210 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(283) 169 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(219) 169 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(224) 169 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(239) 169 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(240) 42 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(205) 42 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(206) 42 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(207) 17 @port70.lua$...(272) 17 @port70.lua$...(273) 9 @/usr/local/share/lua/5.4/org/conman/net/ios.lua$write(547) ...
Oh.
Yeah.
That.
Obvious in hindsight.
I completely forgot about that.
Okay.
The function in question,
wrapt()
,
wraps text and it's a rather heavy function
due to Unicode
(and I'm not even following the full specification there).
This is the major difference between the gopher and Gemini servers—I don't wrap text for Gemini
(the clients handle that).
I guess I'll have to drop down to C if I want to speed this up.
Sigh.