The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Monday, Debtember 28, 2020

Yet more adventures in profiling

While I'm now satisfied with the memory usaage, I'm starting to watching the CPU utilization and noticed with some dismay that it's quite high (even with the new HTML parser being faster overall). I started an instance a day ago (the 27TH), and it already has accumulated 27 minutes, 35 seconds of CPU time. As a contrast, the web server has only accumulated 37 seconds of CPU time since the 25TH.

That's a large difference.

The server in question is written in Lua. I have another server written in Lua, and it has only accumulated 1 minute, 26 seconds since the 25TH.

There are two differences that might account for the discrepancy:

  1. one gets more requests than the other;
  2. one uses TLS, the other doesn't.

But to be sure, I need a controlled experiment. Since both servers basically do the same thing (mainly, serve up this blog via gopher and Gemini, and convert the HTML to text formats, thus the need for an HTML parser) it was easy enough generate a list of comparable requests for both, and profile the execution.

Unfortunately, profiling across shared objects doesn't necessarily work all that well (at least on Linux). I recompiled both Lua and all the Lua modules I use (at least the ones written in C) but it only showed the time spent in the main Lua VM and nowhere else.

I then spent the time constructing a self-contained executable (containing Lua, plus all the modules comprising the application) of port70 (the gopher server) and another one for GLV-1.12556 (the Gemini server). Pretty easy to do, if a bit tedious in tracking down all the modules to include in the executables. I didn't bother with any optimizations for these runs, as I'm trying to get a feel for where the time is spent.

I profiled each executable, making the same 1,171 requests (well, “same” meaning “requesting the same content”) to each program.

First, port70, the gopher server, straight TCP connection. It accumulated 14 seconds of CPU time with the profile run, and the results:

Each sample counts as 0.01 seconds.
% time cumulative seconds self seconds calls self ms/call total ms/call name
18.06 0.56 0.56 33881 0.00 0.00 luaV_execute
17.74 1.11 0.55 986744 0.00 0.00 match
4.03 1.24 0.13 28481743 0.00 0.00 lua_gettop
3.55 1.35 0.11 22087107 0.00 0.00 index2value
2.58 1.43 0.08 11321831 0.00 0.00 yymatchChar
2.26 1.50 0.07 6478653 0.00 0.00 touserdata
2.26 1.57 0.07 2063343 0.00 0.00 pushcapture
1.94 1.63 0.06 2074113 0.00 0.00 lua_getmetatable
1.94 1.69 0.06 2068487 0.00 0.00 auxgetstr
1.61 1.74 0.05 2222138 0.00 0.00 luaS_new
1.29 1.78 0.04 5469355 0.00 0.00 luaV_equalobj
1.29 1.82 0.04 5239401 0.00 0.00 luaH_getshortstr
1.29 1.86 0.04 2042852 0.00 0.00 luaL_checkudata
1.29 1.90 0.04 1207086 0.00 0.00 lua_tolstring
1.29 1.94 0.04 1070855 0.00 0.00 luaT_gettmbyobj
1.29 1.98 0.04 175585 0.00 0.00 internshrstr

Nothing terribly surprising there. The function luaV_execute() is not surprising, as that's the main driver for the Lua VM. match() is from LPEG, which is used for all parsing aside from HTML. The function yymatchChar() is from the HTML parser I wrote, so again, no terrible surprise there.

Now, GLV-1.12556, the Gemini server, using TLS. This accumulated 1 minute, 24 seconds of CPU time with the profile run. The results:

Each sample counts as 0.01 seconds.
% time cumulative seconds self seconds calls self ms/call total ms/call name
8.06 0.10 0.10 30070 0.00 0.01 luaV_execute
7.26 0.19 0.09 1494750 0.00 0.00 luaH_getshortstr
5.65 0.26 0.07 11943 0.01 0.01 match
4.03 0.31 0.05 535091 0.00 0.00 luaD_precall
4.03 0.36 0.05 502074 0.00 0.00 moveresults
3.23 0.40 0.04 129596 0.00 0.00 luaS_hash
2.42 0.43 0.03 11321831 0.00 0.00 yymatchChar
2.42 0.46 0.03 4218262 0.00 0.00 yyText
2.42 0.49 0.03 3293376 0.00 0.00 yymatchString
2.42 0.52 0.03 1508070 0.00 0.00 yyrefill
2.42 0.55 0.03 377362 0.00 0.00 luaH_newkey
1.61 0.57 0.02 2898350 0.00 0.00 index2value
1.61 0.59 0.02 2531258 0.00 0.00 lua_gettop
1.61 0.61 0.02 1081241 0.00 0.00 yy_CHAR
1.61 0.63 0.02 230982 0.00 0.00 luaV_equalobj
1.61 0.65 0.02 174295 0.00 0.00 luaV_finishget
1.61 0.67 0.02 136553 0.00 0.00 luaT_gettmbyobj
1.61 0.69 0.02 129534 0.00 0.00 internshrstr
1.61 0.71 0.02 10363 0.00 0.00 traversestrongtable
1.61 0.73 0.02 4684 0.00 0.00 lua_resume

It's satisfying seeing the same number of calls to yymatchChar(), but the times are smaller overall and there's less of a spike with luaV_execute(), leading me to believe the time is actually being spent in TLS. That isn't showing up because I haven't compiled the TLS library to be profiled, and it's dynamically linked in anyway. I'm fairly confident that TLS is sucking the CPU time and it's not necessarily my code. I'm aprehensive about attemping to recompile the TLS library with profiling in mind but it is the next logical step if I want to know for sure.

Sigh.

Obligatory Picture

An abstract representation of where you're coming from]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.