The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, June 12, 2024

Just when I thought it was safe to run a gopher server

It appeared I had traded the problem of high CPU usage for a memory leak. Last night I saw:

USER       PID %CPU %MEM     VSZ     RSS TTY   STAT START TIME COMMAND
gopher   31875  0.2 13.3 2174076 2171960 ttyp4 S+   Jun09 7:08 lua port70.lua /home/spc/gopher/config.lua

(For the record—the VSZ and RSS values are the number of 4,096 byte blocks of memory for various reasons beyond the scope of this post) and just before lunch today:

USER       PID %CPU %MEM     VSZ     RSS TTY   STAT START TIME COMMAND
gopher    1572  0.2 11.1 1809672 1807644 ttyp4 S+   04:30 1:07 lua port70.lua /home/spc/gopher/config.lua

Not good. Especially when it's normally like:

USER       PID %CPU %MEM   VSZ  RSS TTY   STAT START TIME COMMAND
gopher   20711  0.2  0.0 10588 9000 ttyp4 S+   17:52 0:06 lua port70.lua /home/spc/gopher/config.lua

And then there was this in the logs:

Jun 12 09:47:54	daemon	err	71.19.142.20	gopher	CRASH: coroutine thread: 0x8400404 dead: not enough memory
Jun 12 09:47:54	daemon	err	71.19.142.20	gopher	CRASH: thread: 0x8400404: stack traceback:
Jun 12 09:47:54	daemon	err	71.19.142.20	gopher	CRASH: thread: 0x8400404:  [C]: in function 'org.conman.string.wrapt'
Jun 12 09:47:54	daemon	err	71.19.142.20	gopher	CRASH: thread: 0x8400404:  ...e/lua/5.4/org/conman/app/port70/handlers/blog/format.lua:34: in upvalue 'wrap_text'
Jun 12 09:47:54	daemon	err	71.19.142.20	gopher	CRASH: thread: 0x8400404:  ...e/lua/5.4/org/conman/app/port70/handlers/blog/format.lua:119: in upvalue 'run_flow'
Jun 12 09:47:54	daemon	err	71.19.142.20	gopher	CRASH: thread: 0x8400404:  ...e/lua/5.4/org/conman/app/port70/handlers/blog/format.lua:598: in function 'org.conman.app.port70.handlers.blog.format'
Jun 12 09:47:54	daemon	err	71.19.142.20	gopher	CRASH: thread: 0x8400404:  ...al/share/lua/5.4/org/conman/app/port70/handlers/blog.lua:227: in function <...al/share/lua/5.4/org/conman/app/port70/handlers/blog.lua:210>
Jun 12 09:47:54	daemon	err	71.19.142.20	gopher	CRASH: thread: 0x8400404:  (...tail calls...)
Jun 12 09:47:54	daemon	err	71.19.142.20	gopher	CRASH: thread: 0x8400404:  port70.lua:238: in function <port70.lua:205>

Yeah, “not enough memory” while running the new C-based text wrapping function. I added a few checks to see if wrapt() wasn't updating the character indicies correctly (and log if they weren't) then went for lunch. Afterwards, I check and lo', memory was still growing, and none of the checks I added had triggered.

Hmmm. Time to take down the gopher server and debug some code.

Sure enough, I found the one entry that went off the rails. What followed was a few hours of troubleshooting to find out why that one entry (and only that entry) blows up. And it came down to a difference of semantics between Lua and C (no, it had nothing to do with C's 0-based indexing and Lua's 1-based indexing). Basically, the Lua code looked like:

local breakhere -- this is nil

...

if ctype == 'space' then
  breakhere = i -- this is now an integer
  ...
end

if breakhere then
  table.insert(result,s:sub(front,breakhere - 1))
  ...
end

breakhere = nil

In Lua, only the values nil and false are considered “false”—a 0 value is “true” in Lua. In C, any 0 value is considered “false.” I knew this when translating the code, but it never occurred to me that a break point of a line could be at the start of a line.

Until it was at the start of a line in that one entry—the code went into an infinite loop trying to wrap text, thus causing the memory usage to consume the computer.

Sigh.

It was easy enough to fix once the problem was diagnosed. And maybe now things will get back to normal.

Thursday, June 06, 2024

Stats on some optimized small internet servers

I restarted my Gopher and Gemini servers about 25 hours ago to ensure they're both running the new optimized code, so now let's check some results:

Gopher/Gemini server stats
service CPU utilization requests bytes
gopher 2:42 935 1186591
gemini 8:58 36394 249020812

That's more like it. The gopher server, running over plain TCP and getting about 1/40th the requests, is finally accumulating less CPU time than Gemini (and the only reason it's not even less is my gopher server has to deal with wraping Unicode text).

I also find it amusing that Gemini, a protocol that has only been around for five years, is way more popular than gopher, a protocol that's been around for thirty-three years. I guess Gemini being the new shiny is the reason.

Wednesday, June 05, 2024

I wonder who isn't getting the point

Yes, there are a few groups I follow on Discord. One of the groups has a link to another site I want to view and I click on it. I get:

Leaving Discord

This link is taking you to the following website

https://github.com/XXXXX­XXXXX­XXXXX­XXXXX­XXXXX­XXXXX­X

Trust github.com links from now on

God, I just love XXXXX­XX wall gardens! I just love how it's there to protect me. You mean … a link … goes to a non-Discord site? Oh my God! Thank you for warning me! My computer might have been pwned by Microsoft!

Sheesh.

Monday, June 03, 2024

Just a simple matter of replacing a slow Lua function with a faster C function

I spent the past few days rewriting some Lua code into C. While I find LPEG to be convenient, it is not necessarily fast. Normally this isn't an issue but in this case, I was calling LPEG for each character in a blog post.

Fortunately, it was fairly straight forward porting the code to C. The code goes through text a character codepoint at a time. If it's a whitespace character or a hyphen, I mark the current position as a possible breakpoint for the text; otherwise I ignore combining characters (they don't count towards the line length). Then, when I reach past the number of characters I want for a line, I copy out the string from the beginning of the “line” to the marked breakpoint (and if there isn't a breakpoint, there is no good place to break the line so I will break the line at the line length—not much else to do), then mark the beginning of the next line and continue until the end of the text.

The hardest part was figuring out how to classify each character I needed. In the end, I pull out each Unicode codepoint from UTF-8 and look through an array to classify the codepoint as whitespace, a hyphen or a combining character; if they aren't in the table, it just a normal character.

As a sanity check, I reran the original profiling test:

Lines of Lua code executed to serve a request
gopher (original) 457035
gopher (new) 18246
gemini (just because) 22661

Much better. And most of the 457,035 lines of code being executed are now hidden behind C. Now to make sure the code is actually faster, I profiled the new wrapt() function:

local wraptx = wrapt
local function wrapt(...)
  local start = rdtsc()
  local res   = wraptx(...)
  local stop  = rdtsc()
  syslog('notice',"wrapt()=%d",stop-start)
  return res
end

with the decently sized request I used before (each line is a call to wrapt()):

Runtime (lower is better)
#Lua code C code
43330 11810
43440 12000
45300 12220
48100 12020
48680 13690
49260 12650
54140 12270
54650 12460
58530 12130
59760 14180
61100 15480
65440 14970
67920 15810
68750 15310
69920 17170
69960 17780
70740 16510
75640 16750
78870 19170
83200 18190
87090 17290
89070 23360
91440 19560
101800 21520
102460 21060
103790 22180
106000 22400
106010 21870
112960 21160
115300 21870
115980 23130
118690 24980
122550 23960
122710 24550
127610 23830
129580 24670
130120 24930
140580 26570
141930 25210
157640 27050
168000 32250

Excellent! The new code is three to five times faster. Now to just sit back and see how the new code fares over the next few days.

Saturday, June 01, 2024

Falettinme Be Mice Elf Ag

Bunny and I were in my car when this song from the Sly and the Family Stone came on. On the screen was the title, “Thank You (Falettinme Be Mice Elf Ag)”. Bunny attempted to read it and was having a hard time with it; I suggested that perhaps XM Radio had used some form of audio transcription that went wrong somewhere, because Sly and the Family were clearly singing “thank you for letting me be myself.”

I was about to make a satirical post about it, but when I was looking up the song … well … that's the actual name! Okay, it's actually “Thank You (Falettinme Be Mice Elf Agin)” (the screen wasn't wide enough for the entire title). But still … that's the actual name of the song! Sheesh!

Thursday, May 30, 2024

Unicode. Why did it have to be Unicode?

Well, I have my answer. I first found a smaller request that exhibits the behavior as to not generate half a million lines of output:

Lines of code executed for a small request
gopher 2549
gemini 564

Good. Two and a half thousand lines of code is tractable. Now, just to show how easy it is to profile Lua code, here's the code I'm using for my gopher server:

local profile = {}

local function doprofile()
  local info = debug.getinfo(2)
  local name = info.name or "..."
  local file = info.source or "@"
  local key  = string.format("%s$%s(%d)",file,name,info.currentline)
  if not profile[key] then
    profile[key] = 1
  else
    profile[key] = profile[key] + 1
  end
end

For each line of code executed, we get the filename, the function name and the line of code that's executing, turn that into a key, and use that to count the number of times that line of code is executed. Easy. And then some code to dump the results:

local function cleanup()
  local results = {}
  for name,value in pairs(profile) do
    results[#results + 1] = { file = name , count = value }
  end
  
  table.sort(results,function(a,b)
    if a.count > b.count then
      return true
    elseif a.count < b.count then
      return false
    else
      return a.file < b.file
    end
  end)
  
  local f = io.open("/tmp/dump.txt","w")
  for i = 1 , #results do
    f:write(string.format("%6d %s\n",results[i].count,results[i].file))
  end
  f:close()  
end

We sort the results based on line count, then alphabetically by key. And like before:

local function main(iostream)
  debug.sethook(doprofile,'line')

  -- The rest of the main code

  debug.sethook()
  cleanup()
end

I make the request and get some results:

   215 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(202)
   211 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(203)
   211 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(204)
   211 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(268)
   210 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(282)
   210 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(283)
   169 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(219)
   169 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(224)
   169 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(239)
   169 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(240)
    42 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(205)
    42 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(206)
    42 @/usr/local/share/lua/5.4/org/conman/string.lua$wrapt(207)
    17 @port70.lua$...(272)
    17 @port70.lua$...(273)
     9 @/usr/local/share/lua/5.4/org/conman/net/ios.lua$write(547)
...

Oh.

Yeah.

That.

Obvious in hindsight.

I completely forgot about that.

Okay. The function in question, wrapt(), wraps text and it's a rather heavy function due to Unicode (and I'm not even following the full specification there). This is the major difference between the gopher and Gemini servers—I don't wrap text for Gemini (the clients handle that). I guess I'll have to drop down to C if I want to speed this up.

Sigh.


Profile results are as expected as the Spanish Inquisition

I'm not upset at rewriting the code that handles the network buffering as it needed work, but I'm still seeing a disporportionate amount of CPU time accumluate on the supposedly simpler protocol gopher. The most popular requests of both my gopher server and Gemini server are entries from my blog, so I take a look at the code that handles such requests for both servers. Yes, the gohper server has a bit more code dealing with links than the Gemini server (because gopher URLs are almost, but not entirely, like http URLs—and the small differences are annoying), but I'm not seeing anything that stands out. Yes, the code is not quite twice as much, but the CPU utilization is more than three times as much (as of writing this).

I have no other choice at this point and I constantly relearn this lession over and over again: if I'm looking into a performance issue, profile the code under question! Profile profile profile!

The code is in Lua and as it happens, I've profiled Lua code before. First, I want to answer this question: how much code does it take to serve a request? And I figure measuring the lines of code run is a good answer to that. I can get a baseline from that. And the code to answer that is a very easy four line change to each server:

local function main(iostream)
  local count = 0
  debug.sethook(function() count = count + 1 end,'line')

  -- The rest of the main code

  debug.sethook()
  syslog('notice',"count=%d",count)
end

I fire up the servers locally, make a decently sized request to each, and I get my results:

Lines of code to serve a request
gopher 457035
gemini 22661

WHAT THE LITERAL XXXX

[Well, there's your problem! —Editor] [Just … gaaaaaaah! —Sean]

I'm constantly surprised at the results of profiling—it's almost never what I think the problem is. And here, it's clear that I messed up pretty bad somewhere in the gopher code.

Now off to more profiling to see where it all goes pear shaped.

Monday, May 27, 2024

How does TLS use less CPU than plain TCP?

I have two services written in Luaa gopher server and a Gemini server. The both roughly serve the same data (mainly my blog) and yet, the gopher server accumulates more CPU time than the Gemini server, despite that the Gemini server uses TLS and serves more requests. And not by a little bit either:

CPU utilization
gopher 17:26
Gemini 0:45

So I started investigating the issue. It wasn't TCP_NODELAY (via Lobsters) as latency wasn't the issue (but I disabled Nagle's algorithm anyway).

Looking further into the issue, it seemed to be one of buffering. the code was not buffering any data with TCP; furthermore, the code was issuing tons of small writes. My thinking here was—Of course! The TCP code was making tons of system calls, whereas the TLS code (thanks to the library I'm using) must be doing buffering for me.

So I added buffering to the gopher server, and now, after about 12 hours (where I restarted both servers) I have:

new CPU utilization
gopher 2:25
Gemini 2:13

I … I don't know what to make of this. Obviously, things have improved for gopher, but did I somehow make Gemini worse? (I did change some low level code that both TCP and TLS use; I use full buffering for TCP, no buffering for TLS). Is the load more evenly spread?

It's clear that gopher is still accumulating more CPU time, just not as bad as it was. Perhaps more buffering is required? I'll leave this for a few days and see what happens.

Obligatory Picture

[“I am NOT a number, I am … a Q-CODE!”]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.