Wednesday, June 12, 2024
Just when I thought it was safe to run a gopher server
It appeared I had traded the problem of high CPU usage for a memory leak. Last night I saw:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND gopher 31875 0.2 13.3 2174076 2171960 ttyp4 S+ Jun09 7:08 lua port70.lua /home/spc/gopher/config.lua
(For the record—the VSZ
and RSS
values are the number of 4,096 byte blocks of memory for various reasons beyond the scope of this post)
and just before lunch today:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND gopher 1572 0.2 11.1 1809672 1807644 ttyp4 S+ 04:30 1:07 lua port70.lua /home/spc/gopher/config.lua
Not good. Especially when it's normally like:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND gopher 20711 0.2 0.0 10588 9000 ttyp4 S+ 17:52 0:06 lua port70.lua /home/spc/gopher/config.lua
And then there was this in the logs:
Jun 12 09:47:54 daemon err 71.19.142.20 gopher CRASH: coroutine thread: 0x8400404 dead: not enough memory Jun 12 09:47:54 daemon err 71.19.142.20 gopher CRASH: thread: 0x8400404: stack traceback: Jun 12 09:47:54 daemon err 71.19.142.20 gopher CRASH: thread: 0x8400404: [C]: in function 'org.conman.string.wrapt' Jun 12 09:47:54 daemon err 71.19.142.20 gopher CRASH: thread: 0x8400404: ...e/lua/5.4/org/conman/app/port70/handlers/blog/format.lua:34: in upvalue 'wrap_text' Jun 12 09:47:54 daemon err 71.19.142.20 gopher CRASH: thread: 0x8400404: ...e/lua/5.4/org/conman/app/port70/handlers/blog/format.lua:119: in upvalue 'run_flow' Jun 12 09:47:54 daemon err 71.19.142.20 gopher CRASH: thread: 0x8400404: ...e/lua/5.4/org/conman/app/port70/handlers/blog/format.lua:598: in function 'org.conman.app.port70.handlers.blog.format' Jun 12 09:47:54 daemon err 71.19.142.20 gopher CRASH: thread: 0x8400404: ...al/share/lua/5.4/org/conman/app/port70/handlers/blog.lua:227: in function <...al/share/lua/5.4/org/conman/app/port70/handlers/blog.lua:210> Jun 12 09:47:54 daemon err 71.19.142.20 gopher CRASH: thread: 0x8400404: (...tail calls...) Jun 12 09:47:54 daemon err 71.19.142.20 gopher CRASH: thread: 0x8400404: port70.lua:238: in function <port70.lua:205>
Yeah, “not enough memory” while running the new C-based text wrapping function.
I added a few checks to see if wrapt()
wasn't updating the character indicies correctly
(and log if they weren't)
then went for lunch.
Afterwards,
I check and lo',
memory was still growing,
and none of the checks I added had triggered.
Hmmm. Time to take down the gopher server and debug some code.
Sure enough, I found the one entry that went off the rails. What followed was a few hours of troubleshooting to find out why that one entry (and only that entry) blows up. And it came down to a difference of semantics between Lua and C (no, it had nothing to do with C's 0-based indexing and Lua's 1-based indexing). Basically, the Lua code looked like:
local breakhere -- this is nil ... if ctype == 'space' then breakhere = i -- this is now an integer ... end if breakhere then table.insert(result,s:sub(front,breakhere - 1)) ... end breakhere = nil
In Lua,
only the values nil
and false
are considered “false”—a 0 value is “true” in Lua.
In C,
any 0 value is considered “false.”
I knew this when translating the code,
but it never occurred to me that a break point of a line could be at the start of a line.
Until it was at the start of a line in that one entry—the code went into an infinite loop trying to wrap text, thus causing the memory usage to consume the computer.
Sigh.
It was easy enough to fix once the problem was diagnosed. And maybe now things will get back to normal.