Thursday, Debtember 03, 2020
Putting the “pro” in “procrastination”
Wait!
It's Debtember!
How did that happen?
Oh … yeah … that COVID-19 long undifferentiated timey wimey blob thing going on.
Sigh.
And I never did get around to coding my 2020 NaNoGenMo entry.
Another sigh.
Well, here's to doing it next year.
Sunday, Debtember 06, 2020
An early Christmas surprise
Bunny walked into the Computer room. “Want to see the lights?” she asked as she was opening up a package.
“Is that a remote-controlled outlet?”
“Yes it is.”
“Cool. Let's see the lights.” We walked out to look at the lights splayed across the front of Chez Boca. “Nice.”
“I just need to plug this in,” said Bunny. She stepped around the bougainvillea bush and plugged the device into the outside outlet. It made a loud buzzing noise, sparked, let out a loud pop, and then burst into flames.
“It's not supposed to do that, is it?”
“No.”
“I think you need to get your money back.”
“Most definitely.”
Saturday, Debtember 19, 2020
Details, details! It always comes down to the details
Back in July, I wrote an HTML parser using LPEG. I was a bit surprised to find its memory consumption higher than expected, but decided to let it slide for the moment. Then in October (which I did not blog about—sigh) I decided to try using a C version of PEG. It was a rather straightforward port of the code, and an almost drop-in replacement for the LPEG version (only one line of code had to change to use it). And much to my delight, not only did it use less memory (about ⅛ the memory), it was also way faster (it ran in about 1/10 the time).
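That “one line of code” was just the require() call naming which parser to load; something along these lines (a sketch with illustrative module names, not the actual ones):

```lua
-- illustrative module names only; the actual names differ
local html = require "org.conman.parsers.html"     -- the LPEG version
-- became:
local html = require "org.conman.parsers.html.peg" -- the C PEG version
```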
It's not small though. The PEG code itself is 50K in size, the resulting C code is 764K in size (yes, that's nearly ¾ of a megabyte of source code), and the compiled result is 607K. And with all that, it still runs with less memory than the LPEG version.
And all was fine.
Until today.
I've upgraded from Lua 5.3 (5.3.6 to be precise) to Lua 5.4 (5.4.2 to be precise). Lua 5.4 was released earlier this year, and I held off for a few months to let things settle before upgrading (and potentially updating all my code). Earlier this week I did the upgrade and proceeded to check that my code compiled and ran under the new version. All of it did, except for my new HTML parser, which caused Lua 5.4 to segfault.
With some help from the mailing list, I found the issue—I basically ignored this bit from the Lua manual:
So, while using a buffer, you cannot assume that you know where the top of the stack is. You can use the stack between successive calls to buffer operations as long as that use is balanced; that is, when you call a buffer operation, the stack is at the same level it was immediately after the previous buffer operation. (The only exception to this rule is luaL_addvalue.)
Oops. The original code was:
```c
lua_getfield(yy->L,lua_upvalueindex(UPV_ENTITY),label);
entity = lua_tolstring(yy->L,-1,&len);
luaL_addlstring(&yy->buf,entity,len);
lua_pop(yy->L,1);
```
Even though it violated the manual, it worked fine through Lua 5.3. To fix it:
```c
lua_getfield(yy->L,lua_upvalueindex(UPV_ENTITY),label);
luaL_addvalue(&yy->buf); /* adds the value on top of the stack to the buffer, then pops it */
```
That works.
(The code itself takes a string like “CounterClockwiseContourIntegral” and converts it to the UTF-8 character “∳” using an existing conversion table.)
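The conversion table is just a Lua table keyed on entity name; a minimal sketch of the idea (the real table covers the full HTML entity list and is handed to the parser as an upvalue):

```lua
-- a minimal sketch of the entity conversion table; the real one
-- covers the full HTML entity list
local ENTITY =
{
  CounterClockwiseContourIntegral = "\u{2233}", -- ∳
  amp                             = "&",
  copy                            = "\u{00A9}", -- ©
}

print(ENTITY.CounterClockwiseContourIntegral) --> ∳
```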
What I find funny is that I participated in a very similar thread three years ago!
Anyway, the code now works, and I'm continuing on the conversion process.
LPEG vs. PEG—they both have their strengths and weaknesses
While the C PEG library is faster and uses less memory than LPEG, I still prefer using LPEG, because it's so much easier to use than the C PEG library. Yes, there's a learning curve to using LPEG, but its re module uses a syntax similar to the C PEG library's, and it's easier to read and write when starting out. Another difference is that LPEG practically requires all the input to parse as a single string; the C PEG library can do that, but it can also read data from a file (you can stream data to LPEG, but it involves more work—check out the difference between a JSON parser that takes the entire input as a string and a JSON parser that can stream data; the latter is nearly twice the size of the former).
The code isn't that much different. Here's a simple LPEG parser that will parse text like “34.12.1.444” (a silly but simple example):
local re = require "re" return re.compile( [[ tumbler <- number ( '.' number)* number <- [0-9]+ -> pnum ]], { pnum = function(c) print(">>> " .. c) end, } )
Not bad. And here's the C PEG version:
```peg
tumbler <- number ('.' number)*
number  <- < [0-9]+ >  { printf(">>> %*s\n",yyleng,yytext); }
```
Again, not terrible and similar to the LPEG version.
The major difference between the two, however, is in their use. In the LPEG version, tumbler can be used in other LPEG expressions. If I needed to parse something like “34.12.1.444:text/plain; charset=utf-8”, I could do that:
local re = require "re" return re.compile( [[ example <- %tumbler SP* ':' SP* %mimetype SP <- ' ' / '\t' ]], { tumbler = require "tumbler", mimetype = require "org.conman.parsers.mimetype", } )
The same cannot be said for the C PEG version. It's just not written to support such use. If I need to parse text like “34.12.1.444” followed by a mimetype, then I have to modify the parser to support it all—there's no easy way to combine different parsers.
That said, I would still use the C PEG library, but only when memory or performance is an issue. It certainly won't be because of convenience.
Wednesday, Debtember 23, 2020
I solved the issue, but I'm not sure what the issue was
It's a bug whose solution I can't explain.
So I have a Lua module that enables event-driven programming. I also have a few modules that drive TCP and TLS connections. To make it even easier, I have a module that presents a Lua file-like interface to network connections—functions like obj:read() and obj:write(). The previous version of this interface module, org.conman.net.ios, used LPEG to handle line-based IO requests, as well as an extension to read headers from an Internet message.
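In use, network code ends up reading like ordinary Lua file code; a minimal sketch (the connect() constructor here is hypothetical, standing in for the TCP/TLS modules that hand back an ios object):

```lua
-- a minimal sketch; connect() is a hypothetical constructor standing in
-- for the TCP/TLS modules that hand back an ios object
local ios = connect("example.com",70)

ios:write("some selector\r\n") -- just like file:write()
local line = ios:read("l")     -- just like file:read(): one line
local rest = ios:read("a")     -- the rest of the stream
ios:close()
```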
Given the overhead of LPEG, I thought I might try using the built-in pattern matching of Lua. I reworked the code, and a benchmark showed a decent, measurable improvement in speed and memory usage.
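The line-oriented case maps naturally onto Lua patterns; the idea looks something like this (a simplified sketch, not the actual module code):

```lua
-- a simplified sketch of line-based reading with Lua patterns
-- (not the actual org.conman.net.ios code)
local function getline(buffer)
  local line,rest = buffer:match("^([^\r\n]*)\r?\n(.*)$")
  if line then
    return line,rest -- a complete line, plus the unconsumed input
  end
  return nil,buffer  -- no complete line yet; more data is needed
end

print((getline("hello\r\nworld\n"))) --> hello
```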
But the new code failed when transferring a sizable amount of data (about 6.7M) over TLS. It took about two days to track down the problem, and I still don't have a root cause. The code works, but I don't know why it works. And that bugs me.
To further complicate matters, the code did work when I downloaded the data from a server I wrote (using the same Lua code as the client), but it would fail when I tried downloading the data from another server (different TLS implementation, different language, etc.).
I was eventually able to isolate the issue down to one function in org.conman.net.ios. Here was the original code:
```lua
local function write(ios,...)
  if ios._eof then
    return false,"stream closed",-2
  end

  local output = ""

  for i = 1 , select('#',...) do
    local data = select(i,...)
    if type(data) ~= 'string' and type(data) ~= 'number' then
      error("string or number expected, got " .. type(data))
    end
    output = output .. data
  end

  return ios:_drain(output)
end
```
It works, but I didn't like that it accumulated all the output before writing any of it. So when I rewrote org.conman.net.ios, I modified the function thusly:
```lua
local function write(ios,...)
  if ios._eof then
    return false,"stream closed",-2
  end

  for i = 1 , select('#',...) do
    local data = select(i,...)
    if type(data) ~= 'string' and type(data) ~= 'number' then
      error(string.format("bad argument #%d to 'write' (string expected, got %s)",i,type(data)))
    end

    data = tostring(data)
    local okay,err,ev = ios:_drain(data)
    if not okay then
      syslog('error',"ios:_drain() = %s",err)
      return okay,err,ev
    end
    ios._wbytes = ios._wbytes + #data
  end

  return true
end
```
Instead of accumulating the data into one large buffer, it outputs it piecemeal. To further confound things, this doesn't appear to have anything to do with reading, which is what I was having issues with.
The client only did one call to this function:
```lua
local okay,err = ios:write(location,"\r\n")
```
The request went out, and I would start receiving data, but for some odd reason the connection would just drop about 200K short of the full file (it was never a consistent amount either).
While the reading side was a different implementation, the writing side didn't have to be different; I just felt the second version was a bit better, and it shouldn't make a difference, right?
[There's that word! –Editor]
[What word? –Sean]
[“Should.” –Editor]
But regardless of my feelings about how that certainly can't be at fault, I put the previous version of write() back and lo'! It worked!
…
I'm flummoxed!
I don't understand why the new version of write() would cause the TLS connection to eventually fail, but it did, for whatever reason.
Weird.
Monday, Debtember 28, 2020
Yet more adventures in profiling
While I'm now satisfied with the memory usage, I've started watching the CPU utilization, and noticed with some dismay that it's quite high (even with the new HTML parser being faster overall). I started an instance a day ago (the 27TH), and it has already accumulated 27 minutes, 35 seconds of CPU time. As a contrast, the web server has only accumulated 37 seconds of CPU time since the 25TH.
That's a large difference.
The server in question is written in Lua. I have another server, also written in Lua, and it has only accumulated 1 minute, 26 seconds since the 25TH.
There are two differences that might account for the discrepancy:
- one gets more requests than the other;
- one uses TLS, the other doesn't.
But to be sure, I need a controlled experiment. Since both servers basically do the same thing (mainly, serve up this blog via gopher and Gemini, and convert the HTML to text formats, thus the need for an HTML parser), it was easy enough to generate a list of comparable requests for both and profile the execution.
Unfortunately, profiling across shared objects doesn't necessarily work all that well (at least on Linux). I recompiled both Lua and all the Lua modules I use (at least the ones written in C) but it only showed the time spent in the main Lua VM and nowhere else.
I then spent the time constructing a self-contained executable (containing Lua, plus all the modules comprising the application) of port70 (the gopher server) and another one for GLV-1.12556 (the Gemini server). Pretty easy to do, if a bit tedious in tracking down all the modules to include in the executables. I didn't bother with any optimizations for these runs, as I'm trying to get a feel for where the time is spent.
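The principle behind a self-contained executable is simple: every module is registered ahead of time, so require() never has to search the filesystem. In Lua terms the mechanism is package.preload (a toy illustration; the real registration happens in C when the executable is built):

```lua
-- a toy illustration: anything registered in package.preload is
-- found by require() without a filesystem search
package.preload["greeting"] = function()
  return { hello = function() print("Hello from a preloaded module!") end }
end

local greeting = require "greeting" -- found via package.preload
greeting.hello()
```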
I profiled each executable, making the same 1,171 requests (well, “same” meaning “requesting the same content”) to each program.
First, port70, the gopher server, straight TCP connection. It accumulated 14 seconds of CPU time with the profile run, and the results:
% time | cumulative seconds | self seconds | calls | self ms/call | total ms/call | name |
---|---|---|---|---|---|---|
18.06 | 0.56 | 0.56 | 33881 | 0.00 | 0.00 | luaV_execute |
17.74 | 1.11 | 0.55 | 986744 | 0.00 | 0.00 | match |
4.03 | 1.24 | 0.13 | 28481743 | 0.00 | 0.00 | lua_gettop |
3.55 | 1.35 | 0.11 | 22087107 | 0.00 | 0.00 | index2value |
2.58 | 1.43 | 0.08 | 11321831 | 0.00 | 0.00 | yymatchChar |
2.26 | 1.50 | 0.07 | 6478653 | 0.00 | 0.00 | touserdata |
2.26 | 1.57 | 0.07 | 2063343 | 0.00 | 0.00 | pushcapture |
1.94 | 1.63 | 0.06 | 2074113 | 0.00 | 0.00 | lua_getmetatable |
1.94 | 1.69 | 0.06 | 2068487 | 0.00 | 0.00 | auxgetstr |
1.61 | 1.74 | 0.05 | 2222138 | 0.00 | 0.00 | luaS_new |
1.29 | 1.78 | 0.04 | 5469355 | 0.00 | 0.00 | luaV_equalobj |
1.29 | 1.82 | 0.04 | 5239401 | 0.00 | 0.00 | luaH_getshortstr |
1.29 | 1.86 | 0.04 | 2042852 | 0.00 | 0.00 | luaL_checkudata |
1.29 | 1.90 | 0.04 | 1207086 | 0.00 | 0.00 | lua_tolstring |
1.29 | 1.94 | 0.04 | 1070855 | 0.00 | 0.00 | luaT_gettmbyobj |
1.29 | 1.98 | 0.04 | 175585 | 0.00 | 0.00 | internshrstr |
Nothing terribly surprising there. The function luaV_execute() is not surprising, as that's the main driver for the Lua VM. match() is from LPEG, which is used for all parsing aside from HTML. The function yymatchChar() is from the HTML parser I wrote, so again, no terrible surprise there.
Now, GLV-1.12556, the Gemini server, using TLS. This accumulated 1 minute, 24 seconds of CPU time with the profile run. The results:
% time | cumulative seconds | self seconds | calls | self ms/call | total ms/call | name |
---|---|---|---|---|---|---|
8.06 | 0.10 | 0.10 | 30070 | 0.00 | 0.01 | luaV_execute |
7.26 | 0.19 | 0.09 | 1494750 | 0.00 | 0.00 | luaH_getshortstr |
5.65 | 0.26 | 0.07 | 11943 | 0.01 | 0.01 | match |
4.03 | 0.31 | 0.05 | 535091 | 0.00 | 0.00 | luaD_precall |
4.03 | 0.36 | 0.05 | 502074 | 0.00 | 0.00 | moveresults |
3.23 | 0.40 | 0.04 | 129596 | 0.00 | 0.00 | luaS_hash |
2.42 | 0.43 | 0.03 | 11321831 | 0.00 | 0.00 | yymatchChar |
2.42 | 0.46 | 0.03 | 4218262 | 0.00 | 0.00 | yyText |
2.42 | 0.49 | 0.03 | 3293376 | 0.00 | 0.00 | yymatchString |
2.42 | 0.52 | 0.03 | 1508070 | 0.00 | 0.00 | yyrefill |
2.42 | 0.55 | 0.03 | 377362 | 0.00 | 0.00 | luaH_newkey |
1.61 | 0.57 | 0.02 | 2898350 | 0.00 | 0.00 | index2value |
1.61 | 0.59 | 0.02 | 2531258 | 0.00 | 0.00 | lua_gettop |
1.61 | 0.61 | 0.02 | 1081241 | 0.00 | 0.00 | yy_CHAR |
1.61 | 0.63 | 0.02 | 230982 | 0.00 | 0.00 | luaV_equalobj |
1.61 | 0.65 | 0.02 | 174295 | 0.00 | 0.00 | luaV_finishget |
1.61 | 0.67 | 0.02 | 136553 | 0.00 | 0.00 | luaT_gettmbyobj |
1.61 | 0.69 | 0.02 | 129534 | 0.00 | 0.00 | internshrstr |
1.61 | 0.71 | 0.02 | 10363 | 0.00 | 0.00 | traversestrongtable |
1.61 | 0.73 | 0.02 | 4684 | 0.00 | 0.00 | lua_resume |
It's satisfying seeing the same number of calls to yymatchChar(), but the times are smaller overall and there's less of a spike with luaV_execute(), leading me to believe the time is actually being spent in TLS. That isn't showing up because I haven't compiled the TLS library to be profiled, and it's dynamically linked in anyway. I'm fairly confident that TLS is sucking up the CPU time and that it's not necessarily my code. I'm apprehensive about attempting to recompile the TLS library with profiling in mind, but it is the next logical step if I want to know for sure.
Sigh.
Tuesday, Debtember 29, 2020
The OpenSSL/LibreSSL shuffle
Two and a half years ago, someone tried using my UUID library with a modern version of OpenSSL. At the time I rejected the patch because I couldn't use it (I was, and still am, using an older version of OpenSSL). Then today, I was notified that someone else had tried to do the same thing, and I figured it was time to actually address the issue.
It used to be that you could do:
```c
#include <openssl/evp.h>

unsigned char hash[EVP_MAX_MD_SIZE];
EVP_MD_CTX    ctx;

EVP_DigestInit(&ctx,EVP_md5());
EVP_DigestUpdate(&ctx,data,len);
EVP_DigestFinal(&ctx,hash,&hashsize);
```
The context variable declaration changed, and you can no longer do that. Instead, you now have to:
```c
#include <openssl/evp.h>

unsigned char  hash[EVP_MAX_MD_SIZE];
EVP_MD_CTX    *ctx;

ctx = EVP_MD_CTX_new();
if (ctx != NULL)
{
  EVP_DigestInit(ctx,EVP_md5());
  EVP_DigestUpdate(ctx,data,len);
  EVP_DigestFinal(ctx,hash,&hashsize);
  EVP_MD_CTX_free(ctx);
}
```
It's an annoying change, and yet I can understand why it was made—future updates of hash functions could use more space than what you statically allocate, which could lead to a buffer overrun. It also changed what used to be an error-free path (well, buffer overruns aside) into a path that can fail. The reason I put off making the change was trying to find the version of OpenSSL where the change was made. After downloading over a dozen versions of OpenSSL and checking each one, I found the change in version 1.1.0.
This also prompted me to spend the time updating my TLS Lua module to the latest version, which involved downloading over a dozen versions of LibreSSL and checking each one. There was only one minor change involved, and that was adding a new call to the module.
I have yet to profile LibreSSL though.
Wednesday, Debtember 30, 2020
A sane and easy to use TLS library for OpenSSL! Will wonders never cease!
I saw the following on the Gemini mailing list:
Perhaps take a look at "[gentoo-dev] [RFC] Discontinuing LibreSSL support?". Fascinating dynamic.
I only use LibreSSL because it comes with libtls, an easier-to-use API than the base LibreSSL (which itself was forked years ago from OpenSSL for various reasons). It seems that over the years the APIs of LibreSSL and OpenSSL have drifted apart, and now the Gentoo Linux distribution is thinking of dropping support for LibreSSL. It doesn't affect me, since I'm not using Gentoo (and the last time I used Gentoo was a rather stressful time). I just install it from source (and it's a pain to do, too, because I don't want to destroy my existing OpenSSL installation).
I was, however, happy to see a port of libtls to OpenSSL, as that would make it easier to keep using libtls.
Thursday, Debtember 31, 2020
Yet another wrinkle in the TLS library woes
I downloaded and compiled libretls, a libtls for OpenSSL. I also recompiled my Lua wrapper for libtls and reran the profiling tests from the other day.
I can now say for certain that there is no need to profile LibreSSL, because the results speak for themselves—an accumulation of 24 CPU seconds using OpenSSL vs. 1 minute, 24 seconds using LibreSSL. Looking at the results, it makes some sense. The LibreSSL library I'm using is the “portable version” for non-OpenBSD systems, and there's probably been little work put into making it fast.
So now I have to rethink my use of LIBRESSL_VERSION_NUMBER when compiling the Lua module. I will no longer be using the LibreSSL version, and I can't exactly rely upon the TLS_API value either … sigh.