Monday, July 23, 2018
Managing TLS connections using Lua and Lua coroutines
Getting libtls
working with Lua wasn't as straightforward
as I thought it would be. It works
(for the most part)
but I had to change my entire approach.
The code is an ugly mess and there's quite a bit of duplication in several spots.
But!
I can request web pages,
in Lua,
via HTTPS in an event loop based around select()
(or poll()
or epoll()
or whatever is the low level event notification scheme used).
Woot!
And I'm going into excruciating detail on this.
Back on Friday,
when I wrote some “proof-of-concept” code,
I had thought I could switch coroutines in the user-supplied I/O
callback routines
(and if coroutines existed in C,
that is where you would yield to another coroutine).
It was easy enough to extend the callback to a Lua routine
in the routine that wraps the libtls
function tls_connect_cbs():

    static int Ltls_connect_cbs(lua_State *L)
    {
      struct tls **tls = lua_touserdata(L,1);
      int rc = tls_connect_cbs(
                 *tls,
                 Xtls_read,
                 Xtls_write,
                 L,
                 luaL_checkstring(L,2)
               );

      if (rc != 0)
      {
        lua_pushboolean(L,false);
        return 1;
      }

      lua_settop(L,5);
      lua_pushlightuserdata(L,*tls);
      lua_getuservalue(L,1);
      lua_pushvalue(L,1);
      lua_setfield(L,-2,"_ctx");
      lua_pushvalue(L,2);
      lua_setfield(L,-2,"_servername");
      lua_pushvalue(L,3);
      lua_setfield(L,-2,"_userdata");
      lua_pushvalue(L,4);
      lua_setfield(L,-2,"_readf");
      lua_pushvalue(L,5);
      lua_setfield(L,-2,"_writef");
      lua_settable(L,LUA_REGISTRYINDEX);
      lua_pushboolean(L,true);
      return 1;
    }

I pass in the two callback functions,
and I'm using the Lua state context as the userdata in the callbacks.
I then create a Lua table,
populate it with some useful information,
such as the Lua functions to call,
and associate it in the Lua registry with the value of the libtls
context.
Then,
when libtls
calls one of the callbacks:

    static ssize_t Xtls_write(struct tls *tls,void const *buf,size_t buflen,void *cb_arg)
    {
      lua_State *L = cb_arg;
      ssize_t    len;

      lua_pushlightuserdata(L,tls);
      lua_gettable(L,LUA_REGISTRYINDEX);
      lua_getfield(L,-1,"_writef");
      lua_getfield(L,-2,"_ctx");
      lua_pushlstring(L,buf,buflen);
      lua_getfield(L,-4,"_userdata");
      lua_call(L,3,1);
      len = lua_tonumber(L,-1);
      lua_pop(L,2);
      return len;
    }

I get the Lua state via the user argument.
From that,
and the libtls
context,
I obtain the data I cached into the Lua table,
which gives me the Lua function to call.
Said function can then call coroutine.yield().
Straightforward, easy, and wrong! I got the dreaded “attempt to yield across metamethod/C-call boundary” error. Darn.
The attempted flow looks like (yellow boxes are Lua functions; green boxes are C functions):
There are four layers of C functions that can't be yielded through. Lua does have a way of dealing with intervening C functions, but it's somewhat clunky.
In this case,
the Lua function lua_callk()
is handled specially so it doesn't cause an error.
The function cf()
needs to be split in half: the portion prior to calling into Lua,
and the second half to handle things after a potential call to coroutine.yield().
That's represented above by the functions cf_orig()
and cf_c().
The “*” entries represent the functions returning,
not calling. coroutine.resume()
will restart luaf_b()
right after its call to coroutine.yield().
And when luaf_b()
returns,
it “returns” to cf_c(),
which does whatever and finally returns,
which “returns” to luaf_a().
But the case I'm dealing with just doesn't work with that model. The code calling into Lua doesn't have the signature:
int function(lua_State *L);
but the signature:
ssize_t function(struct tls *ctx,void *buf,size_t buflen,void *cb_arg);
Not only are the return types different,
but they have completely different semantics: for libtls,
it's the number of bytes transferred,
whereas for Lua,
it's the number of items being returned to Lua.
No,
I had to rethink the entire approach,
and do the call to coroutine.yield()
a bit higher in the call stack.
Which also meant I had to push dealing with TLS_WANT_POLLIN
and TLS_WANT_POLLOUT
back to the caller.
The documentation states:
TLS_WANT_POLLIN
The underlying read file descriptor needs to be readable in order to continue.
TLS_WANT_POLLOUT
The underlying write file descriptor needs to be writeable in order to continue.
In the case of blocking file descriptors, the same function call should be repeated immediately. In the case of non-blocking file descriptors, the same function call should be repeated when the required condition has been met.
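
The non-blocking case described in that passage can be sketched without libtls at all. This is only an illustration, not the wrapper code discussed in this post: fake_read(), read_retry(), set_nonblocking() and the WANT_POLLIN constant are all hypothetical stand-ins. fake_read() plays the part of a libtls call by returning WANT_POLLIN instead of blocking, and read_retry() waits with poll() before repeating the same call.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for a "repeat once readable" result;
   this is NOT the constant from the real libtls header. */
#define WANT_POLLIN (-2)

/* Put a descriptor into non-blocking mode; 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0)
    return -1;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Stand-in for a TLS read: never blocks, reports WANT_POLLIN instead. */
ssize_t fake_read(int fd, void *buf, size_t len)
{
  ssize_t n = read(fd, buf, len);
  if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
    return WANT_POLLIN;
  return n;
}

/* Repeat the same call each time the descriptor becomes readable.
   *waits counts how often we had to wait; returns -1 if poll() fails
   (including being interrupted by a signal). */
ssize_t read_retry(int fd, void *buf, size_t len, int *waits)
{
  for (;;)
  {
    ssize_t n = fake_read(fd, buf, len);
    if (n != WANT_POLLIN)
      return n;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    (*waits)++;
    if (poll(&pfd, 1, -1) < 0)
      return -1;
  }
}
```

In a coroutine-based event loop, the poll() call is the spot where the caller would instead yield and let the loop resume it once the descriptor is readable.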
And here I was, trying to hide such concerns from the user. Ah well.
I eventually got it working,
but man,
is it ugly.
The Lua code wants to read data,
so I have to call into libtls.
That, in turn,
calls back into my code,
and if I don't have any data,
I need to return TLS_WANT_POLLIN,
which bubbles up through libtls
back to my code,
which can then yield.
Meanwhile,
from the other end,
I get data from the network.
I can't just feed it into libtls;
I have to feed it when libtls
calls the callback for the data.
But when I get the data,
I may need to resume the coroutine,
so I have to track that information as well.
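
The bookkeeping described above might look something like the following. It is a simplified sketch, not the actual wrapper code: struct inbuf, inbuf_feed(), inbuf_read_cb() and the WANT_POLLIN constant are hypothetical names, and the Lua side is left out entirely. The read callback uses the callback signature shown earlier, returns WANT_POLLIN when nothing is buffered, and records that a reader is waiting so that the network side knows a coroutine needs resuming.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the "need more input" result;
   this is NOT the constant from the real libtls header. */
#define WANT_POLLIN (-2)

struct tls;   /* opaque here; only ever passed around as a pointer */

struct inbuf
{
  char   data[4096];
  size_t len;       /* bytes currently buffered                  */
  int    waiting;   /* nonzero: a reader gave up and must resume */
};

/* Network side: stash incoming bytes for a later callback.
   Returns -1 if they don't fit, 1 if a waiting reader should be
   resumed, 0 otherwise. */
int inbuf_feed(struct inbuf *b, const void *src, size_t n)
{
  if (n > sizeof(b->data) - b->len)
    return -1;
  memcpy(b->data + b->len, src, n);
  b->len += n;

  int resume = b->waiting;
  b->waiting = 0;
  return resume;
}

/* Read callback: hand over buffered bytes, or ask to be called again. */
ssize_t inbuf_read_cb(struct tls *ctx, void *buf, size_t buflen, void *cb_arg)
{
  struct inbuf *b = cb_arg;
  (void)ctx;

  if (b->len == 0)
  {
    b->waiting = 1;         /* remember that someone wants data */
    return WANT_POLLIN;
  }

  size_t n = buflen < b->len ? buflen : b->len;
  memcpy(buf, b->data, n);
  memmove(b->data, b->data + n, b->len - n);  /* keep the remainder */
  b->len -= n;
  return (ssize_t)n;
}
```

The yield itself happens higher up: whoever called into the library sees the "want" result bubble back and yields there, and a return of 1 from inbuf_feed() is the cue to resume that coroutine.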
I can almost understand the code (and yes, I wrote it; did I mention it's ugly?).
But I'm happy. The following code works in my existing network framework (boy does that sound weird):

    local function request(item)
      syslog('debug',"requesting %s",item.url)
      local u = url:match(item.url)

      -- -------------------------------------------------------
      -- asynchronous DNS lookup---blocks the current coroutine
      -- until a result is returned via the network.
      -- -------------------------------------------------------

      local addr = dns.address(u.host,'ip','tcp',u.port)
      if not addr then
        syslog('error',"finished %s---could not look up address",u.host)
        return
      end

      -- ---------------------------------------------------------
      -- This has nothing to do with the iPhone operating system,
      -- but everything to do with "Input/Output Stream"
      -- ---------------------------------------------------------

      local ios

      if u.scheme == 'http' then
        ios = tcp.connecta(addr[1])        -- connect via TCP
      else
        ios = tls.connecta(addr[1],u.host) -- connect via TLS
      end

      if not ios then
        syslog('error',"could not connect to %s",u.host)
        return
      end

      local path   = table.concat(u.path,'/')
      local fhname = "header/" .. item.hdr
      local fbname = "body/"   .. item.body
      local fh     = io.open(fhname,"w")
      local fb     = io.open(fbname,"w")

      local command = string.format([[
    GET /%s HTTP/1.0
    Host: %s
    Connection: close
    User-Agent: TLSTest/2.0 (Lua TLS Testing Program)
    Accept: */*

    ]],path,u.host)

      ios:write(command)
      fh:write(ios:read("*h"))

      repeat
        local data = ios:read("*a")
        fb:write(data)
      until data == ""

      fb:close()
      fh:close()
      ios:close()
      syslog('debug',"finished %s %s",item.url,tostring(addr[1]))
    end

Any number of requests can be started and they all run concurrently, which is just what I wanted.
Now, the code I have for the Lua wrapper for libtls
covers just what I need to do this.
More work is required to finish covering the rest of the API.
I also have to clean up the Lua code that backs the above sample code so that I might have a chance of understanding it at some point in the future.
And until I get the working code published, you can look at the “proof-of-concept” Lua coroutine code I worked from (and no, the above code sample will not work as is with this “proof-of-concept” code).