The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Monday, July 23, 2018

Managing TLS connections using Lua and Lua coroutines

Getting libtls working with Lua wasn't as straightforward as I thought it would be. It works (for the most part) but I had to change my entire approach. The code is an ugly mess and there's quite a bit of duplication in several spots.

But! I can request web pages, in Lua, via HTTPS in an event loop based around select() (or poll() or epoll() or whatever is the low level event notification scheme used). Woot! And I'm going into excruciating detail on this.

Back on Friday, when I wrote some “proof-of-concept” code, I had thought I could switch coroutines in the user-supplied I/O callback routines (and if coroutines existed in C, that is where you would yield to another coroutine). It was easy enough to extend the callback to a Lua routine— in the routine that wraps the libtls function tls_connect_cbs():

static int Ltls_connect_cbs(lua_State *L)
{
  struct tls **tls = lua_touserdata(L,1);
  int rc           = tls_connect_cbs(
			*tls,
			Xtls_read,
			Xtls_write,
			L,
			luaL_checkstring(L,2)
		     );
  
  if (rc != 0)
  {
    lua_pushboolean(L,false);
    return 1;
  }
  
  lua_settop(L,5);
  lua_pushlightuserdata(L,*tls);
  lua_getuservalue(L,1);
  lua_pushvalue(L,1);
  lua_setfield(L,-2,"_ctx");
  lua_pushvalue(L,2);
  lua_setfield(L,-2,"_servername");
  lua_pushvalue(L,3);
  lua_setfield(L,-2,"_userdata");
  lua_pushvalue(L,4);
  lua_setfield(L,-2,"_readf");
  lua_pushvalue(L,5);
  lua_setfield(L,-2,"_writef");
  
  lua_settable(L,LUA_REGISTRYINDEX);
  lua_pushboolean(L,true);
  return 1;
}

I pass in the two callback functions, and I'm using the Lua state context as the userdata in the callbacks. I then create a Lua table, populate it with some useful information, such as the Lua functions to call, and associate it in the Lua registry with the value of the libtls context. Then, when libtls calls one of the callbacks:

static ssize_t Xtls_write(struct tls *tls,void const *buf,size_t buflen,void *cb_arg)
{
  lua_State *L = cb_arg;
  ssize_t    len;
  
  lua_pushlightuserdata(L,tls);
  lua_gettable(L,LUA_REGISTRYINDEX);
  lua_getfield(L,-1,"_writef");
  lua_getfield(L,-2,"_ctx");
  lua_pushlstring(L,buf,buflen);
  lua_getfield(L,-4,"_userdata");
  lua_call(L,3,1);
  
  len = lua_tonumber(L,-1);
  lua_pop(L,2);
  return len;
}

I get the Lua state via the user argument. From that, and the libtls context, I obtain the data I cached into the Lua table, which gives me the Lua function to call. Said function can then call coroutine.yield().

Straightforward, easy, and wrong! I got the dreaded “attempt to yield across metamethod/C-call boundary” error. Darn.

The attempted flow looks like this ({} marks a Lua function; [] marks a C function):

{data=tls.read()} → [Ltls_read(lua)] → [tls_read(ctx)] → [Xtls_read(ctx,lua)] → [lua_call(lua)] → {my_callback()} → {coroutine.yield()}

There are four layers of C functions that can't be yielded through. Lua does have a way of dealing with intervening C functions, but it's somewhat clunky.

{luaf_a()} → [cf_orig(lua)] → [lua_callk(lua,cf_c)] → {luaf_b()} → {coroutine.yield} / {coroutine.resume} → {luaf_b()*} → [cf_c(lua)*] → {luaf_a()}

In this case, the Lua API function lua_callk() is handled specially so it doesn't cause an error. The original C function, cf(), needs to be split in half: the portion prior to calling into Lua, and a second half to handle things after a potential call to coroutine.yield(). These are represented above by the functions cf_orig() and cf_c(). The “*” marks functions that are returning, not being called. coroutine.resume() will restart luaf_b() right after its call to coroutine.yield(). And when luaf_b() returns, it “returns” to cf_c(), which does whatever and finally returns, which “returns” to luaf_a().
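To make the "split in half" idea concrete, here is a rough analogy in plain C. It does not use the Lua API at all: struct thread, call_k(), resume() and the two callees are hypothetical names invented for this sketch, and only cf_orig() and cf_c() borrow their names from the diagram. One difference to keep in mind is that the real lua_callk() never returns to its caller when the callee yields, whereas this mock signals the yield with a return value.

```c
#include <stddef.h>

enum status { DONE, YIELDED };

struct thread;
typedef int         (*cont_fn)(struct thread *t);    /* the "second half" */
typedef enum status (*callee_fn)(struct thread *t);

/* Hypothetical miniature of a coroutine: just enough state to park a
   continuation while the callee is suspended. */
struct thread
{
  cont_fn k;       /* continuation saved by call_k(), NULL if none */
  int     value;   /* result slot shared by callee and continuation */
};

/* Stand-in for lua_callk(): run the callee and, if it yields, remember
   which continuation finishes the job later. */
static enum status call_k(struct thread *t,callee_fn f,cont_fn k)
{
  enum status s = f(t);
  t->k = (s == YIELDED) ? k : NULL;
  return s;
}

/* Stand-in for coroutine.resume(): deliver a value, then run the parked
   continuation instead of returning into the original C frame. */
static int resume(struct thread *t,int value)
{
  cont_fn k = t->k;
  t->k      = NULL;
  t->value  = value;
  return k(t);
}

/* Second half of the split function: everything after the call. */
static int cf_c(struct thread *t)
{
  return t->value * 2;
}

/* A callee that has its answer immediately ... */
static enum status callee_ready(struct thread *t)
{
  t->value = 21;
  return DONE;
}

/* ... and one that has to wait, so it yields. */
static enum status callee_waits(struct thread *t)
{
  (void)t;
  return YIELDED;
}

/* First half: everything before the call.  Returns -1 if suspended. */
static int cf_orig(struct thread *t,callee_fn f)
{
  if (call_k(t,f,cf_c) == YIELDED)
    return -1;       /* this frame is abandoned; cf_c() runs on resume */
  return cf_c(t);    /* no yield: fall straight through to the second half */
}
```

Anything cf_c() needs has to live in the thread structure rather than in cf_orig()'s local variables, since the first half's stack frame is gone by the time the second half runs.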

But the case I'm dealing with just doesn't work with that model. The code calling into Lua doesn't have the signature:

int function(lua_State *L);

but the signature:

ssize_t function(struct tls *ctx,void *buf,size_t buflen,void *cb_arg);

Not only are the return types different, but they have completely different semantics—for libtls, it's the number of bytes transferred, whereas for Lua, it's the number of items being returned to Lua.

No, I had to rethink the entire approach, and do the call to coroutine.yield() a bit higher in the call stack. Which also meant I had to push dealing with TLS_WANT_POLLIN and TLS_WANT_POLLOUT back to the caller. The documentation states:

In the case of blocking file descriptors, the same function call should be repeated immediately. In the case of non-blocking file descriptors, the same function call should be repeated when the required condition has been met.

And here I was, trying to hide such concerns from the user. Ah well.

I eventually got it working, but man, is it ugly. The Lua code wants to read data, so I have to call into libtls. That, in turn, calls back into my code, and if I don't have any data, I need to return TLS_WANT_POLLIN, which bubbles up through libtls back to my code, which can then yield.

Meanwhile, from the other end, I get data from the network. I can't just feed it into libtls, I have to feed it when libtls calls the callback for the data. But when I get the data, I may need to resume the coroutine, so I have to track that information as well.
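The shape of that dance can be sketched in plain C without libtls or Lua. This is not the actual wrapper code. Everything here (struct conn, cb_read(), mock_read(), conn_read(), conn_feed(), and the WANT_POLLIN constant) is a made-up stand-in, and the "library" is a mock that simply forwards to the callback, which is enough to show the want-more-input result bubbling up and the bookkeeping needed to know when to resume.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical stand-in for TLS_WANT_POLLIN; the real one is in <tls.h>. */
#define WANT_POLLIN (-2)

/* Per-connection state: bytes from the network that the library hasn't
   asked for yet, plus a note on whether a reader is parked waiting. */
struct conn
{
  char   data[64];
  size_t len;
  bool   waiting;
};

/* Read callback, in the spirit of Xtls_read(): hand over buffered bytes,
   or say "come back later" rather than block. */
static ssize_t cb_read(void *buf,size_t buflen,void *cb_arg)
{
  struct conn *c = cb_arg;
  size_t       n;
  
  if (c->len == 0)
    return WANT_POLLIN;
  
  n = c->len < buflen ? c->len : buflen;
  memcpy(buf,c->data,n);
  memmove(c->data,c->data + n,c->len - n);
  c->len -= n;
  return (ssize_t)n;
}

/* Mock of the library's read entry point.  A real TLS library would decrypt
   here; this one only forwards, so WANT_POLLIN bubbles straight up. */
static ssize_t mock_read(struct conn *c,void *buf,size_t buflen)
{
  return cb_read(buf,buflen,c);
}

/* What the wrapper does: try the read and, if the library wants more
   input, record that fact.  This is where the Lua side would yield. */
static ssize_t conn_read(struct conn *c,void *buf,size_t buflen)
{
  ssize_t rc = mock_read(c,buf,buflen);
  c->waiting = (rc == WANT_POLLIN);
  return rc;
}

/* Called from the event loop when the socket has data.  Returns true if a
   parked reader should now be resumed so it can repeat its read. */
static bool conn_feed(struct conn *c,char const *bytes,size_t n)
{
  bool resume = c->waiting;
  
  if (n > sizeof(c->data) - c->len)
    n = sizeof(c->data) - c->len;   /* sketch only: overflow is dropped */
  memcpy(c->data + c->len,bytes,n);
  c->len    += n;
  c->waiting = false;
  return resume;
}
```

The data is never pushed into the library. It sits in the buffer until the library asks for it through the callback, and conn_feed() only reports whether someone was waiting, so the event loop knows to resume that reader and let it repeat the same read call.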

I can almost understand the code (and yes, I wrote it; did I mention it's ugly?)

But I'm happy. The following code works in my existing network framework (boy does that sound weird):

local function request(item)
  syslog('debug',"requesting %s",item.url)
  local u = url:match(item.url)

  -- -------------------------------------------------------
  -- asynchronous DNS lookup---blocks the current coroutine
  -- until a result is returned via the network.
  -- -------------------------------------------------------

  local addr = dns.address(u.host,'ip','tcp',u.port)
  
  if not addr then
    syslog('error',"finished %s---could not look up address",u.host)
    return
  end

  -- ---------------------------------------------------------
  -- This has nothing to do with the iPhone operating system,
  -- but everything to do with "Input/Output Stream"
  -- ---------------------------------------------------------

  local ios
  
  if u.scheme == 'http' then
    ios = tcp.connecta(addr[1]) -- connect via TCP
  else
    ios = tls.connecta(addr[1],u.host) -- connect via TLS
  end
  
  if not ios then
    syslog('error',"could not connect to %s",u.host)
    return
  end
  
  local path    = table.concat(u.path,'/')
  local fhname  = "header/" .. item.hdr
  local fbname  = "body/"   .. item.body
  local fh      = io.open(fhname,"w")
  local fb      = io.open(fbname,"w")
  
  local command = string.format([[
GET /%s HTTP/1.0
Host: %s
Connection: close
User-Agent: TLSTest/2.0 (Lua TLS Testing Program)
Accept: */*

]],path,u.host)

  ios:write(command)
  
  fh:write(ios:read("*h"))
  
  repeat
    local data = ios:read("*a")
    fb:write(data)
  until data == ""
  
  fb:close()
  fh:close()
  ios:close()

  syslog('debug',"finished %s %s",item.url,tostring(addr[1]))
end

Any number of requests can be started and they all run concurrently, which is just what I wanted.

Now, the code I have for the Lua wrapper for libtls covers just what I need to do this. More work is required to finish covering the rest of the API. I also have to clean up the Lua code that backs the above sample code so that I might have a chance of understanding it at some point in the future.

And until I get the working code published, you can look at the “proof-of-concept” Lua coroutine code I worked from (and no, the above code sample will not work as is with this “proof-of-concept” code).

Copyright © 1999-2018 by Sean Conner. All Rights Reserved.