The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, Debtember 23, 2020

I solved the issue, but I'm not sure what the issue was

It's a bug whose solution I can't explain.

So I have a Lua module that enables event-driven programming. I also have a few modules that drive TCP and TLS connections. To make it even easier, I have a module that presents a Lua file-like interface to network connections—functions like obj:read() and obj:write().

The previous version of this interface module, org.conman.net.ios, used LPEG to handle line-based IO requests, as well as an extension to read headers from an Internet message. Given the overhead of LPEG I thought I might try using the built-in pattern matching of Lua. I reworked the code and a bench mark did show a decent and measurable improvement in speed and memory usage.

But the new code failed when transferring a sizable amount of data (about 6.7M) over TLS. It took about two days to track down the problem, and I still don't have a root cause. The code works, but I don't know why it works. And that bugs me.

To further complicate matters, the code did work when I download the data from a server I wrote (using the same Lua code as the client) but it would fail when I tried downloading the data from another server (different TLS implementation, different language, etc.).

I was able to eventually isolate the issue down one function in org.conman.net.ios. Here was the original code:

local function write(ios,...)
  if ios._eof then
    return false,"stream closed",-2
  end

  local output = ""

  for i = 1 , select('#',...) do
    local data = select(i,...)
    if type(data) ~= 'string' and type(data) ~= 'number' then
      error("string or number expected, got " .. type(data))
    end

    output = output .. data
  end

  return ios:_drain(output)
end

It works, but I didn't like that it first accumulated all the output first before writing it. So when I rewrote org.conman.net.ios, I modified the function thusly:

local function write(ios,...)
  if ios._eof then
    return false,"stream closed",-2
  end
  
  for i = 1 , select('#',...) do
    local data = select(i,...)
    if type(data) ~= 'string' and type(data) ~= 'number' then
      error(string.format("bad argument #%d to 'write' (string expected, got %s)
    end
    
    data = tostring(data)
    
    local okay,err,ev = ios:_drain(data)
    if not okay then
      syslog('error',"ios:_drain() = %s",err)
      return okay,err,ev
    end
    
    ios._wbytes = ios._wbytes + #data
  end
  
  return true
end

Instead of accumulating the data into one large buffer, it outputs it piecemeal. To further confound things, this doesn't appear to have anything to do with reading, which is what I was having issues with.

The client only did one call to this function:

local okay,err = ios:write(location,"\r\n")

The request went out, I would start receiving data, but for some odd reason, the connection would just drop about 200K short of the full file (it was never a consistent amount either).

While the reading side was a different implementation, the writing side didn't have to be different, I just felt the second vesion to be a bit better, and it shouldn't make a difference, right? [There's that word! –Editor] [What word? –Sean] [“Should.” –Editor] But regardless of my feelings about how that certainly can't be at fault, I put the previous version of write() back and lo'! It worked!

I'm flummoxed!

I don't understand why the new version of write() would cause the TLS connection to eventually fail, but it did, for whatever reason. Weird.

Obligatory Picture

An abstract representation of where you're coming from]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.