The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Thursday, April 30, 2015

Some perils of handing signals in Lua

This is one of those “Oh, yeah, I didn't think that through, did I?” type of bug.

I wrote a signal module for Lua, which can handle both ANSI C and POSIX signals with largly the same API (the POSIX implementation one has some additional functions defined).

Handling signals in Lua is not that straightforward because of the nature of signals—you are effectively writting multithreaded code. You just can't call back into Lua from the signal handler (while the Lua VM has no static data and each Lua state is isolated unto itself, two threads sharing a Lua state can lead to problemss). The only Lua function you can safely call is lua_sethook(), which can be used to stop the Lua VM at the next VM instruction (it's typically used for debugging and signal handing). This callback can then call back into Lua. It is a bit convoluted (the signal handler will call lua_sethook() and return; the Lua VM will resume and then call the hook), but it does allow you to write signal handlers in Lua:

signal.catch('windowchange',function()
  print("Wheeee!  Our terminal just resized!")
end)

and not have it blow up on you.

So, with that in mind, I give you this code:

local net    = require "org.conman.net"
local clock  = require "org.conman.clock"
local signal = require "org.conman.signal"

local raddr = net.address("127.0.0.1",udp,'echo')
local sock  = net.socket(raddr.family,'udp')

signal.catch('alarm',function()
  sock:send(raddr,tostring(clock.get()))
end)

clock.itimer(1)

local previous = clock.get()

while true do
  local _,data = sock:recv()
  local now = clock.get()
  if data then
    local zen = tonumber(data)
    print(string.format("%.7f\t%.7f",now - zen,now - previous))
    previous = now
  end
end

This is a UDP echo client program. signal.catch() handles the alarm signal (SIGALRM) by sending a packet of data (which is just the current time) to the echo server. clock.itimer() informs the kernel to send the alarm signal once a second. So once a second, our program receives the alarm signal and sends the current time. Then, in an infinite loop, we just wait for packets to arrive (which should be the packets we sent to the echo server—they're “echoed” back to us) and we calculate how long the packet took round trip and how long it was from the previous packet. The output looks like:

0.0002971	1.0014961
0.0003922	0.9999950
0.0002851	0.9998930
0.0003171	1.0000319
0.0003910	0.9999740
0.0002551	0.9998641
0.0003359	1.0000808

The first column is the round trip time (in seconds) for the packet (around 3 to 4 ten thousandths of a second), and the second column is how long (in seconds) from the previous packet (a second, give or take a few ten thousandths).

But our call to sock:recv() is interrupted by the alarm signal. Unfortunately, one side effect of signals is that they will interrupt “long running” system calls, which is almost always system calls dealing with I/O, such as read() or write(). When such a call is interrupted, the system call will return an error of EINTR. We can see this if we change the code a bit:

local net    = require "org.conman.net"
local clock  = require "org.conman.clock"
local signal = require "org.conman.signal"
local errno  = require "org.conman.errno"

local raddr = net.address("192.168.90.118",'udp',22222)
local sock  = net.socket(raddr.family,'udp')

signal.default('int')

signal.catch('alarm',function()
  sock:send(raddr,tostring(clock.get()))
end)

clock.itimer(1)

local previous = clock.get()

while true do
  local _,data,err = sock:recv()
  local now = clock.get()
  if data then
    local zen = tonumber(data)
    print(string.format("%.7f\t%.7f",now - zen,now - previous))
    previous = now
  else
    print(">>>",errno[err])
  end
end

and when we run it:

>>>	Interrupted system call
0.0003049	1.0015509
>>>	Interrupted system call
0.0002320	0.9998269
>>>	Interrupted system call
0.0002131	0.9999812
>>>	Interrupted system call
0.0001860	0.9999728
>>>	Interrupted system call
0.0002639	0.9999781

With POSIX, you can specify that for a given signal, system calls are to be automatically restarted so you can dispense with EINTR error handling.

And here's were we finally get to the “Oh, yeah, I didn't think that through, did I?” type of bug.

Not wanting the code to be interrupted by the alarm signal, I changed the call to signal.catch() so it would restart any system calls:

signal.catch('alarm',function()
  sock:send(raddr,tostring(clock.get()))
end,'restart')

When I ran the code, I got nothing! There was simply no ouput happening. It caught me by surprise and it took me several minutes to figure out what was happening (or rather, what wasn't happening):

  1. We enter sock:recv() (a Lua function), which ultimately does a system call to recvfrom();
  2. the alarm signal (SIGALRM) is raised;
  3. the kernel iterrupts our current system call (recvfrom()) and calls the signal handler we installed;
  4. the signal handler calls lua_sethook() so that the Lua VM (when it resumes) will be stopped and our Lua hook function (written in Lua) will execute;
  5. the kernel then resumes the system call (recvfrom()) that was previously interrupted.

And thus we get to the punchline: the Lua VM doesn't resume because we're still in a system call! And thus, the signal handler written in Lua is never called, which doesn't send a packet, because we're stuck in our system call (recvfrom()) waiting for some data that will never arrive.

D'oh!

If the above code were written in C, there would be no issue; clock_gettime() and sendto() (the system calls underlying the Lua functions clock.get() and sock:send() respectively) are safe to call from a signal handler. I may not have been able to safely convert the time to text (since snprintf()—the only standard C function able to convert numbers to text, isn't documented as being safe to call in a signal handler) but sending the raw binary values would be okay in that case.

But this isn't C, it's Lua. And what we have here is a type of leaky abstraction. That 20/20 hindsight is such a bastard.

Obligatory Picture

An abstract representation of where you're coming from]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

Obligatory AI Disclaimer

No AI was used in the making of this site, unless otherwise noted.

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.