Friday, January 16, 2015
Never trust input, and you can still leak stuff with garbage collection
We're still IOT testing (yes, five months, but we're doing IOT testing with The Monopolistic “We Don't Have To Care, We're The” Phone Company where things move slower than molasses in Antarctica) and I'm finding it interesting (in the Chinese sense of the term) to see “actual real world data” rather than the “pristine theoretically correct data” we've been using for testing (because in theory, there is no difference between theory and practice, but in practice …).
For instance, I've been reeiving reports that “Project: Sippy Cup” (which implements a subsection
of SIP) was only processing less than
5% of the OPTIONS
requests (which informs the other side if
we're ready for processing, but also informs the other side what other type
of requests we support). Odd, as our internal testing was handling every
OPTIONS
request we sent.
So I added some addtional logging:
- Log everytime we receive a packet.
- Log everytime we get an
OPTIONS
request. - Log the results of sending a reply to an
OPTIONS
request.
and I get back:
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9901: OPTIONS from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9902: send OPTIONS response to ip6:[fc00:0:60:8a1::9]:5060 datasize=581 sentbytes=581 err=Success Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9901: OPTIONS from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9902: send OPTIONS response to ip6:[fc00:0:60:8a1::8]:5060 datasize=581 sentbytes=581 err=Success Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9901: OPTIONS from ip6:[fc00:0:60:8a1::9]:5060 Sippy-Cup9902: send OPTIONS response to ip6:[fc00:0:60:8a1::9]:5060 datasize=581 sentbytes=581 err=Success Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9901: OPTIONS from ip6:[fc00:0:60:8a1::8]:5060 Sippy-Cup9902: send OPTIONS response to ip6:[fc00:0:60:8a1::8]:5060 datasize=581 sentbytes=581 err=Success
At first, I thought perhaps the output sent was filtered to just those
three messages and some other messages, like, “Sippy-Cup0014:
ip6:[fc00:0:60:8a1::9]:5060 sent garbage
” or “Sippy-Cup0055:
ip6:[fc00:0:60:8a1::9]:5060 sent an unknown call
” were filtered
out.
Nope. What I saw is what got logged.
Fortunately, the actual network packets had been captured so I could look at them. And I immediately saw the problem.
“Project: Sippy-Cup” is written in Lua, since it's easier to parse text in Lua
than in C (it's easy to parse text with LPeg), and each request
is handled in a Lua
coroutine—it makes following the code a lot easier since it can be
written in a straight-forward procedural way rather than with callbacks or
complex state machines and what not. The only trick here is that for any
operation that might block, the code has to manually insert a call to
coroutine.yield()
to let another coroutine run (if there are
any to run).
In SIP, all the
messages pertaining to a phone call each have a unique identifier, and it's
this identifier I use to know which coroutine to run when a message comes in
(or spwan a new coroutine if this is a new call). The OPTIONS
request isn't really part of a normal call, but I had assumed, like other
SIP requests, that
they too, would have a unique identifier.
It turns out that no, The Monopolistic “We Don't Have To Care, We're
The” Phone Company is gladly using the same “unique” identifier for each
OPTIONS
request, mostly … occasionally they'll generate a
new “unique” identifier just to keep us on our toes.
Or something like that.
Okay, that did reveal the actual bug though—I wasn't releasing
the reference to the coroutine when processing an OPTIONS
request. Had I done that, it wouldn't have mattered, since I wouldn't have
a reference to a coroutine with that particular “unique” identifier, and a
new coroutine would have been created to handle the OPTIONS
request.
Which goes to show you that not only is input weirder than expected, but even in a language with garbage collection, you can still leak resources.