The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Friday, March 01, 2013

Parsing—it's not just for compilers anymore

I've been playing around with LuaRocks and while I've made a rock of all my modules, I've been thinking that it would be better if I made the modules individual rocks. That way, you can install just the modules you want (perhaps you want to embed a C compiler in your Lua program) instead of a bunch of modules most of which you won't use.

And that's fine. But I like the ability to pull the source code right out of the repository when making a rock. Now, given that the majority of my modules are single files (either in Lua or C) and the fact that it's difficult to check out a single file with git (or with svn for that matter) I think I'd be better served having each module be its own repository.

And that's fine, but now I have a larger problem—how do I break out the individual files into their own repositories and keep the existing revision history? This doesn't seem to be an easy problem to solve.

Sure, git now has the concept of “submodules”—external repositories referenced in an existing repository, but that doesn't help me here (and git's handling of “submodules” is quirky at best). There's git-filter-branch but that's if I want to break a directory into its own repository, not a single file. But there's also git-fast-export, which dumps an existing repository in a text format, supposedly to help export repositories into other version control systems.

I think I can work with this.

The resulting output is simple and easy to parse, so my thought is to only look at the bits involving the file I'm interested in, and generate a new file that can then be imported into a fresh repository with git-fast-import.

I used LPeg to parse the exported output (why not? The git export format is documented with BNF, which translates directly into LPeg), and the only difficult portion was handling this bit of syntax:

'data' SP <count> LF
<raw> LF?

A datablock contains the number of bytes to read starting with the next line. Defining this in LPeg took some thinking. An early approach was something like:

data = Ct(				-- return parse results in table
	   P'data '			-- match 'data' SP
	   * Cg(R"09"^1,'size')		-- get size, save for later reference
	   * P'\n'			-- match LF
	   * Cg(			-- named capture
	         P(tonumber(Cb('size'))) -- of 'size' bytes characters
		 ,'data'                -- store as 'data'
	     )
	   * P'\n'^-1			-- parse optional LF
	)

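lpeg.P(n) matches exactly n characters, but in my case, n wasn't constant. You can do named captures, so I figured I could capture the size, then retrieve it by name and pass the value to lpeg.P(), but no, that didn't work. It generates “bad argument #1 to 'P' (lpeg-pattern expected, got nil)”—in other words, an error. The problem is that lpeg.Cb('size') doesn't return the captured value; it returns a pattern that only produces the value at match time, so tonumber() is handed a pattern, returns nil, and lpeg.P() chokes on that nil while the pattern is still being built.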

It took quite a bit of playing around, and close reading of the LPeg manual before I found the solution:

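local lpeg = require "lpeg"
local P, R, Ct, Cg, Cmt = lpeg.P, lpeg.R, lpeg.Ct, lpeg.Cg, lpeg.Cmt

-- Called at match time with the whole subject, the position just past
-- the matched "<count>\n", and the matched text itself.
local function immdata(subject,position,capture)
  local size  = tonumber(capture)		-- tonumber() ignores the trailing LF
  local data  = subject:sub(position,position + size - 1)
  return position + size,data			-- resume just past the raw data
end

local data = Ct(
	   P'data '
	   * Cg(Cmt(R"09"^1 * P"\n",immdata),'data')
	   * P'\n'^-1			-- parse optional LF
	)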

It's the lpeg.Cmt() that does it. It calls the given function as soon as the given pattern is matched. The function is given the entire object being parsed (one huge string, in this case the subject parameter), the position after the match (the position parameter), and the actual string that was matched (the capture parameter). From there, we can parse the size (tonumber(), a standard Lua function, ignores the included line feed character), then we return the new position where LPeg should resume parsing (just past the raw data) and what we want as the capture (the variable amount of data).
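For comparison, here's a minimal sketch of the same byte-counted read in plain Lua, without LPeg, using only string.match() and string.sub(). The helper name read_data() and its return values are my own invention for illustration, and it assumes the whole export is already in memory as one string. Note that the resume position is start + size, one past the last raw byte.

```lua
-- Hypothetical helper: read one  'data' SP <count> LF <raw> LF?  block
-- starting at position pos in subject.  Returns the raw data and the
-- position just past the block, or nil if there is no complete block.
local function read_data(subject, pos)
  -- anchored at pos; () captures the position right after the LF
  local count, start = subject:match("^data (%d+)\n()", pos)
  if not count then
    return nil
  end

  local size = tonumber(count)
  local stop = start + size - 1      -- index of the last raw byte
  if stop > #subject then
    return nil                       -- truncated input
  end

  local raw     = subject:sub(start, stop)
  local nextpos = stop + 1           -- first byte after the raw data

  if subject:sub(nextpos, nextpos) == "\n" then
    nextpos = nextpos + 1            -- the trailing LF is optional
  end
  return raw, nextpos
end

local stream = "data 5\nhello\ndata 3\nabcdone"
local raw, pos = read_data(stream, 1)   -- raw == "hello", pos == 14
raw, pos = read_data(stream, pos)       -- raw == "abc",   pos == 24
print(raw, pos)
```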

And this was the hardest part of the entire project, trying to match a variable number of unknown characters. Once I had this, I could read the exported repository into memory, find the parts relating to an individual file and generate output that had the history of that one file (excluding the bits where the file may have moved from directory to directory—those weren't needed) which could then be imported into a clean git repository.

Saturday, March 02, 2013

Stupid GitHub tricks

All that work parsing git repositories was for naught—it seems you can link to individual files on GitHub (go figure!). So now I can create a rockspec per module, like:

package = "org.conman.tcc"
version = "1.0.0-0"

source =
{
  url = "https://raw.github.com/spc476/lua-conmanorg/1.0.0/src/tcc.c"
}

description =
{
  homepage = "http://...",
  maintainer = "Sean Conner <sean@conman.org>",
  license    = "LGPL",
  summary    = "Lua wrapper for TCC",
  detailed   = [[
	Blah blah blah
  ]]
}

dependencies =
{
  "lua ~> 5.1"
}

external_dependencies =
{
  TCC = { header = "libtcc.h" }
}

build =
{
  type = "builtin",
  copy_directories = {},
  modules =
  {
    ['org.conman.tcc'] = 
    {
      sources   = { 'tcc.c' },
      libraries = { "tcc" },
    }
  },

  variables =
  {
    CC = "$(CC) -std=c99",
    CFLAGS = "$(CFLAGS)",
    LFLAGS = "$(LIBFLAG)",
    LUALIB = "$(LIBDIR)"
  }
}

(this particular module embeds a C compiler in Lua, which is something I do need to talk about).

But it isn't like I wasted time on this. No, I don't view it that way at all. In fact, I learned a few things—how to parse git repositories, how to parse a variable amount of data in LPeg and I have code to extract a single file into its own git repository if I ever have that need again.

Sunday, March 03, 2013

I mean, Washington may be a wretched hive of scum and villainy, but …

From: "Agent Chris Swecker" <laura@nwclbj.com>
To: undisclosed-recipients:;
Subject: *****SPAM***** UNITED STATES DEPARTMENT OF JUSTICE
Date: Sun, 3 Mar 2013 03:38:06 +0800

Spam detection software, running on the system “DiskStation”, has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see admin for details.

Content preview: Federal Bureau of Investigation (FBI) Counter-terrorism Division and Cyber Crime Division J. Edgar. Hoover Building Washington DC Dear Beneficiary, Series of meetings have been held over the past 7 months with the secretary general of the United Nations Organization. This ended 3 days ago. It is obvious that you have not received your fund which is to the tune of $8.5,000.000.00 due to past corrupt Governmental Officials who almost held the fund to themselves for their selfish reason and some individuals who have taken advantage of your fund all in an attempt to swindle your fund which has led to so many losses from your end and unnecessary delay in the receipt of your fund. […]

Content analysis details: (6.9 points, 5.0 required)
 pts rule name              description
-1.4 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 1.8 SUBJ_ALL_CAPS          Subject is all capitals
 0.0 MONEY_BACK             BODY: Money back guarantee
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.0 FORGED_OUTLOOK_TAGS    Outlook can't send HTML in this format
 0.7 MSOE_MID_WRONG_CASE    MSOE_MID_WRONG_CASE
 0.0 FORGED_OUTLOOK_HTML    Outlook can't send HTML message only
 4.2 FORGED_MUA_OUTLOOK     Forged mail pretending to be from MS Outlook

The original message was not completely plain text, and may be unsafe to open with some email clients; in particular, it may contain a virus, or confirm that your address can receive spam. If you wish to view it, it may be safer to save it to a file and open it with an editor.

You know … if you're going to try a Nigerian 419 scam, it might be wise to ensure your email isn't flagged as spam on the way out of your own email server! I'm just saying …

Monday, March 04, 2013

Ideas in parsing the command line

For any non-trivial script, even for personal consumption, it's necessary to supply usage text. The novelty of Lapp is that it starts from that point and defines a loose format for usage strings which can specify the names and types of the parameters.

An example will make this clearer:

	-- scale.lua
	  require 'lapp'
	  local args = lapp [[
	  Does some calculations
	    -o,--offset (default 0.0)  Offset to add to scaled number
	    -s,--scale  (number)  Scaling factor
	     <number> (number )  Number to be scaled
	  ]]

	  print(args.offset + args.scale * args.number)

lua-users wiki: Lapp Framework

The thought of parsing the usage text for parsing the command line never occurred to me, and I think it's brilliant.

Now, when I want to modify the command line of a program I wrote (and this is mostly in C, by the way), there are four locations I have to edit:

  1. An enumeration specifying the “short” form of the command line option
  2. A structure describing both the short and the long forms of a command line option
  3. A switch statement that processes the command line options from getopt_long()
  4. The text printed out describing the command line options

With this method, though, there's only one place I would have to edit.

Now granted, this is only for Lua, but I don't see why something similar couldn't be done for other languages.
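To make the idea concrete, here's a much-simplified sketch in plain Lua. It is hypothetical and is not how Lapp is implemented, nor Lapp's API: the usage text is the only place options are declared, and only lines of the form “-x,--long (default value) description” or “-x,--long (type) description” are recognized. Every option is assumed to take exactly one value; positional arguments and bare flags are not handled.

```lua
-- Hypothetical sketch, not Lapp: pull option specs out of usage text.
local function parse_usage(usage)
  local specs = {}
  for short, long, kind in usage:gmatch("%-(%a),%-%-([%w_]+)%s*%(([^)]*)%)") do
    local spec = { name = long, default = kind:match("^default%s+(.-)%s*$") }
    specs[short] = spec            -- reachable as -o ...
    specs[long]  = spec            -- ... or as --offset
  end
  return specs
end

-- Parse an argv-style list against the options declared in usage.
local function parse_args(usage, argv)
  local specs  = parse_usage(usage)
  local result = {}

  for _, spec in pairs(specs) do   -- fill in defaults first
    if spec.default then
      result[spec.name] = tonumber(spec.default) or spec.default
    end
  end

  local i = 1
  while i <= #argv do
    local key  = argv[i]:match("^%-%-?([%w_]+)$")
    local spec = key and specs[key]
    if not spec then
      error("unknown option: " .. argv[i])
    end
    local value = argv[i + 1]
    if value == nil then
      error("missing value for " .. argv[i])
    end
    result[spec.name] = tonumber(value) or value
    i = i + 2
  end
  return result
end

local usage = [[
Does some calculations
  -o,--offset (default 0.0)  Offset to add to scaled number
  -s,--scale  (number)       Scaling factor
]]

local args = parse_args(usage, { "--scale", "2", "-o", "1" })
print(args.offset + args.scale * 3)   -- 7
```

Adding an option here means touching one line of usage text; the lookup table, the defaults and the help text all come from it.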


An unholy mashup

I know some of my college friends will find this heretical, but what happens when you take Carly Rae Jepsen's “Call Me Maybe” and mix it with Nine Inch Nails' “Head Like A Hole”?

Oddly enough, you get something that works (link via GoogleFacePlusBook), kind of, maybe, in that “train-wreck cross-genre musical mashup” kind of way.

Tuesday, March 05, 2013

Papers please

I've seen the Constitutional Free Zone map multiple times, and I've tried to substantiate the claims made by the ACLU, but everything I've found so far points back to the ACLU; I've yet to find anything that doesn't point back there.

But then I saw this video of DHS checkpoint refusals (link via Flutterby), all of which took place inside the United States (and not at the border), one of which was at least 30 miles from the border. So perhaps there is something to that ACLU article.

Another odd thing about that video—most of the agents would not answer, refused to answer, or tried to talk around the question “am I being detained?” Oh, and the automatic assumption of guilt when one refuser pleaded the Fifth. I also found it offensive when the officers admonished the refusers for making their job more difficult.

Wednesday, March 06, 2013

Peak government, I tell ya! Peak government!

Take Eddie Leroy Anderson, a retired logger from Idaho whose only crime was loaning his son “some tools to dig for arrowheads near a favorite campground of theirs,” according to the Wall Street Journal. Anderson and his son found no arrowheads, but because they were unknowingly on federal land at the time they were judged to be in violation of an obscure Carter-era law called the Archaeological Resources Protection Act.

The government showed no mercy. Wendy Olson, the Obama appointee prosecuting the case, saw to it that father and son were fined $1,500 apiece and each sentenced to a year's probation. “Folks do need to pay attention to where they are,” she said.

Statutory law in America has expanded to the point that government's primary activity is no longer to protect, preserve and defend our lives, liberty and property, but rather to stalk and entrap normal American citizens doing everyday things.

After identifying three federal offenses in the U.S. Constitution—treason, piracy and counterfeiting—the federal government left most matters of law enforcement to the states. By the time President Obama took office in 2009, however, there were more than 4,500 federal criminal statutes on the books.

Via Instapundit, Op-Ed: How to end overcriminalization | WashingtonExaminer.com

Remember, ignorantia juris non excusat, so better start reading.

Thursday, March 07, 2013

Plugging leaky memory

Just because a language has garbage collection doesn't mean you can't leak memory—you can, and easily, since quite a few modern languages with garbage collection have ways of calling into libraries written in C, and those can leak.

With that said, reading “Tracking down a memory leak in Ruby's EventMachine” (link via Hacker News) was quite informative. Looking for patterns in the leaked memory as a means of tracking down what was being leaked was brilliant (“Well, as mentioned, 95+% of our program’s memory footprint is leaked objects. So if we just take a random sample of bits of memory, we will find leaked objects with very good probability.”). And I did not know you could call C functions from within gdb.
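The probability claim in that quote is easy to check. A small sketch, assuming (as the article does) that 95% of memory is leaked objects and idealizing each sample as independent: the chance that k random samples all miss leaked memory is 0.05 to the k-th power, so one miss in a row is common but a handful in a row is vanishingly rare.

```lua
-- Probability that at least one of k random samples lands in leaked
-- memory, when a fraction `leaked` of memory is leaked objects and the
-- samples are treated as independent (an idealization).
local function hit_probability(leaked, k)
  return 1 - (1 - leaked) ^ k
end

for _, k in ipairs { 1, 2, 5, 10 } do
  print(k, hit_probability(0.95, k))
end
```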

This is something I'll have to keep in mind for work.

Friday, March 08, 2013

Um … why couldn't the giant eagles fly the Ring into Mordor?

When I first saw How the Lord of the Rings Should Have Ended, I thought, “Yeah, why didn't they use the giant eagles to fly the Ring to Mount Doom?”

One of the arguments I've heard is that the Nazgûls' fell beasts would have taken out the giant Eagles once they entered into Mordor. But Sean Chist felt otherwise—he thinks it's a plot hole left by Tolkien (link via Hacker News).

But personally, I find this theory much more plausible and a much better explanation for why “Operation: AirDrop One” wasn't done.

Saturday, March 09, 2013

Shave and a haircut, definitely not two bits

Bunny decided it was time for me to get a haircut. Normally, she does the cutting, but after the last haircut I received at a real barber shop (in Brevard—real barber pole, barbers, wood panelling, the works) she felt that the professionals did a much better job at it than she.

[Just your typical small town barber shop in Brevard] [But sadly, the barber shop quartet was at the Brevard Music Center performing the day I got my hair cut.]

But we live in Boca Raton, not quite your Small Town, USA™ so we had to make do with something a bit more upscale, The Man Cave. At the appointed time, I walked into the Man Cave.

“Welcome, Sean,” said the hostess.

“Um … how did you–”

“You're here for your four o'clock appointment,” she said. “Would you care for wine? Or perhaps an imported beer from the Continent?”

“Oh. Um. No, I'm fine.”

“Very well. Chel will take care of you,” she said. “Chel! Your four o'clock is here.” She pointed over to the chairs, nestled among oversized high-contrast portraits of James Dean and Marlon Brando. “This way,” she said.

“Hello,” said Chel, walking over to lead me to her chair. “Please, take a seat. Short, over the ears, close cropped shave.”

“How did—”

“Shh, just sit back and relax,” she said, tying a paper collar about my neck and adjusting the snap-on tarp. “Glasses,” she said.

“Oh, yes,” I said. I took off my glasses, and she placed them gently on the nearby counter. She then started clipping my hair. It was the typical motions—snip here, snip there, reposition my head, more snipping, use the electric razor here and there and before long, she had apparently finished with cutting my hair.

She then lowered the back of the chair so I was nearly lying horizontally. “Please, relax,” Chel said, as she lowered a folded, steaming hot towel across the lower half of my face, then raised the folded part to cover my entire face. Oddly enough, even though I could see the steam rising off of it (even without my glasses) it wasn't scalding. In fact, it felt nice. It was whisked off, then she massaged my face, then another towel, then various gels and what not were rubbed into my face, then another hot towel, then more gels and finally, the shave with an honest-to-god straight razor. That was weird. I could feel it (felt like a sharp pencil against my skin) and hear it scrape the hair off my face.

And with that, I was done.

[Clean shaven]

It was not cheap. But it was a fun experience. And certainly a different experience from a small town barber shop.


I have a sad story to tell you, it might hurt your feelings a bit

Bunny and I found ourselves at The Rock Steady Jamaican Bistro in Boca Raton for dinner. An early dinner. Or a late lunch. Or a very late brunch. Dunch, if you will.

So we were eating dunch at The Rock Steady Jamaican Bistro (the food was quite good, but you better like it jerked) and for a change, we were listening to reggae. Not because we wanted to listen to reggae, but because we were eating at a Jamaican restaurant.

And it could have been worse—it could have been country reggae.

But we were listening to reggae, when it struck me, the reggae music—it was a song I hadn't heard in a long time …

“I have a sad story to tell you.”

I was trying to place where I heard it …

“It might hurt your feelings a bit.”

It's been years … and the heavy reggae beat wasn't helping, mon.

“I stepped in the bathroom last night.”

What was it?

“And stepped in a pile of shhhhhhhhh—”

Shaving Cream! It was the Shaving Cream song! Only with a very heavy reggae beat and a thick Jamaican accent. But yes, it was the Shaving Cream song!

Incredible!


A Liar's Autobiography

Bunny and I watched “A Liar's Autobiography: The Untrue Story of Monty Python's Graham Chapman” and I must say—it's a very odd film, even by Monty Python standards. Animated, in fourteen different styles (some of it very beautifully done) with most of Monty Python doing voice work (and no, it's not Graham Chapman that isn't in it), it's a completely made up story of the life of Graham Chapman.

Well, mostly made up. It does cover his homosexuality and drinking problems. But everything else is made up. Well, except for him working for Monty Python. But short of the homosexuality, the drinking problems, and working with Monty Python, it's all made up. Except his name really is Graham Chapman.

Okay, let me start over again.

Except for his name, his homosexuality, drinking problems and working with Monty Python, the movie is completely and utterly false.

Except he did study to become a doctor.

Damn!

Well, it's a very weird film and you should watch it because it's kind of true, except when it isn't.

There.

Sunday, March 10, 2013

Visions of a future past

Ah, 70's space colony concept art (link via InstaPundit)—what's not to like?


Alternative keyboards

This association of the keyboard with the synthesizer eased its entry into the world of music, but it also placed limitations on how the instrument is played that its designers didn't intend. The limitations of the piano keyboard have been recognized since long before the synthesizer existed. The biggest problem that the keyboard has always had is that, due to the two-row layout with all of the naturals on the bottom row and all of the accidentals on the top row, the performer must usually change fingering in order to transpose a chord from one key to another. This frustrates what should be a simple operation; the guitar player playing a barred chord can transpose it simply by moving up and down the neck, but the keyboard player must keep shifting fingers around to insure that each finger hits on the correct row. The additional manual dexterity and muscle memory requirement makes learning the different keys on the piano a slow and frustrating process. From my own experience, it also introduces the temptation to use teaching shortcuts that cause the student problems later on: a common technique is to start the beginning student out learning the C-major scale, which is played all on the white keys. This introduces a sort of fear or puzzlement at the black keys—what are they for? When does one use them? And then when the teacher starts introducing other scales, the use of the black keys seems arbitrary and unsystematic, and the student gets a bit freaked out. By contrast, guitar pedagogy treats the accidentals as simply other notes in the chromatic scale, which they are, and the guitar student has relatively little trouble understanding how to play different scales and keys.

Via Hacker News (via Hacker News), Sequence 15: Alternative Keyboards

I'm actually puzzled by (musical) electronic keyboards. Sure, they have a layout like a traditional piano, and yes, the C-major scale is played with all white keys, but if you decide to play, say, an F-major scale, why can't you just remap the frequencies of the keys so you can still play it with all white keys? In C-major the keys go C-D-E-F-G-A-B, while in F-major they go F-G-A-B♭-C-D-E and in A♭-major they go A♭-B♭-C-D♭-E♭-F-G. The same fingering, regardless of scale.
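Here's a minimal sketch of that remapping, assuming the white keys are numbered from 0 starting at the leftmost white key of an octave, MIDI note numbering (middle C is 60), and twelve-tone equal temperament with A above middle C (MIDI 69) at 440 Hz. The function names are made up for illustration; no particular keyboard or synthesizer is implied.

```lua
-- Semitone offsets of the major scale from its tonic.
local MAJOR = { 0, 2, 4, 5, 7, 9, 11 }

-- Frequency in Hz of a MIDI note, equal temperament, A4 (69) = 440 Hz.
local function frequency(midi)
  return 440 * 2 ^ ((midi - 69) / 12)
end

-- Map white key number `key` (0, 1, 2, ...) to a MIDI note in the
-- major scale whose tonic is MIDI note `tonic`.
local function remap(tonic, key)
  local octave = math.floor(key / 7)
  local degree = key % 7
  return tonic + 12 * octave + MAJOR[degree + 1]
end

-- Same seven white keys, three different scales:
-- C major (tonic 60), F major (65) and A-flat major (68).
for key = 0, 6 do
  print(key, remap(60, key), remap(65, key), remap(68, key),
        string.format("%.2f Hz", frequency(remap(65, key))))
end
```

For F major the white keys come out as MIDI 65, 67, 69, 70, 72, 74 and 76, which is F-G-A-B♭-C-D-E; the fingering never changes, only the table lookup does.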

Yes, I know you can't do this on a traditional piano, but I'm not talking about a traditional piano here. You can't change the layout of a typewriter, yet it's trivial to change the layout of a computer keyboard (used to type text)—it's a matter of changing the software and boom—you have a Colemak layout!

But short of that, I am fascinated by alternative music keyboards, probably because I'm not a musician and to me, these alternative music keyboards seem to show the patterns inherent in music much better than a piano keyboard.

Monday, March 11, 2013

Paper

For all the things an iPad can do, there's still a place for paper (link via dad).

Tuesday, March 12, 2013

Flowing water, stationary

Brusspup filmed water going through a speaker running at 24Hz (link via Hacker News) and the result is incredible—the water just sits there, in mid-air. As he adjusts the sound wave, the water appears to move slowly down, then reverses itself.

It is a trick, though it's not special effects—it's all done in real time with a camera. But it's the camera that causes the effect—it's acting like a strobe light, catching the action at just the right moment. Still, it's impressive.
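The strobe effect can be put in numbers. A camera sampling at some frame rate sees a periodic motion as if it were moving at the difference between its real frequency and the nearest whole multiple of the frame rate. The 24 frames per second used below is my assumption (a common camera rate that happens to match the 24Hz tone), not something stated above.

```lua
-- Apparent frequency (Hz) of a periodic motion at f Hz when filmed at
-- fps frames per second: f folded to the nearest multiple of fps.
-- Zero means the motion looks frozen; the sign flips when f moves from
-- just below a multiple of fps to just above it.
local function apparent_frequency(f, fps)
  return f - fps * math.floor(f / fps + 0.5)
end

for _, f in ipairs { 23, 24, 25 } do
  print(f, apparent_frequency(f, 24))   -- -1, 0, 1
end
```

Nudging the tone a hertz either side of the frame rate gives a slow crawl one way or the other, which fits the water seeming to drift and then reverse as he adjusts the sound.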

Wednesday, March 13, 2013

Why does “Enterprise Software” universally suck?

I work in the QA Department of The Corporation. The majority of the team is stationed in the Seattle Office whereas I am the only member of QA in The Ft. Lauderdale Office. The Seattle Office tests the actual cell phones, whereas not only do I test call processing (and even though I might complain about the Protocol Stack From Hell™, I can automate my tests—muahahahahahaha!) but I am the only person who tests call processing in The Corporation.

I bring this up because the QA Department is now using Gusty, a “Real-Time Test Management System” as it calls itself. And so far, I am seriously unimpressed with it. It's written in Flash and tries its darndest to imitate a Microsoft Windows interface, but it's a far cry from Microsoft Windows. And because it tries to imitate Microsoft Windows, it's quite Microsoft-centric.

But, because it tries so hard to be cross-platform, it's almost, but not quite, cute. It screams that it was a Visual Basic application ported to Flash to sell to non-Microsoft shops. There's no way to change the font (which is borderline too small, even for me). It's hard to resize the windows. The scrolling is wonky. It's just an overall clunky user interface.

And that would be fine if it actually helped with my job. But it falls down there too. There are two main objects—requirements and testcases. A testcase can have multiple requirements, and a requirement can apply to several testcases. You use different windows to create requirements and testcases.

Now, when you view the details of, say, a requirement, you can get a list of testcases it applies to, but it's a tool-tip like element—meaning, it pops up a small text window with a list of testcases. Can you click on one to go to the testcase? No. Can you select one? No. Can you copy the text with the mouse? No. Does it remain up for any length of time? No. Is this list in any way useful? No. Well … okay, you can memorize the ID before it goes away. So if you want more details on a particular testcase for a requirement, you have to go to the testcase window and manually search for it.

Oh, there is a search facility (it was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying “Beware of the leopard”) but searching by ID doesn't work. You see, it's a text only search, and IDs being numbers, aren't text …

Yeah.

And the Microsoft-esqueness of the program means that this is really geared towards manual tests. Oh, they pay lip service to automation and in theory, you can run automated tests, but in theory, there is no difference between theory and practice, but in practice …

In practice, you install some Java client on the machine to run the tests and somehow get this tool to run the tests. And okay, that's fine.

Only my test program runs eight programs (which spawn like a bazillion processes) on four computers, and the programs need to start in a particular order, with particular data. Somehow, I don't think the Gusty tool was built with that type of testing in mind (and when I said the tests I run are automated? Yes. They are. But the setup isn't, as there are a few steps that have security implications involving root).

Now, I'm sure that Gusty is a fine tool within certain limitations (large testing teams manually testing software using Microsoft Windows) but for what I do, it doesn't work at all.

Thankfully, I can continue with my job without having to use Gusty, as I'm practically my own department.

Thursday, March 14, 2013

Happy π Day, I guess …

It seems that several of my peeps on GoogleFacePlusBook are making some noise about it being π Day or something like that. Well … it's 3/14/13, or 3.1413, which is 0.0003 too small. I think it might be better to wait until March 14th, 2015. It's just too bad it isn't March 14th, 1592.

Friday, March 15, 2013

Rolling them bones

On GoogleFacePlusBook, Jeff linked to an article about non-transitive dice—three dice where, on average (meaning—many rolls) die A will win over die B, die B will win over die C, but die C will win over die A (kind of like rock-paper-scissors). Even weirder, if you double the dice, two A's against two B's against two C's, the order reverses! (And the accompanying video shows a series of five dice with an even weirder dual-non-transitive ordering).
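The non-transitive part is easy to verify by brute force: roll every face of one die against every face of another and count. The dice below are a well-known three-dice example, not necessarily the ones from the article, and this only checks the single-die cycle, not the reversal when the dice are doubled.

```lua
-- Count, over all equally likely face pairs, how often die a rolls
-- strictly higher than die b.  Returns wins and total pairs.
local function wins(a, b)
  local count = 0
  for _, x in ipairs(a) do
    for _, y in ipairs(b) do
      if x > y then
        count = count + 1
      end
    end
  end
  return count, #a * #b
end

-- A classic non-transitive set (no face values shared, so no ties).
local A = { 2, 2, 4, 4, 9, 9 }
local B = { 1, 1, 6, 6, 8, 8 }
local C = { 3, 3, 5, 5, 7, 7 }

print("A beats B", wins(A, B))   -- 20 of 36
print("B beats C", wins(B, C))   -- 20 of 36
print("C beats A", wins(C, A))   -- 20 of 36
```

Each die wins 20 of the 36 pairings (5/9 of the time) against the next one in the cycle, so there is no “best” die.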

This page reminded me of a set of “go-first-dice”—a set of twelve sided dice where, say, four people each have a ¼ chance of rolling the highest number, a ¼ chance of rolling the second highest number, a ¼ chance of rolling the third highest number and a ¼ chance of rolling the lowest number. In this case, the dice form a strict ordering (there is no chance of a tie).

Monday, March 18, 2013

You know, you might want to check the calendar for March 2002

Okay, I've seen this particular meme multiple times now on GoogleFacePlusBook:

THIS IS THE ONLY TIME WE WILL SEE AND LIVE THIS EVENT

Calendar for March 2013

        March 2013
Sun Mon Tue Wed Thr Fri Sat
                      1   2
  3   4   5   6   7   8   9
 10  11  12  13  14  15  16
 17  18  19  20  21  22  23
 24  25  26  27  28  29  30
 31

This year March has 5 Fridays, 5 Saturdays and 5 Sundays. This happens once every 823 years. This is called money bags. So, share this to your friends and money will arrive within 4 days. Based on Chinese “Feng Shui.” The one who does not share … will be without money.

The first time I saw it, I thought, Weird—is that true? Really? 823 years since the last time? Okay, let's write some code to answer this—which years has March 1st fallen on a Friday?

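#!/usr/bin/env lua

-- Day of the week for a date in the (proleptic) Gregorian calendar.
-- Returns 1 for Sunday through 7 for Saturday.
local function dayofweek(date)
  -- treat January and February as months 11 and 12 of the previous year
  local a = math.floor((14 - date.month) / 12)
  local y = date.year - a
  local m = date.month + 12 * a - 2

  local d = date.day
          + y
          + math.floor(y / 4)
          - math.floor(y / 100)
          + math.floor(y / 400)
          + math.floor(31 * m / 12)
  return (d % 7) + 1
end

-- print every year from 1190 through 2014 where March 1st was a Friday
for year = 1190,2014 do
  if dayofweek { year = year , month = 3 , day = 1 } == 6 then
    print(year)
  end
end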

1190 is 823 years ago. Let's see what we get …

1191 1196 1202 1213 1219 1224 1230 1241 
1247 1252 1258 1269 1275 1280 1286 1297 
1309 1315 1320 1326 1337 1343 1348 1354 
1365 1371 1376 1382 1393 1399 1405 1411 
1416 1422 1433 1439 1444 1450 1461 1467 
1472 1478 1489 1495 1501 1507 1512 1518 
1529 1535 1540 1546 1557 1563 1568 1574 
1585 1591 1596 1602 1613 1619 1624 1630 
1641 1647 1652 1658 1669 1675 1680 1686 
1697 1709 1715 1720 1726 1737 1743 1748 
1754 1765 1771 1776 1782 1793 1799 1805 
1811 1816 1822 1833 1839 1844 1850 1861 
1867 1872 1878 1889 1895 1901 1907 1912 
1918 1929 1935 1940 1946 1957 1963 1968 
1974 1985 1991 1996 2002 2013

(okay, I cleaned up the output a bit)

Well … it's happened 118 times since 1190, and it didn't happen 823 years ago, but 822 years ago, and 817 years ago and … you get the picture.

Hardly a rare occurrence and in thinking about it, any month with 31 days will have three days that happen five times a month (five Sundays, five Mondays and five Tuesdays for instance).
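A quick check of that last claim, using Lua's standard os.time() and os.date() instead of the formula above: 31 days is four full weeks plus three days, so whichever three weekdays the month starts on each show up five times. The helper name five_times() is just for illustration.

```lua
local NAMES = { "Sun", "Mon", "Tue", "Wed", "Thr", "Fri", "Sat" }

-- Return the names of the weekdays that occur five times in a month
-- of `days` days.
local function five_times(year, month, days)
  local count = {}
  for day = 1, days do
    -- noon keeps us clear of any daylight-saving shifts around midnight
    local t    = os.time { year = year, month = month, day = day, hour = 12 }
    local wday = os.date("*t", t).wday        -- 1 = Sunday ... 7 = Saturday
    count[wday] = (count[wday] or 0) + 1
  end

  local result = {}
  for wday = 1, 7 do
    if count[wday] == 5 then
      result[#result + 1] = NAMES[wday]
    end
  end
  return result
end

print(table.concat(five_times(2013, 3, 31), " "))   -- Sun Fri Sat
```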

Now, I can hope this puts this particularly medium meme (it's not rare, and it's not well done) out of commission. But alas, I doubt it …

Tuesday, March 19, 2013

It was an inside job

“On that fateful day, the day that the Empire was attacked, all the crew, staff and guests including security personnel and Imperial officials with the exception of one were killed when the terrorist organization known only as the ‘Rebel Alliance’ blew up the Death Star with a squad of single man fighters.”

That much is known. But really, how does a farm boy with no combat experience blow up the most fortified and heavily armed vehicle in the Empire's arsenal? Do you really believe the “official story”?

Nah, it was an inside job (link via GoogleFacePlusBook).

Wednesday, March 20, 2013

Simple

You just went to the Google home page.

Simple, isn't it?

What just actually happened?

Well, when you know a bit of about how browsers work, it's not quite that simple. You've just put into play HTTP, HTML, CSS, ECMAscript, and more. Those are actually such incredibly complex technologies that they'll make any engineer dizzy if they think about them too much, and such that no single company can deal with that entire complexity.

Let's simplify.

Via Hacker News, Jean-Baptiste Queru - Google+ - Dizzying but invisible depth

It's hard to simplify how modern computers work. Sure, I could say something along the lines of “a computer consists of three components: memory, which stores a program and data; the CPU, which executes the program out of memory; and input/output devices, like a keyboard and a monitor.” But while I'm not outright lying, I am grossly simplifying and skipping a lot of details (for instance, memory could be considered an input/output device, but it's treated differently than other input/output devices, except when it isn't).

For instance:

Let's say you've just bought a MacBook Air, and your goal is to become master of the machine, to understand how it works on every level.

The total of all of this is 79 pages shy of eleven thousand. I neglected to include man pages for hundreds of system utilities and the Xcode documentation. And I didn't even touch upon the graphics knowhow needed to do anything interesting with OpenGL, or how to write good C and Objective-C or anything about object-oriented design, and …

A Complete Understanding is No Longer Possible

And those 11,000 pages exclude documentation on the hardware. For comparison, consider the Color Computer (my first computer): the TRS-80 Color Computer Technical Reference Manual, which covers the hardware, is 69 pages; the technical reference for the MC6809 (the CPU) is 35 pages; the reference for the MC6821 (an I/O adaptor) is 11 pages; the reference for the MC6847 (the video graphics chip) is 26 pages; and the EDTASM+ manual (the assembler) is 68 pages.

So, in 209 pages, you will know enough to program the Color Computer (assuming you know how to program in assembly to begin with—else tack on another 294 pages for TRS-80 Color Computer Assembly Language Programming for a little over 500 pages).

Even “Project: Wolowizard” is rather insane, requiring knowledge of SS7, IP, HTTP, Solaris, Linux, C, C++, Javascript, Java, Lua, Ruby, Python, SQL, HTML, CSS, XML and that's just the “off-the-shelf” stuff we're using (and all the documentation for that probably exceeds 11,000 pages quite easily; there are probably over 2,000 pages just for SS7 and IP alone)—I'm not mentioning the file formats and protocols we've developed for “Project: Wolowizard” (just the test plan for one component is over 150 pages, and there are at least seven components just on the backend).

It's simple.

Thursday, March 21, 2013

A good idea in theory marred by the terrible reality of practice

I get the feeling sometimes that not enough is written about failed ideas—not bad ideas, the ones that shouldn't be done, but the class of ideas that can't be done for one reason or another.

Today I had one such idea, but first, some back story.

Sometime last year, R, who runs the Ft. Lauderdale Office of The Corporation, was listening to me lament about The Protocol Stack From Hell™ and how I had this magical ability to break it by thinking bad thoughts about it (an amazing feat when you consider that the physical computers are several miles away in a data center and that any bad thoughts I had towards it had to travel over a remote command line interface).

R explained that SS7 networks are different from IP networks in that any SS7 endpoint that bounces up and down will effectively be ignored by the rest of the SS7 network (and will typically require manual intervention to re-establish a connection), and asked if there was any way I could keep my testing program up and running.

I countered that I didn't think so, seeing how I had to test the testing program and as such, I had to stop and start the program as I found bugs in my own code while I was using it to find bugs in the code I was paid to find bugs in. R conceded the point and that was that. I would keep doing what I was doing and if the SS7 stack on the machines needed to be restarted because I borked The Protocol Stack From Hell™ yet again, so be it.

Then today, I read about the reliability of the Tandem computer (link via programming is terrible).

Hi, is this Support? We have a problem with our Tandem: A car bomb exploded outside the bank, and the machine has fallen over … No, no it hasn't crashed, it's still running, just on its side. We were wondering if we can move it without breaking it.

Apocryphal story about a Tandem computer

[One other apocryphal story about the Tandem. About fifteen years ago I worked at a company that had a Tandem computer. It was said that one day a cooling fan for the Tandem computer just showed up at the receptionist's desk with no explanation. When she called Tandem about the apparent mistaken delivery, they said that the Tandem computer had noticed its cooling fan was marginal and had ordered a replacement fan.]

I had an idea.

I can't say exactly what triggered the idea—it just hit me.

The idea was to write a very small, and very simple program that established an SS7 endpoint—a “master control program” if you will. It would also listen in on a named pipe for commands. One command would start the testing program, passing the SS7 endpoint to the testing program to use (another command would be to stop the testing program). The SS7 endpoint that is created is a Unix file descriptor (a file descriptor is an integer value used to refer to an open file under Unix, but more importantly, we have the source code to The Protocol Stack From Hell™ and the fact that the SS7 endpoint is a file descriptor is something I can verify). Open file descriptors are inherited by child processes. Closing a file descriptor in a child process does not close it in the parent process, so the test program can crash and burn, but because the SS7 endpoint is still open in the “master control program” it's still “up” to the rest of the SS7 network.
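The file descriptor half of the idea is easy to demonstrate on its own. Below is a minimal sketch, assuming a plain pipe() stands in for the SS7 endpoint (the real endpoint comes from The Protocol Stack From Hell™ and isn't reproduced here) and using a hypothetical function name: the parent creates the descriptor, fork()s a child that closes its inherited copy and dies, then checks that its own copy still works. A real master control program would exec() the test program and pass the descriptor number along somehow (say, on the command line); descriptors survive exec() unless FD_CLOEXEC is set on them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo.  Returns 1 if the parent's copy of an inherited
   descriptor survives the child closing it and exiting, 0 if not,
   -1 on setup failure.  A pipe stands in for the SS7 endpoint. */
int fd_survives_child_close(void)
{
  int   fds[2];
  pid_t child;
  int   status;
  char  buf[2];
  int   ok;

  if (pipe(fds) < 0)
    return -1;

  child = fork();
  if (child < 0)
  {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  if (child == 0)
  {
    /* the "test program": close its inherited copies, then crash and burn */
    close(fds[0]);
    close(fds[1]);
    _exit(EXIT_FAILURE);
  }

  /* the "master control program": wait for the child to die ... */
  if (waitpid(child,&status,0) < 0)
    return -1;

  /* ... then show its own copy of the descriptor is still open */
  ok = (write(fds[1],"up",2) == 2)
    && (read(fds[0],buf,2) == 2)
    && (memcmp(buf,"up",2) == 0);

  close(fds[0]);
  close(fds[1]);
  return ok;
}
```

This only shows that the kernel keeps the descriptor open; any per-descriptor state a library keeps in user space is a separate matter.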

It's a nice idea.

It won't work.

That's because the user library we use to establish an SS7 endpoint keeps static data based on the file descriptor (and no, it doesn't use the integer value as an index into an array, which would be quick—oh no, it does a linear search, multiple times for said private data—I really need a triple facepalm picture for this) and there's no way to establish this static data given an existing file descriptor.
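For contrast, here is the quick lookup the parenthetical has in mind: a sketch (hypothetical names, with an arbitrary FDTAB_MAX bound chosen for illustration) that uses the descriptor itself as the array index, so finding a descriptor's private data is one bounds-checked load instead of a linear search. It wouldn't rescue the idea, of course; the real obstacle is that the library offers no way to rebuild that data for an existing descriptor, not how fast it finds it.

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical upper bound on tracked descriptors; a real library might
   size this from getrlimit(RLIMIT_NOFILE) instead. */
#define FDTAB_MAX 1024

/* per-descriptor private data, NULL when nothing is registered */
static void *fdtab[FDTAB_MAX];

/* O(1) store: the descriptor is the index */
int fdtab_set(int fd,void *data)
{
  if ((fd < 0) || (fd >= FDTAB_MAX))
    return -1;
  fdtab[fd] = data;
  return 0;
}

/* O(1) fetch: no searching, no matter how many descriptors are open */
void *fdtab_get(int fd)
{
  if ((fd < 0) || (fd >= FDTAB_MAX))
    return NULL;
  return fdtab[fd];
}
```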

Sigh.

Friday, March 22, 2013

Preloading Lua modules

I'm tasked with testing the call processing on “Project: Wolowizard.” M suggested, and I concurred, that using Lua to manage the testing scripts would be a Good Thing™: easier to write and modify the tests as needed. So over the past few years I've written a number of modules to handle the files and protocols used in the project (one side effect: re-implementing the code to read/write the various data files helped to verify the specification and flush out architectural dependencies in the binary formats).

But one problem did exist: Not all the systems I need to run the test on have Lua installed, and LuaRocks has … um … “issues” on our Solaris boxes (otherwise, it's not that bad a package manager). So I decided to build what I call “Kitchen Sink Lua”—a Lua interpreter that has the 47 modules required to run the testing scripts (okay, eight of the modules are already built into Lua).

It took some time to wrangle, as some of the modules were written in Lua (so the source needed to be embedded) and I had to figure out how to integrate some third-party modules (like LuaCURL) into the build system, but perhaps the hardest bit was ensuring the modules were initialized properly. My first attempt, while it worked (mostly by accident), wasn't technically correct (as I realized when I read this message on a mailing list).

I then restructured my code, which not only made it correct, but smaller and clearer.

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

/**************************************************************************/

typedef struct prelua_reg
{
  const char   *const name;
  const char   *const code;
  const size_t *const size;
} prelua_reg__t;

/*************************************************************************/

int	luaopen_org_conman_env		(lua_State *);
int	luaopen_org_conman_errno	(lua_State *);
int	luaopen_org_conman_fsys		(lua_State *);
int	luaopen_org_conman_math		(lua_State *);
int	luaopen_org_conman_syslog	(lua_State *);
int	luaopen_org_conman_hash		(lua_State *);
int	luaopen_org_conman_string_trim	(lua_State *);
int	luaopen_org_conman_string_wrap	(lua_State *);
int	luaopen_org_conman_string_remchar (lua_State *);
int	luaopen_org_conman_process	(lua_State *);
int	luaopen_org_conman_net		(lua_State *);
int	luaopen_org_conman_dns		(lua_State *);
int	luaopen_org_conman_sys		(lua_State *);
int	luaopen_org_conman_uuid		(lua_State *);
int	luaopen_lpeg			(lua_State *);
int	luaopen_LuaXML_lib		(lua_State *);
int	luaopen_cURL			(lua_State *);

/***********************************************************************/

	/*---------------------------------------------------------------
	; Modules written in Lua.  The build system takes the Lua code,
	; processes it through luac (the Lua compiler), then creates an
	; object file which exports a character array containing the byte
	; code, and a variable which gives the size of the bytecode array.
	;---------------------------------------------------------------*/

extern const char   c_org_conman_debug[];
extern const size_t c_org_conman_debug_size;
extern const char   c_org_conman_getopt[];
extern const size_t c_org_conman_getopt_size;
extern const char   c_org_conman_string[];
extern const size_t c_org_conman_string_size;
extern const char   c_org_conman_table[];
extern const size_t c_org_conman_table_size;
extern const char   c_org_conman_unix[];
extern const size_t c_org_conman_unix_size;
extern const char   c_re[];
extern const size_t c_re_size;
extern const char   c_LuaXml[];
extern const size_t c_LuaXml_size;

	/*----------------------------------------------------------------
	; Modules written in C.  We can use luaL_register() to load these
	; into package.preload[]
	;----------------------------------------------------------------*/

const luaL_Reg c_preload[] =
{
  { "org.conman.env"		, luaopen_org_conman_env		} ,
  { "org.conman.errno"		, luaopen_org_conman_errno		} ,
  { "org.conman.fsys"		, luaopen_org_conman_fsys		} ,
  { "org.conman.math"		, luaopen_org_conman_math		} ,
  { "org.conman.syslog"		, luaopen_org_conman_syslog		} ,
  { "org.conman.hash"		, luaopen_org_conman_hash		} ,
  { "org.conman.string.trim"	, luaopen_org_conman_string_trim	} ,
  { "org.conman.string.wrap"	, luaopen_org_conman_string_wrap	} ,
  { "org.conman.string.remchar"	, luaopen_org_conman_string_remchar	} ,
  { "org.conman.process"	, luaopen_org_conman_process		} ,
  { "org.conman.net"		, luaopen_org_conman_net		} ,
  { "org.conman.dns"		, luaopen_org_conman_dns		} ,
  { "org.conman.sys"		, luaopen_org_conman_sys		} ,
  { "org.conman.uuid"		, luaopen_org_conman_uuid		} ,
  { "lpeg"			, luaopen_lpeg				} ,
  { "LuaXML_lib"		, luaopen_LuaXML_lib			} ,
  { "cURL"			, luaopen_cURL				} ,
  { NULL			, NULL					}
};

	/*---------------------------------------------------------------
	; Modules written in Lua.  These need to be loaded and populated
	; into package.preload[] by some code provided in this file.
	;----------------------------------------------------------------*/

const prelua_reg__t c_luapreload[] =
{
  { "org.conman.debug"		, c_org_conman_debug	, &c_org_conman_debug_size	} ,
  { "org.conman.getopt"		, c_org_conman_getopt	, &c_org_conman_getopt_size	} ,
  { "org.conman.string"		, c_org_conman_string	, &c_org_conman_string_size	} ,
  { "org.conman.table"		, c_org_conman_table	, &c_org_conman_table_size	} ,
  { "org.conman.unix"		, c_org_conman_unix	, &c_org_conman_unix_size	} ,
  { "re"			, c_re			, &c_re_size			} ,
  { "LuaXml"			, c_LuaXml		, &c_LuaXml_size		} ,
  { NULL			, NULL			, NULL				}
};

/*************************************************************************/

void preload_lua(lua_State *const L)
{
  assert(L != NULL);
  
  lua_gc(L,LUA_GCSTOP,0);
  luaL_openlibs(L);
  lua_gc(L,LUA_GCRESTART,0);
  
  /*---------------------------------------------------------------
  ; preload all the modules.  This does not initialize them,
  ; just makes them available for require().  
  ;
  ; I'm doing it this way because of a recent email on the LuaJIT
  ; email list:
  ;
  ; http://www.freelists.org/post/luajit/Trivial-bug-in-bitops-bitc-luaopen-bit,4
  ;
  ; Pre-loading these modules in package.preload[] means that they'll be
  ; initialized properly through the require() statement.
  ;---------------------------------------------------------------------*/
  
  lua_getglobal(L,"package");
  lua_getfield(L,-1,"preload");
  
  luaL_register(L,NULL,c_preload);
  for (size_t i = 0 ; c_luapreload[i].name != NULL ; i++)
  {
    int rc = luaL_loadbuffer(L,c_luapreload[i].code,*c_luapreload[i].size,c_luapreload[i].name);
    if (rc != 0)
    {
      const char *err;
      
      switch(rc)
      {
        case LUA_ERRRUN:    err = "runtime error"; break;
        case LUA_ERRSYNTAX: err = "syntax error";  break;
        case LUA_ERRMEM:    err = "memory error";  break;
        case LUA_ERRERR:    err = "generic error"; break;
        case LUA_ERRFILE:   err = "file error";    break;
        default:            err = "unknown error"; break;
      }
      
      fprintf(stderr,"%s: %s\n",c_luapreload[i].name,err);
      exit(EXIT_FAILURE);
    }
    lua_setfield(L,-2,c_luapreload[i].name);
  }
  
  lua_pop(L,2);	/* drop package.preload and package from the stack */
}

/*************************************************************************/

Yes, this is the code used in “Project: Wolowizard” (minus the proprietary modules) and is a good example of the module preload feature in Lua. The modules in C are easy to build (the following is from the Makefile):

obj/spc/process.o : $(LUASPC)/src/process.c     \
                $(LUA)/lua.h                    \
                $(LUA)/lauxlib.h
        $(CC) $(CFLAGS) -I$(LUA) -c -o $@ $<

While the Lua-based modules are a bit more involved:

obj/spc/unix.o : $(LUASPC)/lua/unix.lua $(BIN2C) $(LUAC)
        $(LUAC) -o tmp/unix.out $<
        $(BIN2C) -o tmp/unix.c -t org_conman_unix tmp/unix.out
        $(CC) $(CFLAGS) -c -o $@ tmp/unix.c

These modules are compiled using luac (which outputs the Lua byte code used by the core Lua VM), then through a program that converts this output into a C file, which is then compiled into an object file that can be linked into the final Kitchen Sink Lua interpreter.
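The BIN2C program itself isn't shown. Here is a minimal sketch of what its core might look like, assuming (from the extern declarations in the preload code and the -t org_conman_unix option in the rule above) that it emits a c_<tag>[] byte array plus a c_<tag>_size variable; the function name and the exact output layout are illustrative, not the actual tool.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Hypothetical core of a BIN2C-style tool.  Writes LEN bytes of DATA as C
   source defining c_<tag>[] and c_<tag>_size, the names the extern
   declarations in the preload file expect.  Returns 0 on success. */
int bin2c(FILE *out,const char *tag,const unsigned char *data,size_t len)
{
  fprintf(out,"#include <stddef.h>\n\nconst char c_%s[] =\n{",tag);

  if (len == 0)
    fputs("\n  0",out);	/* C forbids empty initializers; size still says 0 */

  for (size_t i = 0 ; i < len ; i++)
  {
    if (i % 8 == 0)
      fputs("\n  ",out);	/* eight bytes per output line */

    /* character constants like '\xFF' avoid overflow warnings when
       plain char happens to be signed */
    fprintf(out,"'\\x%02X'%s",(unsigned)data[i],(i + 1 < len) ? "," : "");
  }

  fprintf(out,"\n};\n\nconst size_t c_%s_size = %zu;\n",tag,len);
  return ferror(out) ? -1 : 0;
}
```

Fed the luac output with a tag of org_conman_unix, this would produce a file that satisfies the extern declarations for c_org_conman_unix[] and c_org_conman_unix_size.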


Musings on the Current Work Project Du jour

So I have this Lua code that implements the cellphone end of a protocol used in “Project: Wolowizard.” I need to ramp up the load testing on this portion of the project so I'm looking at what I have and trying to figure out how to approach this project.

The protocol itself is rather simple—only a few messages are defined and the code is rather straightforward. It looks something like:

-- Pre-define these
state_receive = function(phone,socket) end
state_msg1    = function(phone,socket,remote,msg) end
state_msg2    = function(phone,socket,remote,msg) end

-- Now the code

state_receive = function(phone,socket)
  local remote,packet,err = socket:read()
  if err ~= 0 then
    syslog('err',string.format("error reading socket: %s",errno[err]))
    return state_receive(phone,socket)
  end

  local msg,err = sooperseekritprotocol.decode(packet)
  if err ~= 0 then
    syslog('err',string.format("error decoding: %s",decoderror(err)))
    return state_receive(phone,socket)
  end

  if msg.type == "MSG1" then
    return state_msg1(phone,socket,remote,msg)
  elseif msg.type == "MSG2" then
    return state_msg2(phone,socket,remote,msg)
  else
    syslog('warn',string.format("unknown message: %s",msg.type))
    return state_receive(phone,socket)
  end
end

state_msg1 = function(phone,socket,remote,msg)
  local reply = ... -- code to handle this msg
  local packet = sooperseekritprotocol.encode(reply)
  socket:write(remote,packet)
  return state_receive(phone,socket)
end

state_msg2 = function(phone,socket,remote,msg)
  local reply = ... -- code to handle this msg
  local packet = sooperseekritprotocol.encode(reply)
  socket:write(remote,packet)
  return state_receive(phone,socket)
end

Don't worry about this code blowing out the call stack—Lua optimizes tail calls and these effectively become GOTOs. I found this feature to be very useful in writing protocol handlers since (in my opinion) it makes the state machine rather explicit.

Now, to speed this up, I could translate this to C. As I wrote the Lua modules for The Kitchen Sink Lua interpreter, I pretty much followed a bi-level approach. I have a C interface (to be used by C code) which is then mimicked in Lua. This makes translating the Lua code into C more or less straightforward (with a bit more typing because of variable declarations and whatnot).

But here, I can't rely on the C compiler to optimize tail calls (GCC can, but only with certain options; I don't know about the Solaris C compiler). I could have the routines return the next function to call and use a loop:

while((statef = (*statef)(phone,sock,&remote,&msg)) != NULL)
  ;	/* the whole state machine is run in the previous line */

But just try to define the type of statef so the compiler doesn't complain about a type mismatch. It needs to define a function that takes blah and returns a function that takes blah and returns a function that takes blah and returns a function that … It's one of those recursive type definitions that produce headaches when you think too much about it.
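One common way out of that recursive type is to wrap the function pointer in a struct: a state function returns a struct whose only member points to the next state function, and C accepts that because the struct type can name itself. A minimal, self-contained sketch with hypothetical names, where a scripted string of one-character messages stands in for the phone, socket and decoded packets:

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical stand-in for phone/socket/remote/msg: a scripted string of
   one-character "messages" ('1' = MSG1, '2' = MSG2, anything else ignored). */
typedef struct machine
{
  const char *input;
  size_t      pos;
  int         msg1;	/* how many MSG1s were handled */
  int         msg2;	/* how many MSG2s were handled */
} machine__t;

/* The struct breaks the infinite regress: a state is a function that
   returns a struct which holds the next state function. */
typedef struct state
{
  struct state (*fn)(machine__t *);
} state__t;

static state__t state_receive(machine__t *);
static state__t state_msg1   (machine__t *);
static state__t state_msg2   (machine__t *);

static state__t state_receive(machine__t *m)
{
  switch(m->input[m->pos++])
  {
    case '1':  return (state__t){ state_msg1    };
    case '2':  return (state__t){ state_msg2    };
    case '\0': return (state__t){ NULL          };	/* end of script */
    default:   return (state__t){ state_receive };	/* unknown, skip it */
  }
}

static state__t state_msg1(machine__t *m)
{
  m->msg1++;	/* real code would build and send a reply here */
  return (state__t){ state_receive };
}

static state__t state_msg2(machine__t *m)
{
  m->msg2++;
  return (state__t){ state_receive };
}

/* The driving loop: keep calling whatever state the last one returned. */
void run_machine(machine__t *m)
{
  state__t s = { state_receive };
  while (s.fn != NULL)
    s = s.fn(m);
}
```

Every state hands control back to the loop in run_machine(), so the stack stays flat whether or not the compiler optimizes tail calls.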

Okay, so instead, let's just have a function that returns a simple integer value that represents the next state. That's easier to define and the main driving loop isn't that bad:

while(state != DONE)
{
  switch(state)
  {
    case RECEIVE: state = state_receive(phone,socket,&remote,&msg); break;
    case MSG1:    state = state_msg1(phone,socket,&remote,&msg); break;
    case MSG2:    state = state_msg2(phone,socket,&remote,&msg); break;
    default:      assert(0); break;
  }
}
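Filling in the pieces that loop leaves implicit (the state enumeration and the handlers) gives something like the following sketch; the names and the scripted input are hypothetical stand-ins for the real protocol code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; DONE ends the driving loop. */
typedef enum { RECEIVE, MSG1, MSG2, DONE } state__e;

/* scripted "messages": '1' = MSG1, '2' = MSG2, anything else ignored */
typedef struct
{
  const char *input;
  size_t      pos;
  int         msg1;
  int         msg2;
} script__t;

static state__e on_receive(script__t *s)
{
  switch(s->input[s->pos++])
  {
    case '1':  return MSG1;
    case '2':  return MSG2;
    case '\0': return DONE;	/* end of script */
    default:   return RECEIVE;	/* unknown message, keep reading */
  }
}

static state__e on_msg1(script__t *s) { s->msg1++; return RECEIVE; }
static state__e on_msg2(script__t *s) { s->msg2++; return RECEIVE; }

/* each handler returns the next state; the switch dispatches on it */
void run_script(script__t *s)
{
  state__e state = RECEIVE;

  while(state != DONE)
  {
    switch(state)
    {
      case RECEIVE: state = on_receive(s); break;
      case MSG1:    state = on_msg1(s);    break;
      case MSG2:    state = on_msg2(s);    break;
      default:      assert(0);             break;
    }
  }
}
```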

Okay, with that out of the way, we can start writing the C code.

Clackity-clackity-clack clackity-clack clack clack clackity-clackity-clackity-clack clack clack clack clack …

Man, that's boring drudgework. Okay, let's just use the Lua code and maybe throw some additional threads at this. I don't think that's a bad approach. Now, Lua, out of the box, isn't exactly thread-safe. Sure, you can provide an implementation of lua_lock() and lua_unlock() but that might slow Lua down quite a bit (there are 62 locations where the lock could be taken in the Lua engine). We could give each thread its own Lua state—how bad could that be?

How big is a Lua state? Let's find out, shall we?

#include <stdio.h>
#include <stdlib.h>
#include <lua.h>
#include <lauxlib.h>

int main(void)
{
  lua_State *L;

  L = luaL_newstate();
  if (L == NULL)
  {
    perror("luaL_newstate()");
    return EXIT_FAILURE;
  }
   
  printf("%d\n",lua_gc(L,LUA_GCCOUNT,0) * 1024);
  lua_close(L);
  return EXIT_SUCCESS;
} 

When compiled and run, this returns 2048, the amount of memory used in an empty Lua state. That's not bad at all, but that's an empty state. What about a more useful state, like the one you get when you run the stock Lua interpreter?

-- ensure any accumulated garbage is reclaimed
collectgarbage('collect')
collectgarbage('collect')
collectgarbage('collect')
print(collectgarbage('count') * 1024)

Okay, when I run this, I get 17608. Eh … it's not that bad per thread (and I do have to remind myself—this is not running on my Color Computer with 16,384 bytes of memory). But I'm not running the stock Lua interpreter, I'm running the Kitchen Sink Lua with all the trimmings—how big is that state?

I run the above Lua code and I get 4683963.

Four and a half megs!

Ouch.

I suppose if it becomes an issue, I could always go back to writing C …

Saturday, March 23, 2013

Preloading Lua modules, part II

Well, four and a half megs per Lua state in the Kitchen Sink Lua interpreter. I thought about it, and I had Yet Another Idea™.

Lua not only has a table for preloaded modules, but also an array of functions (package.loaders[] in Lua 5.1) used to locate and load modules. So the idea I had was to insert two custom loader functions—one to search for C-based Lua modules, and one for Lua-based Lua modules. The code is pretty much straightforward:

#include <stdlib.h>
#include <string.h>
#include <assert.h>

#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

/**************************************************************************/

typedef struct prelua_reg
{
  const char   *const name;
  const char   *const code;
  const size_t *const size;
} prelua_reg__t;

/*************************************************************************/

int	luaopen_org_conman_env		(lua_State *);
int	luaopen_org_conman_errno	(lua_State *);
int	luaopen_org_conman_fsys		(lua_State *);
int	luaopen_org_conman_math		(lua_State *);
int	luaopen_org_conman_syslog	(lua_State *);
int	luaopen_org_conman_hash		(lua_State *);
int	luaopen_org_conman_string_trim	(lua_State *);
int	luaopen_org_conman_string_wrap	(lua_State *);
int	luaopen_org_conman_string_remchar (lua_State *);
int	luaopen_org_conman_process	(lua_State *);
int	luaopen_org_conman_net		(lua_State *);
int	luaopen_org_conman_dns		(lua_State *);
int	luaopen_org_conman_sys		(lua_State *);
int	luaopen_org_conman_uuid		(lua_State *);
int	luaopen_lpeg			(lua_State *);
int	luaopen_LuaXML_lib		(lua_State *);
int	luaopen_cURL			(lua_State *);

/***********************************************************************/

extern const char   c_org_conman_debug[];
extern const size_t c_org_conman_debug_size;
extern const char   c_org_conman_getopt[];
extern const size_t c_org_conman_getopt_size;
extern const char   c_org_conman_string[];
extern const size_t c_org_conman_string_size;
extern const char   c_org_conman_table[];
extern const size_t c_org_conman_table_size;
extern const char   c_org_conman_unix[];
extern const size_t c_org_conman_unix_size;
extern const char   c_re[];
extern const size_t c_re_size;
extern const char   c_LuaXml[];
extern const size_t c_LuaXml_size;

	/* bsearch() is run over this array, so keep it sorted by name */

const luaL_Reg c_preload[] =
{
  { "LuaXML_lib"		, luaopen_LuaXML_lib			} ,
  { "cURL"			, luaopen_cURL				} ,
  { "lpeg"			, luaopen_lpeg				} ,
  { "org.conman.dns"		, luaopen_org_conman_dns		} ,
  { "org.conman.env"		, luaopen_org_conman_env		} ,
  { "org.conman.errno"		, luaopen_org_conman_errno		} ,
  { "org.conman.fsys"		, luaopen_org_conman_fsys		} ,
  { "org.conman.hash"		, luaopen_org_conman_hash		} ,
  { "org.conman.math"		, luaopen_org_conman_math		} ,
  { "org.conman.net"		, luaopen_org_conman_net		} ,
  { "org.conman.process"	, luaopen_org_conman_process		} ,
  { "org.conman.string.remchar"	, luaopen_org_conman_string_remchar	} ,
  { "org.conman.string.trim"	, luaopen_org_conman_string_trim	} ,
  { "org.conman.string.wrap"	, luaopen_org_conman_string_wrap	} ,
  { "org.conman.sys"		, luaopen_org_conman_sys		} ,
  { "org.conman.syslog"		, luaopen_org_conman_syslog		} ,
  { "org.conman.uuid"		, luaopen_org_conman_uuid		} ,
};

#define MAX_CMOD (sizeof(c_preload) / sizeof(luaL_Reg))

	/* bsearch() is run over this array, so keep it sorted by name */

const prelua_reg__t c_luapreload[] =
{
  { "LuaXml"			, c_LuaXml		, &c_LuaXml_size		} ,
  { "org.conman.debug"		, c_org_conman_debug	, &c_org_conman_debug_size	} ,
  { "org.conman.getopt"		, c_org_conman_getopt	, &c_org_conman_getopt_size	} ,
  { "org.conman.string"		, c_org_conman_string	, &c_org_conman_string_size	} ,
  { "org.conman.table"		, c_org_conman_table	, &c_org_conman_table_size	} ,
  { "org.conman.unix"		, c_org_conman_unix	, &c_org_conman_unix_size	} ,
  { "re"			, c_re			, &c_re_size			} ,
};

#define MAX_LMOD (sizeof(c_luapreload) / sizeof(prelua_reg__t))

/*************************************************************************/

static int luaLReg_cmp(const void *needle,const void *haystack)
{
  const char     *key   = needle;
  const luaL_Reg *value = haystack;
  
  return (strcmp(key,value->name));
}

/*************************************************************************/

static int preloadlua_cloader(lua_State *const L)
{
  const char     *key;
  const luaL_Reg *target;
  
  key    = luaL_checkstring(L,1);
  target = bsearch(key,c_preload,MAX_CMOD,sizeof(luaL_Reg),luaLReg_cmp);
  if (target == NULL)
    lua_pushnil(L);
  else
    lua_pushcfunction(L,target->func);
  return 1;
}

/************************************************************************/

static int preluareg_cmp(const void *needle,const void *haystack)
{
  const char          *key   = needle;
  const prelua_reg__t *value = haystack;
  
  return (strcmp(key,value->name));
}

/*************************************************************************/

static int preloadlua_lualoader(lua_State *const L)
{
  const char          *key;
  const prelua_reg__t *target;
  
  key = luaL_checkstring(L,1);
  target = bsearch(key,c_luapreload,MAX_LMOD,sizeof(prelua_reg__t),preluareg_cmp);
  if (target == NULL)
    lua_pushnil(L);
  else
  {
    int rc = luaL_loadbuffer(L,target->code,*target->size,target->name);
    if (rc != 0)
      lua_pushnil(L);
  }
  return 1;
}

/***********************************************************************/

void preload_lua(lua_State *const L)
{
  assert(L != NULL);
  
  lua_gc(L,LUA_GCSTOP,0);
  luaL_openlibs(L);
  lua_gc(L,LUA_GCRESTART,0);
  
  /*---------------------------------------------------------------
  ; modify the package.loaders[] array to include two new searchers:
  ;
  ; 1) scan for a C based module, return luaopen_*()
  ; 2) scan for a Lua based module, return the result of luaL_loadbuffer()
  ;---------------------------------------------------------------------*/
  
  lua_getglobal(L,"package");
  lua_getfield(L,-1,"loaders");
  
  int len = lua_objlen(L,-1);
  
  /*-----------------------------------------------------------------------
  ; insert the two new functions at the start of the package.loaders[]
  ; array---this way, we get first crack at loading the modules and don't
  ; waste time with expensive disk lookups.
  ;----------------------------------------------------------------------*/

  for (int i = len + 2 ; i > 2 ; i--)	/* move entries 1..len up two slots */
  {
    lua_rawgeti(L,-1,i - 2);
    lua_rawseti(L,-2,i);
  }
  
  lua_pushinteger(L,1);
  lua_pushcfunction(L,preloadlua_cloader);
  lua_settable(L,-3);
  
  lua_pushinteger(L,2);
  lua_pushcfunction(L,preloadlua_lualoader);
  lua_settable(L,-3);
  
  lua_pop(L,2);	/* drop package.loaders and package from the stack */
}

And a quick test of the new Kitchen Sink Lua interpreter on this:

-- ensure any accumulated garbage is reclaimed
collectgarbage('collect')
collectgarbage('collect')
collectgarbage('collect')
print(collectgarbage('count') * 1024)

reveals a nice usage of 17,618 bytes—on par with the stock Lua interpreter. What's happening here is that the modules are no longer being shoved into the Lua state regardless of use (and there's one module that accounts for about 3½ megabytes—it's rarely used, but I do need it in some circumstances); they're now loaded into the Lua state as needed.

This also led me to the concept of compressing the Lua-written modules with zlib to save space in the executable (and it does help). I'll leave that code to the reader as an exercise.

Interestingly enough, the only hard part of this was trying to figure out how to insert two elements at the start of an array using the C API of Lua—there is no equivalent function to the Lua table.insert() function. I resorted to checking the source code to table.insert() to see how it was done.
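The shuffle is easier to see on a plain C array. This sketch (hypothetical function name) makes room for two entries at the front by copying from the end backwards, the same pattern table.insert() uses, so nothing is overwritten before it has been copied. The loop bound matters: stopping one iteration early (i > 3 instead of i > 2) leaves the original first entry uncopied and duplicates the third.

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical helper: A holds LEN entries and has room for LEN + 2.
   Shifts every entry up two slots, then stores FIRST and SECOND at the
   front.  Copying from the end backwards means each entry is read
   before anything overwrites it. */
void insert_front2(int *a,size_t len,int first,int second)
{
  /* i counts slots like Lua's 1-based indices: slot i is a[i - 1] */
  for (size_t i = len + 2 ; i > 2 ; i--)
    a[i - 1] = a[i - 3];

  a[0] = first;
  a[1] = second;
}
```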

The only other problem I had was debugging the zlib-based version of this code—a typo (two missing characters—sigh) led me on a multi-hour bug chase.

But it works now, and I've decreased memory usage quite a bit with a few simple changes to the code, which is always nice.

Update on Wednesday, March 22nd, 2023

Part III

Monday, March 25, 2013

On tail call optimization in certain C compilers

From: Mark Grosberg <XXXXXXXXXXXXXXXXX>
To: Sean Conner <sean@conman.org>
Subject: Tail calls.
Date: Sun, 24 Mar 2013 12:35:18 PM -0500

But here, I can't rely on the C compiler to optimize tail calls (GCC can, but only with certain options; I don't know about the Solaris C compiler). I could have the routines return the next function to call and use a loop:

I haven't verified it but it probably does. It's a pretty easy optimization (compared to what compilers do today) so I'd be surprised if the Sun C compiler doesn't handle this. At least for C code (C++ exceptions can throw some wrinkles in this in some cases).

-MYG

I decided to check. And yes, the Solaris C compiler does support tail call optimizations. So I figured I would play around with this a bit, both under gcc and the Solaris C compiler.

Final results: gcc and the Solaris C compiler both support tail call optimizations (some restrictions apply; void where prohibited; your mileage may vary; results presented are not typical and yours might vary; use only as directed; this program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE; do not taunt Happy Fun Ball).

First off, the number of parameters must match among the functions; the types don't appear to matter that much, just the number. Second, the number (or size) of locally defined variables also matters. I'm not sure what the upper size (or number) for variables is (and it may differ between the two compilers) but it does appear to be a factor. Third, the only safe way to determine if tail call optimizations are being performed is to check the assembly code and check for calls (or, just run it and see if it crashes after a period of time).

So I can, kind of, assume tail call optimization in C code. It'll be something to keep in mind.
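A small pair of functions to experiment with, written to the constraints above: both take the same single parameter, keep no locals, and end in a call whose result is returned directly. With gcc this is governed by -foptimize-sibling-calls, which -O2 turns on; compiling with gcc -O2 -S and looking for a jmp where you'd otherwise see a call is one way to check. The Solaris compiler's flags aren't covered by this sketch, and the function names are illustrative.

```c
#include <assert.h>

/* Mutually tail-recursive pair with matching parameter lists.  When the
   compiler turns the tail calls into jumps, even very large N won't grow
   the stack; without that optimization, large N can overflow it. */
int is_odd(unsigned long n);

int is_even(unsigned long n)
{
  if (n == 0)
    return 1;
  return is_odd(n - 1);	/* tail call: nothing left to do afterwards */
}

int is_odd(unsigned long n)
{
  if (n == 0)
    return 0;
  return is_even(n - 1);	/* tail call */
}
```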

Tuesday, March 26, 2013

I wonder what IPO I'll be invited to this time?

I need to check what I used as a certain field in my Lua unix module so I thought I would do this through the Lua interpreter:

[spc]saltmine:~>lua
Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> unix = require "org.conman.unix"
lua: src/env.c:54: luaopen_org_conman_env: Assertion `value != ((void *)0)' failed.
Aborted (core dumped)
[spc]saltmine:~>

What the … um … what's going on with that code?

int luaopen_org_conman_env(lua_State *L)
{
  luaL_register(L,"org.conman.env",env);

  for (int i = 0 ; environ[i] != NULL ; i++)
  {
    char   *value;
    char   *eos;

    value = memchr(environ[i],'=',(size_t)-1);
    assert(value != NULL);
    eos   = memchr(value + 1,'\0',(size_t)-1);
    assert(eos   != NULL);

    lua_pushlstring(L,environ[i],(size_t)(value - environ[i]));
    lua_pushlstring(L,value + 1,(size_t)(eos - (value + 1)));
    lua_settable(L,-3);
  }

  return 1;
}

No! It can't be! Really?

value = memchr(environ[i],'=',10000);

[spc]saltmine:~>lua
Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> unix = require "org.conman.unix"
> 

Yup. It can be! Really!

XXXX! I encountered this very same bug fifteen years ago! The GNU C library, 64 bit version.

Back then, the maintainers of the GNU C library were making an assumption that any length above some already ridiculously large value was obviously bad, and were returning NULL without even bothering to run memchr(). But I was using a valid value.

You see, I have a block of data I know an equal sign exists in. If it doesn't exist, I have bigger things to worry about (like I'm not in Kansas, er, a POSIX environment anymore). But I don't know how much data to look through. And instead of just assuming a “large enough value” (which may be good enough for today, but then again, 640K was enough back in the day) I decided to use a value that, converted to a size_t, basically translates to “all of memory”.

And on a 32-bit system, it worked fine. But on the GNU C library, 64-bit version, it failed, probably because the maintainers felt that 18,446,744,073,709,551,615 bytes is just a tad silly to search through.

And the only reason I remember this particular bug is that it apparently was enough to get me invited to the Red Hat IPO (it was either that, or my work on porting pfe to IRIX back in the mid-90s).

I did a bit more research (basically—I tried two 64-bit Linux distributions) and I found a really odd thing—glibc version 2.3 does not exhibit the behavior (meaning, my code works on a version released in 2007) but crashes under 2.12 (the code changed sometime between 2007 and 2010).

Sigh. Time to investigate if this is still a problem in 2.17 and if so, report it as a bug …
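As an aside, the length guess can be sidestepped entirely: each environ[] entry is a NUL-terminated string, so strchr() finds the '=' and strlen() finds the end without any length argument at all. A minimal sketch with a hypothetical helper name:

```c
#include <string.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical helper: split one "NAME=value" environ entry.  No length
   guess is needed because the entry is NUL-terminated.  Returns 0 on
   success, -1 if the entry has no '=' at all. */
int split_env(const char *entry,size_t *namelen,const char **value,size_t *valuelen)
{
  const char *eq = strchr(entry,'=');	/* first '=' ends the name */

  if (eq == NULL)
    return -1;

  *namelen  = (size_t)(eq - entry);
  *value    = eq + 1;
  *valuelen = strlen(eq + 1);	/* value runs to the terminating NUL */
  return 0;
}
```

Inside luaopen_org_conman_env() the results would feed the two lua_pushlstring() calls directly, with namelen and valuelen as the lengths.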

Thursday, March 28, 2013

The end result was a computer producing vast amounts of nothing very slowly

So, I run this loadtest program on my work computer. It's going, I can see the components I'm testing registering events (via the realtime viewer I wrote for syslogintr). Everything is going fine … and … … then … … … t … h … e … … c … o … m … p … … u … … … t … … … … e … … … … … r … … … … … … … … s … … … … … … … … … … l … … … … … … … … … … … … … … … o … … … … … … … … … … … … w … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … s …

It takes about ten minutes to type and run, but this:

[spc]saltmine:~>uptime
 14:44:20 up 6 days, 23:12, 10 users,  load average: 2320.45, 1277.98, 546.61

was quite amusing to see (usually the load average is 0). Perhaps it was just a tad ambitious to simulate 10,000 units on the work computer (each unit its own thread, running a Lua script—yes, even after the modifications to the Lua interpreter).

Also amusing was this:

[spc]saltmine:~>free
             total       used       free     shared    buffers     cached
Mem:       3910888     640868    3270020          0      35568     185848
-/+ buffers/cache:     419452    3491436
Swap:     11457532     544260   10913272

Yes, over half a gigabyte of memory (544,260K, out of eleven gigabytes of swap) had been shoved out to the disk, so most of the slowness was due to thrashing.

Perhaps I should find some fellow cow-orker's computer to run this on …

Friday, March 29, 2013

A meta configure file

I'm tired of changing the configuration files as I test under different systems. It also seemed silly that I needed to replicate the configuration files for each system (or set of systems). What I wanted was a configuration for the configuration, and that notion set off an alarm bell in my head.

The poster child for “a configuration file for the configuration file” is sendmail, the only program I know of that has a thousand-page tome dedicated to describing its configuration file. That's little wonder when the syntax makes Perl look sane:

# try UUCP traffic as a local address
R$* < @ $+ . UUCP > $*          $: $1 < @ $[ $2 $] . UUCP . > $3
R$* < @ $+ . . UUCP . > $*      $@ $1 < @ $2 . > $3

# hostnames ending in class P are always canonical
R$* < @ $* $=P > $*             $: $1 < @ $2 $3 . > $4
R$* < @ $* $~P > $*             $: $&{daemon_flags} $| $1 < @ $2 $3 > $4
R$* CC $* $| $* < @ $+.$+ > $*  $: $3 < @ $4.$5 . > $6
R$* CC $* $| $*                 $: $3
# pass to name server to make hostname canonical
R$* $| $* < @ $* > $*           $: $2 < @ $[ $3 $] > $4
R$* $| $*                       $: $2

# local host aliases and pseudo-domains are always canonical
R$* < @ $=w > $*                $: $1 < @ $2 . > $3
R$* < @ $=M > $*                $: $1 < @ $2 . > $3
R$* < @ $={VirtHost} > $*       $: $1 < @ $2 . > $3
R$* < @ $* . . > $*             $1 < @ $2 . > $3

It's so bad that there does indeed exist a configuration file for sendmail.cf that's not ugly in a “line noise” way, but ugly in a “needlessly verbose” way:

include(`/usr/share/sendmail-cf/m4/cf.m4')dnl
VERSIONID(`setup for Red Hat Linux')dnl
OSTYPE(`linux')dnl
dnl #
dnl # default logging level is 9, you might want to set it higher to
dnl # debug the configuration
dnl #
dnl define(`confLOG_LEVEL', `9')dnl
dnl #
dnl # Uncomment and edit the following line if your outgoing mail needs to
dnl # be sent out through an external mail server:
dnl #
dnl define(`SMART_HOST',`smtp.your.provider')
dnl #
define(`confDEF_USER_ID',``8:12'')dnl
dnl define(`confAUTO_REBUILD')dnl
define(`confTO_CONNECT', `1m')dnl
define(`confTRY_NULL_MX_LIST',true)dnl
define(`confDONT_PROBE_INTERFACES',true)dnl
define(`PROCMAIL_MAILER_PATH',`/usr/bin/procmail')dnl
define(`ALIAS_FILE', `/etc/aliases')dnl
define(`STATUS_FILE', `/var/log/mail/statistics')dnl
define(`UUCP_MAILER_MAX', `2000000')dnl
define(`confUSERDB_SPEC', `/etc/mail/userdb.db')dnl
define(`confPRIVACY_FLAGS', `authwarnings,novrfy,noexpn,restrictqrun')dnl
define(`confAUTH_OPTIONS', `A')dnl

My thought was (and still is) that if a configuration file needs a configuration file, you're doing it wrong. So yes, I was experiencing some cognitive dissonance over writing a configuration file for a configuration file.

But on second thought, I'm not configuring a single configuration file; I'm configuring multiple configuration files. And no, it's not one configuration file (with changes for different systems) but several configuration files, all of which need to be changed for a different system. And they don't just need to be changed, they need to be changed consistently with each other: component P has to be configured with the IP address of component W, and W with the IP address of component P. Seen that way, I feel better about having a configuration file for the configuration files.
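Here is a minimal sketch of that consistency argument. It assumes a single table that records each component's address for each target system, and derives every component's peer settings from that one table. The system names, the settings keys (listen, peer), the PEERS mapping and the addresses (taken from the documentation ranges) are all hypothetical. The real files are XML and hold far more than this.

```python
# Hypothetical meta-configuration: one address per component, per system.
META = {
    "lab": {"P": "192.0.2.10", "W": "192.0.2.11"},
    "staging": {"P": "198.51.100.20", "W": "198.51.100.21"},
}

# Which peer each component needs to know about (hypothetical).
PEERS = {"P": "W", "W": "P"}


def component_settings(system):
    """Derive every component's settings for one system from META.

    Each peer address is looked up in the same table the peer itself
    uses, so P's idea of W's address can never disagree with W's own.
    """
    addresses = META[system]
    return {
        name: {"listen": addresses[name], "peer": addresses[PEERS[name]]}
        for name in addresses
    }


settings = component_settings("lab")
print(settings["P"])  # {'listen': '192.0.2.10', 'peer': '192.0.2.11'}
```

Both sides of each pairing come from the same entry. Changing W's address in one place therefore changes it everywhere it is referenced.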

Another factor to keep in mind is that I'm reading in the sample configuration files from the source repository (they're in XML, so parsers are readily available), making changes directly to the in-memory DOM, and saving the results to new files. That way, I'm sure to get the latest and greatest version of each configuration file, most of whose contents are sensible defaults anyway. The samples do change; it's rare, but the changes are easy to miss, as happened in production about two weeks ago.
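Here is a sketch of the read, patch and save approach using Python's xml.etree.ElementTree. The sample file is inlined as a string so the example stands alone; in practice it would come from ET.parse() on the checked-out sample. The element and attribute names, the customize() helper and the P.xml output name are all hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Stand-in for a sample configuration file from the source repository.
# Element and attribute names are hypothetical.
SAMPLE_XML = """<config>
  <logging level="info"/>
  <peer host="127.0.0.1" port="5000"/>
  <timeout seconds="30"/>
</config>"""


def customize(sample_text, overrides, out_path):
    """Parse the sample, patch attributes in memory, write a new file.

    overrides maps an ElementTree path (a limited XPath subset) to a dict
    of attribute values; anything not mentioned keeps the sample default.
    """
    root = ET.fromstring(sample_text)
    for path, attrs in overrides.items():
        elem = root.find(path)
        if elem is None:
            # Fail loudly if the sample changed and the element went away.
            raise KeyError(f"sample config has no element {path!r}")
        for name, value in attrs.items():
            elem.set(name, value)
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    return root


out_path = os.path.join(tempfile.mkdtemp(), "P.xml")
root = customize(SAMPLE_XML, {"peer": {"host": "192.0.2.11"}}, out_path)
print(ET.parse(out_path).getroot().find("peer").attrib)
# {'host': '192.0.2.11', 'port': '5000'}
```

Only the attributes named in overrides change. Everything else, including anything new added to the sample upstream, is carried through untouched.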

If this sounds like I'm trying to justify an approach, I am. I still dislike “configuration files for a configuration file” and I needed to convince myself that I'm not doing something nasty in this case.

Yes, I might be overthinking this a tad bit. But then again, trying to ensure six different components are consistently configured by using a single configuration file might make this approach A Good Thing™.


Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site, https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.
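Here is a small sketch of the day and month link formats described above. The helper names are hypothetical. The syntax for an arbitrary span of time isn't described here, so it is left out.

```python
from datetime import date

BASE = "https://boston.conman.org/"


def day_url(day):
    """Permanent link for one day's entries: base + YYYY/MM/DD."""
    return f"{BASE}{day:%Y/%m/%d}"


def month_url(year, month):
    """Permanent link for a whole month: the same form, day left off."""
    return f"{BASE}{year:04d}/{month:02d}"


print(day_url(date(2000, 8, 1)))  # https://boston.conman.org/2000/08/01
print(month_url(2000, 8))         # https://boston.conman.org/2000/08
```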

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.