Friday, March 01, 2013
Parsing—it's not just for compilers anymore
I've been playing around with LuaRocks and while I've made a rock of all my modules, I've been thinking that it would be better if I made the modules individual rocks. That way, you can install just the modules you want (perhaps you want to embed a C compiler in your Lua program) instead of a bunch of modules most of which you won't use.
And that's fine. But I like the ability to pull the source code right out of the repository when making a rock. Now, given that the majority of my modules are single files (either in Lua or C) and the fact that it's difficult to check out a single file with git (or with svn, for that matter), I think I'd be better served having each module be its own repository.
And that's fine, but now I have a larger problem—how do I break out the individual files into their own repositories and keep the existing revision history? This doesn't seem to be an easy problem to solve.
Sure, git now has the concept of “submodules”—external repositories referenced in an existing repository, but that doesn't help me here (and git's handling of “submodules” is quirky at best). There's git-filter-branch, but that's for breaking a directory into its own repository, not a single file. But there's also git-fast-export, which dumps an existing repository in a text format, supposedly to help export repositories into other version control systems. I think I can work with this.
The resulting output is simple and easy to parse, so my thought is to look only at the bits involving the file I'm interested in and generate a new file that can then be imported into a fresh repository with git-fast-import.
I used LPeg to parse the exported output (why not? The git export format is documented with BNF, which is directly translatable into LPeg), and the only difficult portion was handling this bit of syntax:
'data' SP <count> LF <raw> LF?
A datablock contains the number of bytes to read starting with the next line. Defining this in LPeg took some thinking. An early approach was something like:
data = Ct(                          -- return parse results in table
        P'data '                    -- match 'data' SP
      * Cg(R"09"^1,'size')          -- get size, save for later reference
      * P'\n'                       -- match LF
      * Cg(                         -- named capture
            P(tonumber(Cb('size'))) -- of 'size' characters
            ,'data'                 -- store as 'data'
        )
      * P'\n'^-1                    -- parse optional LF
      )
lpeg.P(n) states that it matches n characters, but in my case, n wasn't constant. You can do named captures, so I figured I could capture the size, then retrieve it by name, passing the value to lpeg.P(), but no, that didn't work. It generates “bad argument #1 to 'P' (lpeg-pattern expected, got nil)”—in other words, an error.
It took quite a bit of playing around, and close reading of the LPeg manual before I found the solution:
function immdata(subject,position,capture)
  local size  = tonumber(capture)
  local range = position + size - 1
  local data  = subject:sub(position,range)
  return range + 1,data
end

data = Ct(
        P'data '
      * Cg(Cmt(R"09"^1 * P"\n",immdata),'data')
      * P'\n'^-1
      )
It's the lpeg.Cmt() that does it. It calls the given function as soon as the given pattern is matched. The function is given the entire object being parsed (one huge string—the subject parameter), the position after the match (the position parameter), and the actual string that was matched (the capture parameter). From there, we can parse the size (tonumber(), a standard Lua function, ignores the included line feed character), then we return what we want as the capture (the variable amount of data) and the new position where LPeg should resume parsing.
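To see the whole trick in one runnable piece, here's a distilled, self-contained version of just the datablock rule (assuming LPeg is installed; the input string is a made-up example, not actual git-fast-export output):

```lua
-- Standalone sketch of parsing a length-prefixed datablock with LPeg.
-- Assumes the lpeg module is installed; cut down to just this one rule.

local lpeg = require "lpeg"
local P,R,Ct,Cg,Cmt = lpeg.P,lpeg.R,lpeg.Ct,lpeg.Cg,lpeg.Cmt

local function immdata(subject,position,capture)
  local size = tonumber(capture)   -- the trailing LF is ignored by tonumber()
  local data = subject:sub(position,position + size - 1)
  return position + size,data      -- resume parsing just past the raw data
end

local datablock = Ct(
        P'data '
      * Cg(Cmt(R"09"^1 * P"\n",immdata),'data')
      * P'\n'^-1
      )

local result = datablock:match("data 5\nhello\n")
print(result.data)    -- hello
```

The match-time capture fires as soon as the count and its line feed are matched, which is exactly when we know how many raw bytes to grab.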
And this was the hardest part of the entire project, trying to match a variable number of unknown characters. Once I had this, I could read the exported repository into memory, find the parts relating to an individual file and generate output that had the history of that one file (excluding the bits where the file may have moved from directory to directory—those weren't needed), which could then be imported into a clean git repository.
Saturday, March 02, 2013
Stupid GitHub tricks
All that work parsing git repositories was for naught—it seems you can link to individual files on GitHub (go figure!). So now I can create a rockspec per module, like:
package = "org.conman.tcc"
version = "1.0.0-0"

source =
{
  url = "https://raw.github.com/spc476/lua-conmanorg/1.0.0/src/tcc.c"
}

description =
{
  homepage   = "http://...",
  maintainer = "Sean Conner <sean@conman.org>",
  license    = "LGPL",
  summary    = "Lua wrapper for TCC",
  detailed   = [[ Blah blah blah ]]
}

dependencies =
{
  "lua ~> 5.1"
}

external_dependencies =
{
  TCC = { header = "libtcc.h" }
}

build =
{
  type = "builtin",
  copy_directories = {},
  modules =
  {
    ['org.conman.tcc'] =
    {
      sources   = { 'tcc.c' },
      libraries = { "tcc" },
    }
  },
  variables =
  {
    CC     = "$(CC) -std=c99",
    CFLAGS = "$(CFLAGS)",
    LFLAGS = "$(LIBFLAG)",
    LUALIB = "$(LIBDIR)"
  }
}
(this particular module embeds a C compiler in Lua, which is something I do need to talk about).
But it isn't like I wasted time on this. No, I don't view it that way at all. In fact, I learned a few things—how to parse git repositories, how to parse a variable amount of data in LPeg, and I have code to extract a single file into its own git repository if I ever have that need again.
Sunday, March 03, 2013
I mean, Washington may be a wretched hive of scum and villainy, but …
From: "Agent Chris Swecker" <laura@nwclbj.com>
To: undisclosed-recipients:;
Subject: *****SPAM***** UNITED STATES DEPARTMENT OF JUSTICE
Date: Sun, 3 Mar 2013 03:38:06 +0800
Spam detection software, running on the system “DiskStation”, has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see admin for details.
Content preview: Federal Bureau of Investigation (FBI) Counter-terrorism Division and Cyber Crime Division J. Edgar. Hoover Building Washington DC Dear Beneficiary, Series of meetings have been held over the past 7 months with the secretary general of the United Nations Organization. This ended 3 days ago. It is obvious that you have not received your fund which is to the tune of $8.5,000.000.00 due to past corrupt Governmental Officials who almost held the fund to themselves for their selfish reason and some individuals who have taken advantage of your fund all in an attempt to swindle your fund which has led to so many losses from your end and unnecessary delay in the receipt of your fund. […]
Content analysis details: (6.9 points, 5.0 required)

 pts rule name            description
-1.4 ALL_TRUSTED          Passed through trusted hosts only via SMTP
 1.8 SUBJ_ALL_CAPS        Subject is all capitals
 0.0 MONEY_BACK           BODY: Money back guarantee
 0.0 HTML_MESSAGE         BODY: HTML included in message
 1.7 MIME_HTML_ONLY       BODY: Message only has text/html MIME parts
 0.0 FORGED_OUTLOOK_TAGS  Outlook can't send HTML in this format
 0.7 MSOE_MID_WRONG_CASE  MSOE_MID_WRONG_CASE
 0.0 FORGED_OUTLOOK_HTML  Outlook can't send HTML message only
 4.2 FORGED_MUA_OUTLOOK   Forged mail pretending to be from MS Outlook

The original message was not completely plain text, and may be unsafe to open with some email clients; in particular, it may contain a virus, or confirm that your address can receive spam. If you wish to view it, it may be safer to save it to a file and open it with an editor.
You know … if you're going to try a Nigerian 419 scam, it might be wise to ensure your email isn't flagged as spam on the way out of your own email server! I'm just saying …
Monday, March 04, 2013
Ideas in parsing the command line
For any non-trivial script, even for personal consumption, it's necessary to supply usage text. The novelty of Lapp is that it starts from that point and defines a loose format for usage strings which can specify the names and types of the parameters.
An example will make this clearer:
-- scale.lua
require 'lapp'
local args = lapp [[
Does some calculations
  -o,--offset (default 0.0)  Offset to add to scaled number
  -s,--scale  (number)       Scaling factor
  <number>    (number)       Number to be scaled
]]
print(args.offset + args.scale * args.number)
lua-users wiki: Lapp Framework
The thought of parsing the usage text in order to parse the command line never occurred to me, and I think it's brilliant.
Now, when I want to modify the command line of a program I wrote (and this is mostly in C, by the way), there are four locations I have to edit:
- An enumeration specifying the “short” form of the command line option
- A structure describing both the short and the long forms of a command line option
- A switch statement that processes the command line options from getopt_long()
- The text printed out describing the command line options
With this method, though, there's only one place I would have to edit.
Now granted, this is only for Lua, but I can't see why something similar can't be done for other languages.
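Just to convince myself the core trick is small, here's a toy sketch of deriving options from the usage text with a couple of Lua string patterns. Everything here (the patterns, the parseusage() name) is made up for illustration; Lapp proper does far more:

```lua
-- Toy sketch of the Lapp idea: derive the options from the usage text itself.
-- Illustration only; a real implementation also handles types, <arguments>, etc.

local usage = [[
Does some calculations
  -o,--offset (default 0.0)  Offset to add to scaled number
  -s,--scale  (number)       Scaling factor
]]

-- build a table of options from lines like "-o,--offset (default 0.0)"
local function parseusage(text)
  local opts = {}
  for short,long,spec in text:gmatch("%-(%a),%-%-(%a+)%s+%(([^)]-)%)") do
    local default = spec:match("default%s+(%S+)")
    opts[long] = { short = short , default = tonumber(default) }
  end
  return opts
end

local opts = parseusage(usage)
print(opts.offset.default)   -- numeric default parsed from the text
print(opts.scale.short)      -- s
```

The usage text is the single source of truth: change one line of it, and both the help output and the option table change together.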
An unholy mashup
I know some of my college friends will find this heretical, but what happens when you take Carly Rae Jepsen's “Call Me Maybe” and mix it with Nine Inch Nails' “Head Like A Hole”?
Oddly enough, you get something that works (link via GoogleFacePlusBook), kind of, maybe, in that “train-wreck cross-genre musical mashup” kind of way.
Tuesday, March 05, 2013
Papers please
I've seen the ACLU's “Constitution Free Zone” map multiple times, and I've tried to substantiate the claims made by the ACLU, but everything I've found so far points back to the ACLU; I've yet to find anything that doesn't point back there.
But then I saw this video of DHS checkpoint refusals (link via Flutterby), all of which took place inside the United States (and not at the border), one of which was at least 30 miles from the border. So perhaps there is something to that ACLU article.
Another odd thing about that video—most of the agents would not answer, refused to answer, or tried to talk around the question “am I being detained?” Oh, and the automatic assumption of guilt when one refuser pleaded the Fifth. I also found it offensive when the officers admonished the refusers for making their job more difficult.
Wednesday, March 06, 2013
Peak government, I tell ya! Peak government!
Take Eddie Leroy Anderson, a retired logger from Idaho whose only crime was loaning his son “some tools to dig for arrowheads near a favorite campground of theirs,” according to the Wall Street Journal. Anderson and his son found no arrowheads, but because they were unknowingly on federal land at the time they were judged to be in violation of an obscure Carter-era law called the Archaeological Resources Protection Act.
The government showed no mercy. Wendy Olson, the Obama appointee prosecuting the case, saw to it that father and son were fined $1,500 apiece and each sentenced to a year's probation. “Folks do need to pay attention to where they are,” she said.
Statutory law in America has expanded to the point that government's primary activity is no longer to protect, preserve and defend our lives, liberty and property, but rather to stalk and entrap normal American citizens doing everyday things.
After identifying three federal offenses in the U.S. Constitution—treason, piracy and counterfeiting—the federal government left most matters of law enforcement to the states. By the time President Obama took office in 2009, however, there were more than 4,500 federal criminal statutes on the books.
Via Instapundit, Op-Ed: How to end overcriminalization | WashingtonExaminer.com
Remember, ignorantia juris non excusat, so better start reading.
Thursday, March 07, 2013
Plugging leaky memory
Just because a language has garbage collection doesn't mean you can't leak memory—you can easily leak memory, since quite a few modern languages that have garbage collection have ways of calling into libraries written in C, and those can leak.
With that said, reading “Tracking down a memory leak in Ruby's EventMachine” (link via Hacker News) was quite informative. Looking for patterns in the leaked memory as a means of tracking down what was being leaked was brilliant (“Well, as mentioned, 95+% of our program's memory footprint is leaked objects. So if we just take a random sample of bits of memory, we will find leaked objects with very good probability.”). And I did not know you could call C functions from within gdb.
This is something I'll have to keep in mind for work.
Friday, March 08, 2013
Um … why couldn't the giant eagles fly the Ring into Mordor?
When I first saw How the Lord of the Rings Should Have Ended, I thought, “Yeah, why didn't they use the giant eagles to fly the Ring to Mount Doom?”
One of the arguments I've heard is that the Nazgûls' fell beasts would have taken out the giant Eagles once they entered into Mordor. But Sean Chist felt otherwise—he thinks it's a plot hole left by Tolkien (link via Hacker News).
But personally, I find this theory much more plausible and a much better explanation for why “Operation: AirDrop One” wasn't done.
Saturday, March 09, 2013
Shave and a haircut, definitely not two bits
Bunny decided it was time for me to get a haircut. Normally, she does the cutting, but after the last haircut I received at a real barber shop (in Brevard—real barber pole, barbers, wood panelling, the works) she felt that the professionals did a much better job at it than she.
But we live in Boca Raton, not quite your Small Town, USA™ so we had to make do with something a bit more upscale, The Man Cave. At the appointed time, I walked into the Man Cave.
“Welcome, Sean,” said the hostess.
“Um … how did you–”
“You're here for your four o'clock appointment,” she said. “Would you care for wine? Or perhaps an imported beer from the Continent?”
“Oh. Um. No, I'm fine.”
“Very well. Chel will take care of you,” she said. “Chel! Your four o'clock is here.” She pointed over to the chairs, nestled among oversized high-contrast portraits of James Dean and Marlon Brando. “This way,” she said.
“Hello,” said Chel, walking over to lead me to her chair. “Please, take a seat. Short, over the ears, close cropped shave.”
“How did—”
“Shh, just sit back and relax,” she said, tying a paper collar about my neck and adjusting the snap-on tarp. “Glasses,” she said.
“Oh, yes,” I said. I took off my glasses, and she placed them gently on the nearby counter. She then started clipping my hair. It was the typical motions—snip here, snip there, reposition my head, more snipping, use the electric razor here and there and before long, she had apparently finished with cutting my hair.
She then lowered the back of the chair so I was nearly lying horizontally. “Please, relax,” Chel said, as she lowered a folded, steaming hot towel across the lower half of my face, then raised the folded part to cover my entire face. Oddly enough, even though I could see the steam rising off of it (even without my glasses) it wasn't scalding. In fact, it felt nice. It was whisked off, then she massaged my face, then another towel, then various gels and what not were rubbed into my face, then another hot towel, then more gels and finally, the shave with an honest-to-god straight razor. That was weird. I could feel it (felt like a sharp pencil against my skin) and hear it scrape the hair off my face.
And with that, I was done.
It was not cheap. But it was a fun experience. And certainly a different experience from a small town barber shop.
I have a sad story to tell you, it might hurt your feelings a bit
Bunny and I found ourselves at The Rock Steady Jamaican Bistro in Boca Raton for dinner. An early dinner. Or a late lunch. Or a very late brunch. Dunch, if you will.
So we were eating dunch at The Rock Steady Jamaican Bistro (the food was quite good, but you better like it jerked) and for a change, we were listening to reggae. Not because we wanted to listen to reggae, but because we were eating at a Jamaican restaurant.
And it could have been worse—it could have been country reggae.
But we were listening to reggae when it struck me: the reggae music—it was a song I hadn't heard in a long time …
“I have a sad story to tell you.”
I was trying to place where I heard it …
“It might hurt your feelings a bit.”
It's been years … and the heavy reggae beat wasn't helping, mon.
“I stepped in the bathroom last night.”
What was it?
“And stepped in a pile of shhhhhhhhh—”
Shaving Cream! It was the Shaving Cream song! Only with a very heavy reggae beat and a thick Jamaican accent. But yes, it was the Shaving Cream song!
Incredible!
A Liar's Autobiography
Bunny and I watched “A Liar's Autobiography: The Untrue Story of Monty Python's Graham Chapman” and I must say—it's a very odd film, even by Monty Python standards. Animated, in fourteen different styles (some of it very beautifully done) with most of Monty Python doing voice work (and no, it's not Graham Chapman that isn't in it), it's a completely made up story of the life of Graham Chapman.
Well, mostly made up. It does cover his homosexuality and drinking problems. But everything else is made up. Well, except for him working for Monty Python. But short of the homosexuality, the drinking problems, and working with Monty Python, it's all made up. Except his name really is Graham Chapman.
Okay, let me start over again.
Except for his name, his homosexuality, drinking problems and working with Monty Python, the movie is completely and utterly false.
Except he did study to become a doctor.
Damn!
Well, it's a very weird film and you should watch it because it's kind of true, except when it isn't.
There.
Sunday, March 10, 2013
Visions of a future past
Ah, 70's space colony concept art (link via InstaPundit)—what's not to like?
Alternative keyboards
This association of the keyboard with the synthesizer eased its entry into the world of music, but it also placed limitations on how the instrument is played that its designers didn't intend. The limitations of the piano keyboard have been recognized since long before the synthesizer existed. The biggest problem that the keyboard has always had is that, due to the two-row layout with all of the naturals on the bottom row and all of the accidentals on the top row, the performer must usually change fingering in order to transpose a chord from one key to another. This frustrates what should be a simple operation; the guitar player playing a barred chord can transpose it simply by moving up and down the neck, but the keyboard player must keep shifting fingers around to insure that each finger hits on the correct row. The additional manual dexterity and muscle memory requirement makes learning the different keys on the piano a slow and frustrating process.

From my own experience, it also introduces the temptation to use teaching shortcuts that cause the student problems later on: a common technique is to start the beginning student out learning the C-major scale, which is played all on the white keys. This introduces a sort of fear or puzzlement at the black keys—what are they for? When does one use them? And then when the teacher starts introducing other scales, the use of the black keys seems arbitrary and unsystematic, and the student gets a bit freaked out. By contrast, guitar pedagogy treats the accidentals as simply other notes in the chromatic scale, which they are, and the guitar student has relatively little trouble understanding how to play different scales and keys.
Via Hacker News (via Hacker News), Sequence 15: Alternative Keyboards
I'm actually puzzled by (musical) electronic keyboards. Sure, they have a layout like a traditional piano, and yes, the C-major scale is played with all white keys, but if you decide to play, say, an F-major scale, why can't you just remap the frequencies of the keys so you can still play it with all white keys? In C-major the keys go C-D-E-F-G-A-B, while in F-major they'd go F-G-A-B♭-C-D-E, and for A♭-major, A♭-B♭-C-D♭-E♭-F-G. The same fingering, regardless of scale.
Yes, I know you can't do this on a traditional piano, but I'm not talking about a traditional piano here. You can't change the layout of a typewriter, yet it's trivial to change the layout of a computer keyboard (used to type text)—it's a matter of changing the software and boom—you have a Colemak layout!
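In software, that remapping really is nearly trivial. Here's a toy sketch (not any real synthesizer's API; the names and numbers are mine) that maps the seven “white keys” to a major scale built from any tonic, using the standard major-scale intervals and equal-temperament frequencies:

```lua
-- Toy sketch: remap the seven "white keys" to any major scale.
-- Illustration only; no real synthesizer API is being used here.

local MAJOR = { 0, 2, 4, 5, 7, 9, 11 }  -- major-scale intervals, in semitones

-- frequency of a note 'semis' semitones above A440, equal temperament
local function freq(semis)
  return 440 * 2 ^ (semis / 12)
end

-- the seven white-key frequencies for a given tonic, where the tonic
-- is expressed in semitones relative to A440 (C above A440 is 3)
local function whitekeys(tonic)
  local keys = {}
  for i,interval in ipairs(MAJOR) do
    keys[i] = freq(tonic + interval)
  end
  return keys
end

local cmajor = whitekeys(3)   -- white keys sound C-D-E-F-G-A-B
local fmajor = whitekeys(8)   -- the *same* keys now sound F-G-A-Bb-C-D-E
```

One table swap and the same physical fingering plays any major scale.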
But short of that, I am fascinated by alternative music keyboards, probably because I'm not a musician, and to me, these alternative music keyboards seem to show the patterns inherent in music much better than a piano keyboard.
Monday, March 11, 2013
Paper
For all the things an iPad can do, there's still a place for paper (link via dad).
Tuesday, March 12, 2013
Flowing water, stationary
Brusspup filmed water going through a speaker running at 24Hz (link via Hacker News) and the result is incredible—the water just sits there, in mid-air. As he adjusts the sound wave, the water appears to move slowly down, then reverses itself.
It is a trick, though it's not special effects—it's all done in real time with a camera. But it's the camera that causes the effect—it's acting like a strobe light, catching the action at just the right moment. It's still impressive, though.
Wednesday, March 13, 2013
Why does “Enterprise Software” universally suck?
I work in the QA Department of The Corporation. The majority of the team is stationed in the Seattle Office whereas I am the only member of QA in The Ft. Lauderdale Office. The Seattle Office tests the actual cell phones, whereas not only do I test call processing (and even though I might complain about the Protocol Stack From Hell™, I can automate my tests—muahahahahahaha!) but I am the only person who tests call processing in The Corporation.
I bring this up because the QA Department is now using Gusty, a “Real-Time Test Management System” as it calls itself. And so far, I am seriously unimpressed with it. It's written in Flash and tries its darndest to imitate a Microsoft Windows interface, but it's a far cry from Microsoft Windows. And because it tries to imitate Microsoft Windows, it's quite Microsoft-centric.
But, because it tries so hard to be cross platform, it's almost, but not quite, cute. It screams that it was a Visual Basic application ported to Flash to sell to non-Microsoft shops. There's no way to change the font (which is borderline too small, even for me). It's hard to resize the windows. The scrolling is wonky. It's just an overall clunky user interface.
And that would be fine if it actually helped with my job. But it falls down there too. There are two main objects—requirements and testcases. A test case can have multiple requirements, and a requirement can apply to several testcases. You use different windows to create requirements and testcases.
Now, when you view the details of, say, a requirement, you can get a list of testcases it applies to, but it's a tool-tip like element—meaning, it pops up a small text window with a list of testcases. Can you click on one to go to the testcase? No. Can you select one? No. Can you copy the text with mouse? No. Does it remain up for any length of time? No. Is this list in anyway useful? No. Well … okay, you can memorize the ID before it goes away. So if you want more details on a particular testcase for a requirement, you have to go to the testcase window and manually search for it.
Oh, there is a search facility (it was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying “Beware of the leopard”) but searching by ID doesn't work. You see, it's a text only search, and IDs being numbers, aren't text …
Yeah.
And the Microsoft-esqueness of the program means that this is really geared towards manual tests. Oh, they pay lipservice to automation and in theory, you can run automated tests, but in theory, there is no difference between theory and practice, but in practice …
In practice, you install some Java client on the machine to run the tests and somehow get this tool to run the tests. And okay, that's fine.
Only my test program runs eight programs (which spawn like a bazillion processes) on four computers, and the programs need to start in a particular order, with particular data. Somehow, I don't think the Gusty tool was built with that type of testing in mind (and when I said the tests I run are automated? Yes. They are. But the setup isn't, as there are a few steps that have security implications involving root).
Now, I'm sure that Gusty is a fine tool within certain limitations (large testing teams manually testing software using Microsoft Windows) but for what I do, it doesn't work at all.
Thankfully, I can continue with my job without having to use Gusty, as I'm practically my own department.
Thursday, March 14, 2013
Happy π Day, I guess …
It seems that several of my peeps on GoogleFacePlusBook are making some noise about it being π Day or something like that. Well … it's 3/14/13, or 3.1413, which is 0.0003 too small. I think it might be better to wait until March 14th, 2015. It's just too bad it isn't March 14th, 1592.
Friday, March 15, 2013
Rolling them bones
On GoogleFacePlusBook, Jeff linked to an article about non-transitive dice—three dice where, on average (meaning—many rolls) die A will win over die B, die B will win over die C, but die C will win over die A (kind of like rock-paper-scissors). Even weirder, if you double the dice, two A's against two B's against two C's, the order reverses! (And the accompanying video shows a series of five dice with an even weirder dual-non-transitive ordering).
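The claim is easy to verify by brute force. Here's a quick enumeration using one well-known set of non-transitive dice (these particular numberings are a standard textbook example, not necessarily the dice from the linked article):

```lua
-- Exact enumeration for one classic set of non-transitive dice.
-- These numberings are a standard example, not necessarily the article's.

local A = { 2,2,4,4,9,9 }
local B = { 1,1,6,6,8,8 }
local C = { 3,3,5,5,7,7 }

-- count, out of all 36 equally likely rolls, how often die x beats die y
local function wins(x,y)
  local n = 0
  for _,i in ipairs(x) do
    for _,j in ipairs(y) do
      if i > j then n = n + 1 end
    end
  end
  return n
end

print(wins(A,B))  -- 20 of 36, so A beats B more often than not
print(wins(B,C))  -- 20 of 36
print(wins(C,A))  -- 20 of 36, closing the rock-paper-scissors loop
```

Each die beats the next with probability 20/36 = 5/9, so whichever die your opponent picks, you can pick the one that beats it.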
This page reminded me of a set of “go-first dice”—a set of twelve-sided dice where, say, four people each have a ¼ chance of rolling the highest number, a ¼ chance of rolling the second highest number, a ¼ chance of rolling the third highest number and a ¼ chance of rolling the lowest number. In this case, the dice form a strict ordering (there is no chance of a tie).
Monday, March 18, 2013
You know, you might want to check the calendar for March 2002
Okay, I've seen this particular meme multiple times now on GoogleFacePlusBook:
THIS IS THE ONLY TIME WE WILL SEE AND LIVE THIS EVENT
Calendar for March 2013
March 2013
Sun Mon Tue Wed Thr Fri Sat
                      1   2
  3   4   5   6   7   8   9
 10  11  12  13  14  15  16
 17  18  19  20  21  22  23
 24  25  26  27  28  29  30
 31

This year March has 5 Fridays, 5 Saturdays and 5 Sundays. This happens once every 823 years. This is called money bags. So, share this to your friends and money will arrive within 4 days. Based on Chinese “Feng Shui.” The one who does not share … will be without money.
The first time I saw it, I thought, Weird—is that true? Really? 823 years since the last time? Okay, let's write some code to answer this—which years has March 1st fallen on a Friday?
#!/usr/bin/env lua

function dayofweek(date)
  local a = math.floor((14 - date.month) / 12)
  local y = date.year - a
  local m = date.month + 12 * a - 2
  local d = date.day + y
          + math.floor(y / 4)
          - math.floor(y / 100)
          + math.floor(y / 400)
          + math.floor(31 * m / 12)
  return (d % 7) + 1
end

for year = 1190,2014 do
  if dayofweek { year = year , month = 3 , day = 1 } == 6 then
    print(year)
  end
end
1190 is 823 years ago. Let's see what we get …
1191 1196 1202 1213 1219 1224 1230 1241 1247 1252 1258 1269 1275 1280 1286 1297 1309 1315 1320 1326 1337 1343 1348 1354 1365 1371 1376 1382 1393 1399 1405 1411 1416 1422 1433 1439 1444 1450 1461 1467 1472 1478 1489 1495 1501 1507 1512 1518 1529 1535 1540 1546 1557 1563 1568 1574 1585 1591 1596 1602 1613 1619 1624 1630 1641 1647 1652 1658 1669 1675 1680 1686 1697 1709 1715 1720 1726 1737 1743 1748 1754 1765 1771 1776 1782 1793 1799 1805 1811 1816 1822 1833 1839 1844 1850 1861 1867 1872 1878 1889 1895 1901 1907 1912 1918 1929 1935 1940 1946 1957 1963 1968 1974 1985 1991 1996 2002 2013
(okay, I cleaned up the output a bit)
Well … it's happened 118 times since 1190, and it didn't happen 823 years ago, but 822 years ago, and 817 years ago and … you get the picture.
Hardly a rare occurrence, and in thinking about it, any month with 31 days will have three days of the week that occur five times in that month (five Sundays, five Mondays and five Tuesdays, for instance).
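The arithmetic behind that: 31 = 4 × 7 + 3, so the weekdays falling on the 1st, 2nd and 3rd of the month each pick up a fifth occurrence. A quick check by pure counting, no calendar required:

```lua
-- Check: in any 31-day month, exactly the weekdays of days 1-3 occur 5 times.
-- Pure counting on day numbers modulo 7; no calendar math needed.

local function counts(monthlen)
  local c = { 0,0,0,0,0,0,0 }
  for day = 1,monthlen do
    local slot = ((day - 1) % 7) + 1   -- weekday slot of this day number
    c[slot] = c[slot] + 1
  end
  return c
end

local c = counts(31)
print(table.concat(c,","))   -- 5,5,5,4,4,4,4
```

So every March (and January, May, July, August, October and December) has some three consecutive weekdays occurring five times; which three just depends on what day the 1st falls on.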
Now, I can hope this puts this particularly medium meme (it's not rare, and it's not well done) out of commission. But alas, I doubt it …
Tuesday, March 19, 2013
It was an inside job
“On that fateful day, the day that the Empire was attacked, all the crew, staff and guests including security personnel and Imperial officials with the exception of one were killed when the terrorist organization known only as the ‘Rebel Alliance’ blew up the Death Star with a squad of single-man fighters.”
That much is known. But really, how does a farm boy with no combat experience blow up the most fortified and heavily armed vehicle in the Empire's arsenal? Do you really believe the “official story”?
Nah, it was an inside job (link via GoogleFacePlusBook).
Wednesday, March 20, 2013
Simple
You just went to the Google home page.
Simple, isn't it?
What just actually happened?
Well, when you know a bit about how browsers work, it's not quite that simple. You've just put into play HTTP, HTML, CSS, ECMAscript, and more. Those are actually such incredibly complex technologies that they'll make any engineer dizzy if they think about them too much, and such that no single company can deal with that entire complexity.
Let's simplify.
Via Hacker News, Jean-Baptiste Queru - Google+ - Dizzying but invisible depth
It's hard to simplify how modern computers work. Sure, I could say something along the lines of “a computer consists of three components: memory, which stores a program and data; the CPU, which executes the program out of memory; and input/output devices, like a keyboard or a monitor.” But while I'm not outright lying, I am grossly simplifying and skipping a lot of details (for instance, memory could be considered an input/output device, but it's treated differently than other input/output devices, except when it isn't).
For instance:
Let's say you've just bought a MacBook Air, and your goal is to become master of the machine, to understand how it works on every level.
…
The total of all of this is 79 pages shy of eleven thousand. I neglected to include man pages for hundreds of system utilities and the Xcode documentation. And I didn't even touch upon the graphics knowhow needed to do anything interesting with OpenGL, or how to write good C and Objective-C or anything about object-oriented design, and …
A Complete Understanding is No Longer Possible
And those 11,000 pages exclude documentation on the hardware. For comparison, consider the Color Computer (my first computer). The TRS-80 Color Computer Technical Reference Manual, which covers the hardware, is 69 pages; the technical reference for the MC6809 (the CPU) is 35 pages; the reference for the MC6821 (an I/O adaptor) is 11 pages; the reference for the MC6847 (the video graphics chip) is 26 pages, and the EDTASM+ manual (the assembler) is 68 pages.
So, in 209 pages, you will know enough to program the Color Computer (assuming you know how to program in assembly to begin with—else tack on another 294 pages for TRS-80 Color Computer Assembly Language Programming for a little over 500 pages).
Even “Project: Wolowizard” is rather insane, requiring knowledge of SS7, IP, HTTP, Solaris, Linux, C, C++, Javascript, Java, Lua, Ruby, Python, SQL, HTML, CSS, XML and that's just the “off-the-shelf” stuff we're using (and all the documentation for that probably exceeds 11,000 pages quite easily; there are probably over 2,000 pages just for SS7 and IP alone)—I'm not mentioning the file formats and protocols we've developed for “Project: Wolowizard” (just the test plan for one component is over 150 pages, and there's at least seven components just on the backend).
It's simple.
Thursday, March 21, 2013
A good idea in theory marred by the terrible reality of practice
I get the feeling sometimes that not enough is written about failed ideas—not bad ideas, the ones that shouldn't be done but the class of ideas that can't be done for one reason or another.
Today I had one such idea, but first, some back story.
Sometime last year, R, who runs the Ft. Lauderdale Office of The Corporation, was listening to me lament about The Protocol Stack From Hell™ and how I had this magical ability to break it by thinking bad thoughts about it (an amazing feat when you consider that the physical computers are several miles away in a data center and that any bad thoughts I had towards it had to travel over a remote command line interface).
R explained that SS7 networks are different than IP networks in that any SS7 endpoint that bounces up and down will effectively be ignored by the rest of the SS7 network (and will typically require manual intervention to re-establish a connection), so was there any way I could keep my testing program up and running.
I countered that I didn't think so, seeing how I had to test the testing program and as such, I had to stop and start the program as I found bugs in my own code while I was using it to find bugs in the code I was paid to find bugs in. R conceded the point and that was that. I would keep doing what I was doing and if the SS7 stack on the machines needed to be restarted because I borked The Protocol Stack From Hell™ yet again, so be it.
Then today, I read about the reliability of the Tandem computer (link via programming is terrible).
Hi, is this Support? We have a problem with our Tandem: A car bomb exploded outside the bank, and the machine has fallen over … No, no it hasn't crashed, it's still running, just on its side. We were wondering if we can move it without breaking it.
Apocryphal story about a Tandem computer
[One other apocryphal story about the Tandem. About fifteen years ago I worked at a company that had a Tandem computer. It was said that one day a cooling fan for the Tandem computer just showed up at the receptionist's desk with no explanation. When she called Tandem about the apparent mistaken delivery, they said that the Tandem computer had noticed its cooling fan was marginal and had ordered a replacement fan.]
I had an idea.
I can't say exactly what triggered the idea—it just hit me.
The idea was to write a very small, and very simple program that established an SS7 endpoint—a “master control program” if you will. It would also listen in on a named pipe for commands. One command would start the testing program, passing the SS7 endpoint to the testing program to use (another command would be to stop the testing program). The SS7 endpoint that is created is a Unix file descriptor (a file descriptor is an integer value used to refer to an open file under Unix, but more importantly, we have the source code to The Protocol Stack From Hell™ and the fact that the SS7 endpoint is a file descriptor is something I can verify). Open file descriptors are inherited by child processes. Closing a file descriptor in a child process does not close it in the parent process, so the test program can crash and burn, but because the SS7 endpoint is still open in the “master control program” it's still “up” to the rest of the SS7 network.
It's a nice idea.
It won't work.
That's because the user library we use to establish an SS7 endpoint keeps static data based on the file descriptor (and no, it doesn't use the integer value as an index into an array, which would be quick—oh no, it does a linear search, multiple times for said private data—I really need a triple facepalm picture for this) and there's no way to establish this static data given an existing file descriptor.
Sigh.
Friday, March 22, 2013
Preloading Lua modules
I'm tasked with testing the call processing on “Project: Wolowizard.” M suggested, and I concurred, that using Lua to manage the testing scripts would be a Good Thing™. Easier to write and modify the tests as needed. So over the past few years I've written a number of modules to handle the files and protocols used in the project (one side effect: re-implementing the code to read/write the various data files helped to verify the specification and flush out architectural dependencies in the binary formats).
But one problem did exist: Not all the systems I need to run the test on have Lua installed, and LuaRocks has … um … “issues” on our Solaris boxes (otherwise, it's not that bad a package manager). So I decided to build what I call “Kitchen Sink Lua”—a Lua interpreter that has the 47 modules required to run the testing scripts (okay, eight of the modules are already built into Lua).
It took some time to wrangle, as some of the modules were written in Lua (so the source needed to be embedded) and I had to figure out how to integrate some third-party modules (like LuaCURL) into the build system, but perhaps the hardest bit was to ensure the modules were initialized properly. My first attempt, while it worked (mostly by accident), wasn't technically correct (as I realized when I read this message on a mailing list).
I then restructured my code, which not only made it correct, but smaller and clearer.
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

/**************************************************************************/

typedef struct prelua_reg
{
  const char   *const name;
  const char   *const code;
  const size_t *const size;
} prelua_reg__t;

/*************************************************************************/

int luaopen_org_conman_env            (lua_State *);
int luaopen_org_conman_errno          (lua_State *);
int luaopen_org_conman_fsys           (lua_State *);
int luaopen_org_conman_math           (lua_State *);
int luaopen_org_conman_syslog         (lua_State *);
int luaopen_org_conman_hash           (lua_State *);
int luaopen_org_conman_string_trim    (lua_State *);
int luaopen_org_conman_string_wrap    (lua_State *);
int luaopen_org_conman_string_remchar (lua_State *);
int luaopen_org_conman_process        (lua_State *);
int luaopen_org_conman_net            (lua_State *);
int luaopen_org_conman_dns            (lua_State *);
int luaopen_org_conman_sys            (lua_State *);
int luaopen_org_conman_uuid           (lua_State *);
int luaopen_lpeg                      (lua_State *);
int luaopen_LuaXML_lib                (lua_State *);
int luaopen_cURL                      (lua_State *);

/***********************************************************************/

/*---------------------------------------------------------------
; Modules written in Lua.  The build system takes the Lua code,
; processes it through luac (the Lua compiler), then creates an
; object file which exports a character array containing the byte
; code, and a variable which gives the size of the bytecode array.
;---------------------------------------------------------------*/

extern const char   c_org_conman_debug[];
extern const size_t c_org_conman_debug_size;
extern const char   c_org_conman_getopt[];
extern const size_t c_org_conman_getopt_size;
extern const char   c_org_conman_string[];
extern const size_t c_org_conman_string_size;
extern const char   c_org_conman_table[];
extern const size_t c_org_conman_table_size;
extern const char   c_org_conman_unix[];
extern const size_t c_org_conman_unix_size;
extern const char   c_re[];
extern const size_t c_re_size;
extern const char   c_LuaXml[];
extern const size_t c_LuaXml_size;

/*----------------------------------------------------------------
; Modules written in C.  We can use luaL_register() to load these
; into package.preloaded[]
;----------------------------------------------------------------*/

const luaL_Reg c_preload[] =
{
  { "org.conman.env"            , luaopen_org_conman_env            } ,
  { "org.conman.errno"          , luaopen_org_conman_errno          } ,
  { "org.conman.fsys"           , luaopen_org_conman_fsys           } ,
  { "org.conman.math"           , luaopen_org_conman_math           } ,
  { "org.conman.syslog"         , luaopen_org_conman_syslog         } ,
  { "org.conman.hash"           , luaopen_org_conman_hash           } ,
  { "org.conman.string.trim"    , luaopen_org_conman_string_trim    } ,
  { "org.conman.string.wrap"    , luaopen_org_conman_string_wrap    } ,
  { "org.conman.string.remchar" , luaopen_org_conman_string_remchar } ,
  { "org.conman.process"        , luaopen_org_conman_process        } ,
  { "org.conman.net"            , luaopen_org_conman_net            } ,
  { "org.conman.dns"            , luaopen_org_conman_dns            } ,
  { "org.conman.sys"            , luaopen_org_conman_sys            } ,
  { "org.conman.uuid"           , luaopen_org_conman_uuid           } ,
  { "lpeg"                      , luaopen_lpeg                      } ,
  { "LuaXML_lib"                , luaopen_LuaXML_lib                } ,
  { "cURL"                      , luaopen_cURL                      } ,
  { NULL                        , NULL                              }
};

/*---------------------------------------------------------------
; Modules written in Lua.  These need to be loaded and populated
; into package.preloaded[] by some code provided in this file.
;---------------------------------------------------------------*/

const prelua_reg__t c_luapreload[] =
{
  { "org.conman.debug"  , c_org_conman_debug  , &c_org_conman_debug_size  } ,
  { "org.conman.getopt" , c_org_conman_getopt , &c_org_conman_getopt_size } ,
  { "org.conman.string" , c_org_conman_string , &c_org_conman_string_size } ,
  { "org.conman.table"  , c_org_conman_table  , &c_org_conman_table_size  } ,
  { "org.conman.unix"   , c_org_conman_unix   , &c_org_conman_unix_size   } ,
  { "re"                , c_re                , &c_re_size                } ,
  { "LuaXml"            , c_LuaXml            , &c_LuaXml_size            } ,
  { NULL                , NULL                , NULL                      }
};

/*************************************************************************/

void preload_lua(lua_State *const L)
{
  assert(L != NULL);

  lua_gc(L,LUA_GCSTOP,0);
  luaL_openlibs(L);
  lua_gc(L,LUA_GCRESTART,0);

  /*---------------------------------------------------------------
  ; preload all the modules.  This does not initialize them, just
  ; makes them available for require().
  ;
  ; I'm doing it this way because of a recent email on the LuaJIT
  ; email list:
  ;
  ; http://www.freelists.org/post/luajit/Trivial-bug-in-bitops-bitc-luaopen-bit,4
  ;
  ; Pre-loading these modules in package.preload[] means that they'll
  ; be initialized properly through the require() statement.
  ;----------------------------------------------------------------------*/

  lua_getglobal(L,"package");
  lua_getfield(L,-1,"preload");
  luaL_register(L,NULL,c_preload);

  for (size_t i = 0 ; c_luapreload[i].name != NULL ; i++)
  {
    int rc = luaL_loadbuffer(L,c_luapreload[i].code,*c_luapreload[i].size,c_luapreload[i].name);
    if (rc != 0)
    {
      const char *err;

      switch(rc)
      {
        case LUA_ERRRUN:    err = "runtime error"; break;
        case LUA_ERRSYNTAX: err = "syntax error";  break;
        case LUA_ERRMEM:    err = "memory error";  break;
        case LUA_ERRERR:    err = "generic error"; break;
        case LUA_ERRFILE:   err = "file error";    break;
        default:            err = "unknown error"; break;
      }

      fprintf(stderr,"%s: %s\n",c_luapreload[i].name,err);
      exit(EXIT_FAILURE);
    }

    lua_setfield(L,-2,c_luapreload[i].name);
  }
}

/*************************************************************************/
Yes, this is the code used in “Project: Wolowizard” (minus the
proprietary modules) and is a good example of the module preload feature in
Lua. The modules in C are easy to build (the following is from the
Makefile
):
obj/spc/process.o : $(LUASPC)/src/process.c	\
		$(LUA)/lua.h			\
		$(LUA)/lauxlib.h
	$(CC) $(CFLAGS) -I$(LUA) -c -o $@ $<
While the Lua-based modules are a bit more involved:
obj/spc/unix.o : $(LUASPC)/lua/unix.lua $(BIN2C) $(LUAC)
	$(LUAC) -o tmp/unix.out $<
	$(BIN2C) -o tmp/unix.c -t org_conman_unix tmp/unix.out
	$(CC) $(CFLAGS) -c -o $@ tmp/unix.c
These modules are compiled using luac
(which outputs the Lua
byte code used by the core Lua VM), then through a program that converts this output
into a C file, which is then compiled into an object file that can be linked
into the final Kitchen Sink Lua interpreter.
Musings on the Current Work Project Du jour
So I have this Lua code that implements the cellphone end of a protocol used in “Project: Wolowizard.” I need to ramp up the load testing on this portion of the project so I'm looking at what I have and trying to figure out how to approach this project.
The protocol itself is rather simple—only a few messages are defined and the code is rather straightforward. It looks something like:
-- Pre-define these

state_receive = function(phone,socket) end
state_msg1    = function(phone,socket,remote,msg) end
state_msg2    = function(phone,socket,remote,msg) end

-- Now the code

state_receive = function(phone,socket)
  local remote,packet,err = socket:read()

  if err ~= 0 then
    syslog('err',string.format("error reading socket: %s",errno[err]))
    return state_receive(phone,socket)
  end

  local msg,err = sooperseekritprotocol.decode(packet)
  if err ~= 0 then
    syslog('err',string.format("error decoding: %s",decoderror(err)))
    return state_receive(phone,socket)
  end

  if msg.type == "MSG1" then
    return state_msg1(phone,socket,remote,msg)
  elseif msg.type == "MSG2" then
    return state_msg2(phone,socket,remote,msg)
  else
    syslog('warn',string.format("unknown message: %s",msg.type))
    return state_receive(phone,socket)
  end
end

state_msg1 = function(phone,socket,remote,msg)
  local reply = ... -- code to handle this msg
  local packet = sooperseekritprotocol.encode(reply)
  socket:write(remote,packet)
  return state_receive(phone,socket)
end

state_msg2 = function(phone,socket,remote,msg)
  local reply = ... -- code to handle this msg
  local packet = sooperseekritprotocol.encode(reply)
  socket:write(remote,packet)
  return state_receive(phone,socket)
end
Don't worry about this code blowing out the call stack—Lua optimizes tail calls
and these effectively become GOTO
s. I found this feature to be
very useful in writing protocol handlers since (in my opinion) it makes the
state machine
rather explicit.
Now, to speed this up, I could translate this to C. As I wrote the Lua modules for The Kitchen Sink Lua interpreter, I pretty much followed a bi-level approach. I have a C interface (to be used by C code) which is then mimicked in Lua. This makes translating the Lua code into C more or less straightforward (with a bit more typing because of variable declarations and what not).
But here, I can't rely on the C compiler to optimize tail calls
(GCC
can, but only with certain options; I don't know
about the Solaris C compiler). I could have the routines return the next
function to call and use a loop:
while((statef = (*statef)(phone,sock,&remote,&msg)) != NULL)
  /* the whole state machine is run in the previous line */ ;
But just try to define the type of statef
so the
compiler doesn't complain about a type mismatch. It needs to define a
function that takes blah and returns a function that takes
blah and returns a function that takes blah and returns a
function that … It's one of those recursive type definitions that
produce headaches when you think too much about it.
Okay, so instead, let's just have a function that returns a simple integer value that represents the next state. That's easier to define and the main driving loop isn't that bad:
while(state != DONE)
{
  switch(state)
  {
    case RECEIVE: state = state_receive(phone,socket,&remote,&msg); break;
    case MSG1:    state = state_msg1   (phone,socket,&remote,&msg); break;
    case MSG2:    state = state_msg2   (phone,socket,&remote,&msg); break;
    default:      assert(0); break;
  }
}
Okay, with that out of the way, we can start writing the C code.
Clackity-clackity-clack clackity-clack clack clack clackity-clackity-clackity-clack clack clack clack clack …
Man, that's boring drudgework. Okay, let's just use the Lua code and
maybe throw some additional threads at this. I don't think that's a bad
approach. Now, Lua, out of the box, isn't exactly thread-safe. Sure, you
can provide an implementation of lua_lock()
and
lua_unlock()
but that might slow Lua down quite a bit (there
are 62 locations where the lock could be taken in the Lua engine). We could
give each thread its own Lua state—how bad could that be?
How big is a Lua state? Let's find out, shall we?
#include <stdio.h>
#include <stdlib.h>

#include <lua.h>
#include <lauxlib.h>

int main(void)
{
  lua_State *L;

  L = luaL_newstate();
  if (L == NULL)
  {
    perror("luaL_newstate()");
    return EXIT_FAILURE;
  }

  printf("%d\n",lua_gc(L,LUA_GCCOUNT,0) * 1024);
  lua_close(L);
  return EXIT_SUCCESS;
}
When compiled and run, this returns 2048
, the amount of
memory used in an empty Lua state. That's not bad at all, but that's an
empty state. What about a more useful state, like the one you get
when you run the stock Lua interpreter?
-- ensure any accumulated garbage is reclaimed
collectgarbage('collect')
collectgarbage('collect')
collectgarbage('collect')

print(collectgarbage('count') * 1024)
Okay, when I run this, I get 17608
. Eh … it's not
that bad per thread (and I do have to remind myself—this is
not running on my Color Computer
with 16,384 bytes of memory). But I'm not running the stock Lua
interpreter, I'm running the Kitchen Sink Lua with all the
trimmings—how big is that state?
I run the above Lua code and I get 4683963
.
Four and a half megs!
Ouch.
I suppose if it becomes an issue, I could always go back to writing C …
Saturday, March 23, 2013
Preloading Lua modules, part II
Well, four and a half megs per Lua state in the Kitchen Sink Lua interpreter. I thought about it, and I had Yet Another Idea™.
Lua not only has an array for preloaded modules, but an array of functions used to locate and load modules. So the idea I had was to insert two custom load functions—one to search for C-based Lua modules, and one for Lua-based Lua modules. The code is pretty much straightforward:
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

/**************************************************************************/

typedef struct prelua_reg
{
  const char   *const name;
  const char   *const code;
  const size_t *const size;
} prelua_reg__t;

/*************************************************************************/

int luaopen_org_conman_env            (lua_State *);
int luaopen_org_conman_errno          (lua_State *);
int luaopen_org_conman_fsys           (lua_State *);
int luaopen_org_conman_math           (lua_State *);
int luaopen_org_conman_syslog         (lua_State *);
int luaopen_org_conman_hash           (lua_State *);
int luaopen_org_conman_string_trim    (lua_State *);
int luaopen_org_conman_string_wrap    (lua_State *);
int luaopen_org_conman_string_remchar (lua_State *);
int luaopen_org_conman_process        (lua_State *);
int luaopen_org_conman_net            (lua_State *);
int luaopen_org_conman_dns            (lua_State *);
int luaopen_org_conman_sys            (lua_State *);
int luaopen_org_conman_uuid           (lua_State *);
int luaopen_lpeg                      (lua_State *);
int luaopen_LuaXML_lib                (lua_State *);
int luaopen_cURL                      (lua_State *);

/***********************************************************************/

extern const char   c_org_conman_debug[];
extern const size_t c_org_conman_debug_size;
extern const char   c_org_conman_getopt[];
extern const size_t c_org_conman_getopt_size;
extern const char   c_org_conman_string[];
extern const size_t c_org_conman_string_size;
extern const char   c_org_conman_table[];
extern const size_t c_org_conman_table_size;
extern const char   c_org_conman_unix[];
extern const size_t c_org_conman_unix_size;
extern const char   c_re[];
extern const size_t c_re_size;
extern const char   c_LuaXml[];
extern const size_t c_LuaXml_size;

const luaL_Reg c_preload[] =
{
  { "LuaXML_lib"                , luaopen_LuaXML_lib                } ,
  { "cURL"                      , luaopen_cURL                      } ,
  { "lpeg"                      , luaopen_lpeg                      } ,
  { "org.conman.dns"            , luaopen_org_conman_dns            } ,
  { "org.conman.env"            , luaopen_org_conman_env            } ,
  { "org.conman.errno"          , luaopen_org_conman_errno          } ,
  { "org.conman.fsys"           , luaopen_org_conman_fsys           } ,
  { "org.conman.hash"           , luaopen_org_conman_hash           } ,
  { "org.conman.math"           , luaopen_org_conman_math           } ,
  { "org.conman.net"            , luaopen_org_conman_net            } ,
  { "org.conman.process"        , luaopen_org_conman_process        } ,
  { "org.conman.string.remchar" , luaopen_org_conman_string_remchar } ,
  { "org.conman.string.trim"    , luaopen_org_conman_string_trim    } ,
  { "org.conman.string.wrap"    , luaopen_org_conman_string_wrap    } ,
  { "org.conman.sys"            , luaopen_org_conman_sys            } ,
  { "org.conman.syslog"         , luaopen_org_conman_syslog         } ,
  { "org.conman.uuid"           , luaopen_org_conman_uuid           } ,
};

#define MAX_CMOD (sizeof(c_preload) / sizeof(luaL_Reg))

const prelua_reg__t c_luapreload[] =
{
  { "LuaXml"            , c_LuaXml            , &c_LuaXml_size            } ,
  { "org.conman.debug"  , c_org_conman_debug  , &c_org_conman_debug_size  } ,
  { "org.conman.getopt" , c_org_conman_getopt , &c_org_conman_getopt_size } ,
  { "org.conman.string" , c_org_conman_string , &c_org_conman_string_size } ,
  { "org.conman.table"  , c_org_conman_table  , &c_org_conman_table_size  } ,
  { "org.conman.unix"   , c_org_conman_unix   , &c_org_conman_unix_size   } ,
  { "re"                , c_re                , &c_re_size                } ,
};

#define MAX_LMOD (sizeof(c_luapreload) / sizeof(prelua_reg__t))

/*************************************************************************/

static int luaLReg_cmp(const void *needle,const void *haystack)
{
  const char     *key   = needle;
  const luaL_Reg *value = haystack;

  return (strcmp(key,value->name));
}

/*************************************************************************/

static int preloadlua_cloader(lua_State *const L)
{
  const char     *key;
  const luaL_Reg *target;

  key    = luaL_checkstring(L,1);
  target = bsearch(key,c_preload,MAX_CMOD,sizeof(luaL_Reg),luaLReg_cmp);

  if (target == NULL)
    lua_pushnil(L);
  else
    lua_pushcfunction(L,target->func);

  return 1;
}

/************************************************************************/

static int preluareg_cmp(const void *needle,const void *haystack)
{
  const char          *key   = needle;
  const prelua_reg__t *value = haystack;

  return (strcmp(key,value->name));
}

/*************************************************************************/

static int preloadlua_lualoader(lua_State *const L)
{
  const char          *key;
  const prelua_reg__t *target;

  key    = luaL_checkstring(L,1);
  target = bsearch(key,c_luapreload,MAX_LMOD,sizeof(prelua_reg__t),preluareg_cmp);

  if (target == NULL)
    lua_pushnil(L);
  else
  {
    int rc = luaL_loadbuffer(L,target->code,*target->size,target->name);
    if (rc != 0)
      lua_pushnil(L);
  }

  return 1;
}

/***********************************************************************/

void preload_lua(lua_State *const L)
{
  assert(L != NULL);

  lua_gc(L,LUA_GCSTOP,0);
  luaL_openlibs(L);
  lua_gc(L,LUA_GCRESTART,0);

  /*---------------------------------------------------------------
  ; modify the package.loaders[] array to include two new searchers:
  ;
  ; 1) scan for a C based module, return luaopen_*()
  ; 2) scan for a Lua based module, return the result of luaL_loadbuffer()
  ;----------------------------------------------------------------------*/

  lua_getglobal(L,"package");
  lua_getfield(L,-1,"loaders");

  int len = lua_objlen(L,-1);

  /*-----------------------------------------------------------------------
  ; insert the two new functions at the start of the package.loaders[]
  ; array---this way, we get first crack at loading the modules and don't
  ; waste time with expensive disk lookups.
  ;----------------------------------------------------------------------*/

  for (int i = len + 2 ; i > 3 ; i--)
  {
    lua_rawgeti(L,-1,i - 2);
    lua_rawseti(L,-2,i);
  }

  lua_pushinteger(L,1);
  lua_pushcfunction(L,preloadlua_cloader);
  lua_settable(L,-3);
  lua_pushinteger(L,2);
  lua_pushcfunction(L,preloadlua_lualoader);
  lua_settable(L,-3);
}
And a quick test of the new Kitchen Sink Lua interpreter on this:
-- ensure any accumulated garbage is reclaimed collectgarbage('collect') collectgarbage('collect') collectgarbage('collect') print(collectgarbage('count') * 1024)
reveals a nice usage of 17,618 bytes—on par with the stock Lua interpreter. What's happening here is that the modules are no longer being shoved into the Lua state regardless of use (and there's one module that accounts for about 3½ megabytes—it's rarely used, but I do need it in some circumstances); they're now loaded into the Lua state as needed.
This also led me to the concept of compressing the Lua-written modules with zlib to save space in the executable (and it does help). I'll leave that code to the reader as an exercise.
Interestingly enough, the only hard part of this was trying to figure out
how to insert two elements at the start of an array using the C API of Lua—there is no
equivalent function to the Lua table.insert()
function. I
resorted to checking the source code to table.insert()
to see
how it was done.
The only other problem I had was debugging the zlib
-based
version of this code—a typo (two missing characters—sigh) led me on a
multi-hour bug chase.
But it works now, and I've decreased memory usage quite a bit with some few simple changes to the code, which is always nice.
Update on Wednesday, March 22nd, 2023
Monday, March 25, 2013
On tail call optimization in certain C compilers
From: Mark Grosberg <XXXXXXXXXXXXXXXXX>
To: Sean Conner <sean@conman.org>
Subject: Tail calls.
Date: Sun, 24 Mar 2013 12:35:18 PM -0500
But here, I can't rely on the C compiler to optimize tail calls (GCC can, but only with certain options; I don't know about the Solaris C compiler). I could have the routines return the next function to call and use a loop:
I haven't verified it but it probably does. It's a pretty easy optimization (compared to what compilers do today) so I'd be surprised if the Sun C compiler doesn't handle this. At least for C code (C++ exceptions can throw some wrinkles in this in some cases).
-MYG
I decided to check. And yes, the
Solaris C compiler does support tail call optimizations. So I figured I
would play around with this a bit, both under gcc
and the
Solaris C compiler.
Final results: gcc
and the Solaris C compiler both support
tail call optimizations (some restrictions apply; void where prohibited;
your mileage may vary; results presented are not typical and yours might
vary; use only as directed; this program is distributed in the hope that it
will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty
of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE; do not taunt Happy
Fun Ball).
First off, the number of parameters must match among the functions; the types don't appear to matter that much, just the number. Second, the number (or size) of locally defined variables also matters. I'm not sure what the upper size (or number) for variables is (and it may differ between the two compilers) but it does appear to be a factor. Third, the only safe way to determine if tail call optimizations are being performed is to check the assembly code and check for calls (or, just run it and see if it crashes after a period of time).
So I can, kind of, assume tail call optimization in C code. It'll be something to keep in mind.
Tuesday, March 26, 2013
I wonder what IPO I'll be invited to this time?
I need to check what I used as a certain field in my Lua unix module so I thought I would do this through the Lua interpreter:
[spc]saltmine:~>lua
Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> unix = require "org.conman.unix"
lua: src/env.c:54: luaopen_org_conman_env: Assertion `value != ((void *)0)' failed.
Aborted (core dumped)
[spc]saltmine:~>
What the … um … what's going on with that code?
int luaopen_org_conman_env(lua_State *L)
{
  luaL_register(L,"org.conman.env",env);

  for (int i = 0 ; environ[i] != NULL ; i++)
  {
    char *value;
    char *eos;

    value = memchr(environ[i],'=',(size_t)-1);
    assert(value != NULL);
    eos = memchr(value + 1,'\0',(size_t)-1);
    assert(eos != NULL);

    lua_pushlstring(L,environ[i],(size_t)(value - environ[i]));
    lua_pushlstring(L,value + 1,(size_t)(eos - (value + 1)));
    lua_settable(L,-3);
  }

  return 1;
}
No! It can't be! Really?
value = memchr(environ[i],'=',10000);
[spc]saltmine:~>lua
Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> unix = require "org.conman.unix"
>
Yup. It can be! Really!
XXXX! I encountered this very same bug fifteen years ago! The GNU C library, 64-bit version.
Back then, the maintainers of the GNU C library were making an assumption that any value above some
already ridiculously large value was obviously bad and returning
NULL
, not even bothering to run memchr()
. But I
was using a valid value.
You see, I have a block of data I know an equal sign exists in.
If it doesn't exist, I have bigger things to worry about (like I'm not in
Kansas a POSIX environment anymore). But I don't know
how much data to look through. And instead of just assuming a
“large enough value” (which may be good enough for today, but then again,
640K was enough back in the day) I
decided to use a value, that converted to a size_t
type,
basically translates to “all of memory”.
And on a 32-bit system, it worked fine. But on the GNU C library, 64-bit version, it failed, probably because the maintainers felt that 18,446,744,073,709,551,615 bytes is just a tad silly to search through.
And the only reason I remember this particular bug, is
because it apparently was enough to get me invited to the RedHat
IPO (it was either
that, or my work on porting pfe
to IRIX back in the mid-90s).
I did a bit more research (basically—I tried two 64-bit Linux
distributions) and I found a really odd thing—glibc
version
2.3 does not exhibit the behavior (meaning, my code works on a
version released in 2007) but crashes under 2.12 (the code changed sometime
between 2007 and 2010).
Sigh. Time to investigate if this is still a problem in 2.17 and if so, report it as a bug …
Thursday, March 28, 2013
The end result was a computer producing vast amounts of nothing very slowly
So, I run this loadtest program on my work computer. It's going, I can
see the components I'm testing registering events (via the realtime
viewer I wrote for syslogintr
).
Everything is going fine … and … … then … … … t … h … e …
… c … o … m … p … … u … … … t … … … … e … …
… … … r … … … … … … … … s … … … … … … …
… … … l … … … … … … … … … … … … … … … o
… … … … … … … … … … … … w … … … … … …
… … … … … … … … … … … … … … … … … … …
… … … … … s …
It takes about ten minutes to type and run, but this:
[spc]saltmine:~>uptime
 14:44:20 up 6 days, 23:12, 10 users,  load average: 2320.45, 1277.98, 546.61
was quite amusing to see (usually the load average is 0). Perhaps it was just a tad ambitious to simulate 10,000 units on the work computer (each unit its own thread, running a Lua script—yes, even after the modifications to the Lua interpreter).
Also amusing was this:
[spc]saltmine:~>free
             total       used       free     shared    buffers     cached
Mem:       3910888     640868    3270020          0      35568     185848
-/+ buffers/cache:     419452    3491436
Swap:     11457532     544260   10913272
Yes, eleven gigabytes of memory were shoved out to the disk, so most of the slowness was due to thrashing.
Perhaps I should find some fellow cow-orker's computer to run this on …
Friday, March 29, 2013
A meta configure file
I'm tired of changing the configuration files as I test under different systems. It also seemed silly that I needed to replicate the configuration files for each system (or set of systems). What I wanted was a configuration for the configuration and that notion set off an alarm bell in my head.
The poster child for the “a configuration file for the configuration
file” is sendmail
, the only program I know
of that has a thousand page tome
dedicated to describing the configuration file, and it's little wonder
when the
syntax makes Perl look sane:
# try UUCP traffic as a local address
R$* < @ $+ . UUCP > $*		$: $1 < @ $[ $2 $] . UUCP . > $3
R$* < @ $+ . . UUCP . > $*	$@ $1 < @ $2 . > $3

# hostnames ending in class P are always canonical
R$* < @ $* $=P > $*		$: $1 < @ $2 $3 . > $4
R$* < @ $* $~P > $*		$: $&{daemon_flags} $| $1 < @ $2 $3 > $4
R$* CC $* $| $* < @ $+.$+ > $*	$: $3 < @ $4.$5 . > $6
R$* CC $* $| $*			$: $3
# pass to name server to make hostname canonical
R$* $| $* < @ $* > $*		$: $2 < @ $[ $3 $] > $4
R$* $| $*			$: $2

# local host aliases and pseudo-domains are always canonical
R$* < @ $=w > $*		$: $1 < @ $2 . > $3
R$* < @ $=M > $*		$: $1 < @ $2 . > $3
R$* < @ $={VirtHost} > $*	$: $1 < @ $2 . > $3
R$* < @ $* . . > $*		$1 < @ $2 . > $3
It's so bad that there does indeed exist a configuration file
for sendmail.cf
that's not ugly in a “line noise” way, but
ugly in a “needlessly verbose” way:
include(`/usr/share/sendmail-cf/m4/cf.m4')dnl
VERSIONID(`setup for Red Hat Linux')dnl
OSTYPE(`linux')dnl
dnl #
dnl # default logging level is 9, you might want to set it higher to
dnl # debug the configuration
dnl #
dnl define(`confLOG_LEVEL', `9')dnl
dnl #
dnl # Uncomment and edit the following line if your outgoing mail needs to
dnl # be sent out through an external mail server:
dnl #
dnl define(`SMART_HOST',`smtp.your.provider')
dnl #
define(`confDEF_USER_ID',``8:12'')dnl
dnl define(`confAUTO_REBUILD')dnl
define(`confTO_CONNECT', `1m')dnl
define(`confTRY_NULL_MX_LIST',true)dnl
define(`confDONT_PROBE_INTERFACES',true)dnl
define(`PROCMAIL_MAILER_PATH',`/usr/bin/procmail')dnl
define(`ALIAS_FILE', `/etc/aliases')dnl
define(`STATUS_FILE', `/var/log/mail/statistics')dnl
define(`UUCP_MAILER_MAX', `2000000')dnl
define(`confUSERDB_SPEC', `/etc/mail/userdb.db')dnl
define(`confPRIVACY_FLAGS', `authwarnings,novrfy,noexpn,restrictqrun')dnl
define(`confAUTH_OPTIONS', `A')dnl
My thought was (and still is) if a configuration file needs a configuration file, you're doing it wrong. So yes, I was experiencing some cognitive dissonance with writing a configuration file for a configuration file.
But on second thought, I'm not configuring a single configuration file, I'm configuring multiple configuration files. And no, it's not one configuration file (with changes for different systems) but several configuration files, all of which need to be changed for a different system. And not only changed, but that the changes are consistent with each other—that component P is configured with the IP address of component W, and that W has the IP address of component P. And in that view, I feel better with having a configuration file for the configuration files.
Another factor to keep in mind is that I'm reading in the sample configuration file (they're in XML so parsers are readily available) from the source repository and then making changes (directly to the in-memory DOM) and saving the results to a new file. That way, I'm sure to get the latest and greatest version of the configuration file (they do change, but it's rare and it can be easy to miss, like what happened in production about two weeks ago)—most of the contents are sensible defaults.
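Concretely, the meta-configuration can stay tiny. A hypothetical fragment (the names are invented for illustration, not the actual project files) that records, in one place, the facts the individual XML configurations must agree on:

```lua
-- hypothetical meta-configuration: one file of facts, from which the
-- per-component XML configuration files are generated
return {
  system     = "solaris-test",
  components = {
    P = { host = "192.168.10.5" , port = 4000 },
    W = { host = "192.168.10.9" , port = 4001 },
  },
  -- each component's generated config gets the addresses of its peers,
  -- so P and W can never disagree about each other
  peers = { P = { "W" } , W = { "P" } },
}
```

The generator reads this once, then patches each sample XML file's DOM with the relevant addresses, so consistency across all six components falls out for free.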
If this sounds like I'm trying to justify an approach, I am. I still dislike “configuration files for a configuration file” and I needed to convince myself that I'm not doing something nasty in this case.
Yes, I might be overthinking this a tad bit. But then again, trying to ensure six different components are consistently configured by using a single configuration file might make this approach A Good Thing™.