The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Tuesday, August 25, 2015

A better way to implement system()

I am very surprised I haven't covered today's topic before, but try as I might, I don't think I have. Weird!

Anyway, twenty years ago when I was attending college, I designed and implemented my own programming language. Oh, it wasn't for a class (although I did use it for a class project—more on this below) nor for work (although I did use it for work at the time) but for my own curiosity (basically, how a typical Forth was implemented).

And perhaps the most unique aspect of my language came about because of a class project to implement a Unix shell. There were two options—the first was to do simple command line parsing and environment variable substitution (for example, where the environment variable $HOME would be swapped out for its value) and file redirection as a solo project, or do that, plus some programming constructs like conditional statements or loops as a group project. Well, I already had the programming constructs in the form of my language so I had already done the majority of the group project. All that was left was to parse and execute Unix commands. It took all of two hours to add and it pretty much worked right the first time.

Now, my language was based loosely on Forth but with an object oriented flavor. The built-in types, like integers, strings, floats, were underneath objects and I managed to shoehorn in polymorphism so that operators like “+” and “/” would work across types. It was in this environment that I decided to make Unix commands first-class. I think it was a brilliant design and it allowed me the ability to sling commands that one could not do from the command line (even today). The big one was redirection of stderr. Modern Unix shells will allow you to redirect stderr to a file:

GenericUnixPrompt% make 2>/tmp/errors.txt

and you can, kind of, pipe it to another command:

GenericUnixPrompt% make 2>&1 | more

but that includes all the normal output as well. There are times when I would like just stderr piped to a command, but there is no real way to do that.

But you could, rather easily, in my language. Well, “easily” being a relative term, but still, I could arbitrarily redirect stderr to a complex pipeline of commands while at the same time redirect stdout to another complex pipeline of commands, where each of those commands could redirect stdout and stderr as well.

Twenty years later and it's this article (link via Hacker News) that got me thinking about my language (seeing how I'm the sole maintainer, sole developer and not even sole active user) and about the unique ability to treat Unix commands like any other value in the program. And I realized that I could probably do the same using Lua.

A few hours later and I have a near working “proof-of-concept” (in that it creates the proper structures but doesn't actually execute anything yet):

cmd = "logfile"
    +  C("escanlog","--refer")^{}
    / (C('diff','-','expected')^{} / "escanlog-error" + "escanlog-out")
    +  C("sort")^{} / "/dev/null"
    +  C("uniq","-c")^{}
    +  C("sort","-rn")^{}
    + "/tmp/output"
cmd()

It would be hard to translate this into an actual command line, seeing how you can't really pipe stderr. Breaking this all down, the function C() creates a Unix command object; the first parameter is the program, all the rest become command line objects. The “^” is normally the exponent operator, but here, I'm using it do define which environment variables I want set for the command and here, it's an empty environment. For example, if I want todays date in Swedish, I could do:

cmd = C("date","+%c")^{ LANG="se_NO" }
cmd()

This is something you can do at the command line, but it gets unwieldy for a large number of environment variables or even a different environment variable per command. And since this environment blob is just a regular Lua table, you can set up a custom environment as a variable and reference it that way.

The sequence

cmd = "logfile" + C("grep","foobar")

will change stdin to be the file logfile. But this:

cmd = C("ls","-l") + "list.txt"

will cause stdout to be written to the file list.txt. So generally speaking, “+” will redirect stdin or stdout, depending upon where it appears (in this case, “+” is non-commutative). This will even work when redirecting stdout to another command:

cmd = C("ls","-l") + C("tr","a-z","A-Z")

Redirecting stderr is done by using “/” and it works similarly do “+”—if a string is specified, treat that as a filename and redirect stderr to that file, otherwise redirect stderr to a command (where it becomes stdin). And it wouldn't be hard to extend this to support resource limits per command as well.

The odd choice of operators is due to the limited choice available for Lua 5.1—Lua 5.3 has more operators to choose from, but to be useful, I limited myself to what's available for Lua 5.1.

I'm actually surprised something like this hasn't been done before (or if it has, I'm not aware of it).

Obligatory Picture

[It's the most wonderful time of the year!]

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2019 by Sean Conner. All Rights Reserved.