Tuesday, August 25, 2015
A better way to implement system()
I am very surprised I haven't covered today's topic before, but try as I might, I don't think I have. Weird!
Anyway, twenty years ago when I was attending college, I designed and implemented my own programming language. Oh, it wasn't for a class (although I did use it for a class project—more on this below) nor for work (although I did use it for work at the time) but for my own curiosity (basically, how a typical Forth was implemented).
And perhaps the most distinctive aspect of my language came about because of a class project to implement a Unix shell.
There were two options: the first was to do simple command line parsing, environment variable substitution (for example, where the environment variable $HOME would be swapped out for its value) and file redirection as a solo project; the second was to do all of that, plus some programming constructs like conditional statements or loops, as a group project.
Well, I already had the programming constructs in the form of my language so I had already done the majority of the group project. All that was left was to parse and execute Unix commands. It took all of two hours to add and it pretty much worked right the first time.
Now, my language was based loosely on Forth but with an object-oriented flavor. The built-in types, like integers, strings and floats, were objects underneath, and I managed to shoehorn in polymorphism so that operators like “+” and “/” would work across types.
It was in this environment that I decided to make Unix commands first-class. I think it was a brilliant design, and it let me sling commands together in ways that one could not from the command line (even today). The big one was redirection of stderr.
Modern Unix shells will allow you to redirect stderr to a file:
GenericUnixPrompt% make 2>/tmp/errors.txt
and you can, kind of, pipe it to another command:
GenericUnixPrompt% make 2>&1 | more
but that includes all the normal output as well. There are times when I would like just stderr piped to a command, but there is no simple way to do that.
But you could, rather easily, in my language. Well, “easily” being a relative term, but still, I could arbitrarily redirect stderr to a complex pipeline of commands while at the same time redirect stdout to another complex pipeline of commands, where each of those commands could redirect stdout and stderr as well.
Twenty years later and it's this article (link via Hacker News) that got me thinking about my language (seeing how I'm the sole maintainer, sole developer and not even sole active user) and about the unique ability to treat Unix commands like any other value in the program. And I realized that I could probably do the same using Lua.
A few hours later and I have a nearly working “proof-of-concept” (in that it creates the proper structures but doesn't actually execute anything yet):
cmd = "logfile"
    + C("escanlog","--refer")^{} / (C('diff','-','expected')^{} / "escanlog-error" + "escanlog-out")
    + C("sort")^{} / "/dev/null"
    + C("uniq","-c")^{}
    + C("sort","-rn")^{}
    + "/tmp/output"
cmd()
It would be hard to translate this into an actual command line, seeing how you can't really pipe stderr.
Breaking this all down, the function C() creates a Unix command object; the first parameter is the program, and all the rest become command line arguments. The “^” is normally the exponent operator, but here I'm using it to define which environment variables I want set for the command, and here, it's an empty environment.
For example, if I want today's date in Swedish, I could do:
cmd = C("date","+%c")^{ LANG="sv_SE" }
cmd()
This is something you can do at the command line, but it gets unwieldy for a large number of environment variables or even a different environment variable per command. And since this environment blob is just a regular Lua table, you can set up a custom environment as a variable and reference it that way.
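A minimal sketch of how C() and “^” might build such a structure, using a metatable with a __pow metamethod. The field names argv and env and the variable utc_env are assumptions made up for illustration, not the actual implementation, and nothing is executed here:

```lua
-- Hypothetical sketch: the field names (argv, env) are assumptions,
-- not taken from the actual implementation.
local Command = {}
Command.__index = Command

-- C(program, ...) only records the program and its arguments.
local function C(program, ...)
  return setmetatable({ argv = { program, ... } }, Command)
end

-- cmd ^ env attaches an ordinary Lua table as the command's environment.
Command.__pow = function(cmd, env)
  cmd.env = env
  return cmd
end

-- Because the environment is just a table, it can live in a variable
-- and be shared between several commands.
local utc_env = { LANG = "C", TZ = "UTC" }

local date = C("date", "+%c") ^ utc_env
local list = C("ls", "-l") ^ utc_env

print(date.argv[1], date.env.TZ, list.env == date.env)
```

Both commands end up pointing at the same table, so changing utc_env before the commands run would affect both of them.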
The sequence
cmd = "logfile" + C("grep","foobar")
will change stdin to be the file logfile.
But this:
cmd = C("ls","-l") + "list.txt"
will cause stdout to be written to the file list.txt.
So generally speaking, “+” will redirect stdin or stdout, depending upon where it appears (in this case, “+” is non-commutative).
This will even work when redirecting stdout to another command:
cmd = C("ls","-l") + C("tr","a-z","A-Z")
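One way the position-dependent behavior of “+” could be sketched is with an __add metamethod that checks which operand is the string. Lua calls the command's __add even when the string is on the left, so both orders reach the same function. The field names (argv, stdin, stdout) and the helper tail() are hypothetical, chosen only for this illustration:

```lua
-- Hypothetical sketch: field names and the tail() helper are assumptions.
local Command = {}
Command.__index = Command

local function C(program, ...)
  return setmetatable({ argv = { program, ... } }, Command)
end

local function is_command(v)
  return getmetatable(v) == Command
end

-- Follow the chain of stdout links to the last command in a pipeline.
local function tail(cmd)
  while is_command(cmd.stdout) do
    cmd = cmd.stdout
  end
  return cmd
end

Command.__add = function(lhs, rhs)
  if type(lhs) == "string" then
    -- "file" + cmd : the file becomes the command's stdin
    rhs.stdin = lhs
    return rhs
  end
  -- cmd + "file" : stdout goes to the file
  -- cmd + cmd    : stdout goes to the next command's stdin
  tail(lhs).stdout = rhs
  return lhs
end

local cmd = "logfile" + C("grep", "foobar") + C("sort") + "out.txt"
print(cmd.stdin, cmd.argv[1], cmd.stdout.argv[1], cmd.stdout.stdout)
```

Since “+” is left-associative, the expression is evaluated left to right, and each step hands back the head of the pipeline so the next “+” can extend its tail.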
Redirecting stderr is done by using “/” and it works similarly to “+”: if a string is specified, treat that as a filename and redirect stderr to that file, otherwise redirect stderr to a command (where it becomes stdin).
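A rough sketch of “/” as a __div metamethod, again with made-up field names (argv, stdout, stderr) and a deliberately simplified “+” that only handles stdout. It also shows why the parentheses matter: in Lua “/” binds tighter than “+”, so without them the file would attach to the outer command instead of the inner one.

```lua
-- Hypothetical sketch: field names are assumptions, and "+" is simplified
-- to handle only "cmd + target" (stdout) for the sake of brevity.
local Command = {}
Command.__index = Command

local function C(program, ...)
  return setmetatable({ argv = { program, ... } }, Command)
end

-- cmd / "file" : stderr goes to the file
-- cmd / cmd2   : stderr becomes stdin of cmd2
Command.__div = function(cmd, target)
  assert(getmetatable(cmd) == Command, "left operand of / must be a command")
  cmd.stderr = target
  return cmd
end

Command.__add = function(cmd, target)
  assert(getmetatable(cmd) == Command, "left operand of + must be a command")
  cmd.stdout = target
  return cmd
end

-- stderr of make is filtered by grep, whose stdout goes to warnings.txt;
-- stdout of make goes to build.log.
local cmd = C("make") / (C("grep", "warning") + "warnings.txt") + "build.log"

print(cmd.stderr.argv[1], cmd.stderr.stdout, cmd.stdout)
```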
And it wouldn't be hard to extend this to support resource limits per command as well.
The odd choice of operators is due to the limited choice available for Lua 5.1—Lua 5.3 has more operators to choose from, but to be useful, I limited myself to what's available for Lua 5.1.
I'm actually surprised something like this hasn't been done before (or if it has, I'm not aware of it).