Saturday, January 13, 2007
Notes on some programming postulates
During a lull in the D&D game last night, I was doing the programmer equivalent of doodling, engaging in one of my interests—programming language design. Old interests die hard, and this, along with operating system design, are very old, dating back to high school. Bunny asked what I was doing, and I quickly got self-conscious.
It happens when one is caught doodling.
Afterwards in discussing this, I mentioned some postulates I have about programming; I don't feel it's anything earth shattering, but pondering on them (and I've been pondering on them for a few years now) has lead to some interesting questions.
The first one, which I tend to think of as number zero (which, in programming circles, makes some sense), is: a program is a subroutine writ large.
Now, I'm using some very loose definitions of “program” and “subroutine” here. By “subroutine” I mean any sequence of code that does “something” and can be called one or more times. By “program” I mean any sequence of code that does “something” and can be run by the user. So really, except for the context in which it is used, a “subroutine” and a “program” aren't much different (and in fact, on some platforms like Microsoft Windows, you can treat a program like Microsoft Internet Explorer, for instance, as a subroutine to be used by a program you write).
So, the next three postulates. For each of these, when you see the term “subroutine” you can also substitute the term “program” without any change in meaning.
- A subroutine may accept input.
- A subroutine may do processing.
- A subroutine must produce output.
It was fun coming up with this list. The first version of the first postulate was “a subroutine must have input” which I then had to think if there were any types of subroutines that violated this.
And there were. A program to calculate e to 100,000 digits, the Unix
utility yes
and any number of utilties to overwrite files with
meaningless data. There are a class of programs that can very well deal
without any input whatsoever. So input is optional.
And the second one was similar: “a subroutine must do
processing.” But the definition of “processing” is a bit thorny—take
the Unix utility yes
, which just repeatedly spits out lines
consisting of a single “y” (you use it to automatically respond “yes” to
other Unix utilities, like fsck
). I have a hard time with the
notion of it doing any “processing”. I mean, the code is little more
than:
#include <stdio.h> #include <stdlib.h> int main(void) { while(1) printf("y\n"); return (EXIT_SUCCESS); /* not that it ever exits */ }
While technically there is processing going on, i.e. the CPU is processing instructions, it's not actually calculating. Yes, what it does is useful, but is it “processing?”
I wouldn't classify this as “processing” for the types of “processing” I'm thinking of, so postulate number two—a program must do processing, must be wrong. It may do processing.
It's the third one, “a subroutine must have output,” that has given me the most problems.
At first, I had a real hard time coming up with examples of a subroutine that had no output. I mean, why even bother calling a subroutine that had absolutely no output? What exactly, would be the point? I was about to conclude that all subroutines must have output, when I remembered writing just such a subroutine for my first computer, a Tandy Color Computer.
A small digression: The Color Computer technically had a serial port, in
that there is a one bit input port and a one bit output port which is driven
entirely by software. To send a byte, you drive the output port low for
X
µseconds, then set the port to match the first bit,
wait X
µseconds, then set the port to match the second
bit, wait X
µseconds, and so on, where X
is
a value dependent upon the baud rate. And I ended up writing code to drive
the “serial port” on the Color Computer. And as part of this, I had the
following subroutine:
DELAY PSHS X,CC LDX #413 DELAY10 LEAX -1,X BNE DELAY10 PULS X,CC,PC
This would delay 3.3 milliseconds, giving me a baud rate of 300bps.[1] The X
register, as well as the
condition codes, are saved, so the effect of this routine on the CPU state was nil. The only
difference between:
NOP * NO OPERATION
and
BSR DELAY * NO OPERATION, BUT A LONG DELAY
was 3,331 µseconds (NOP
requiring 2 µseconds to
execute).
Well. [Deep subject. —Editor]
Here I have, from my own archive, a subroutine that takes no input, does no real processing, and has no output.
And yet it's a useful subroutine.
Granted, it appears to be an exception to the rule that all subroutines
must have output. And really, these days, the majority of
programmers don't go about writing delay loops like this. Nope,
they make calls to sleep()
which … um … causes the program
… to … um … delay … for a period of time.
Well … um …
But …
But it seems wrong to conclude that subroutines may have output.
Because outside of a few esoteric subroutines like my DELAY
routine, or sleep()
, a subroutine must have output or
else, why bother calling it?
How to reconcile this?
How indeed.
But what if I now ask the question: “What is output?”
In the case of DELAY
, I'm calling the subroutine not for any
tangible result, but for the side effect of doing nothing for 3,333
µseconds. And in general, output is a type of side effect
(as any functional
programmer will tell you), since output affects the state of the program
(outputing a “y”, followed by a second “y” changes some state somewhere,
if only to advance the printing location of subsequent characters).
So, if side effects are considered a type of output, and
DELAY
has a side effect, then it can be considered to have
output, so now I don't have to reconcile anything—subroutines
must have output.
Like I said, it was nothing earth shattering here, but even Euclid started from a simple straight line [2] and from there expanded upon geometry.
- Actually, this code assumes a 1MHz clock rate on the CPU. The actual clock rate on the Computer Computer was 0.89MHz, but I used 1MHz for this example to make the math easier on me.
- Although you start getting odd results when you question the nature of a straight line.