A world of message-oriented programming languages

Wednesday, November 21, 2018

Ted Kaminski just asked, “What would a message-oriented programming language look like?” to which I answer, “any language with functions, which is to say, just about all computer languages.” Ted's answer is a bit different, but let me explain mine (and I've been meaning to write about this for fifteen years now—sigh).

Digression the First

In the programming language C, pointers and arrays are often conflated. They're declared differently:

int  array[10];
int *pointer_a;

But their use is similar:

int x = array[3];
int y = pointer_a[3]; // assuming pointer_a points to an array

I'm not going to go deep into the differences, but do note that array is a sequence of 10 integers, and pointer_a is the address of one or more integers.
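One difference is easy to see for yourself with sizeof. The following is a minimal sketch; the function names are made up for illustration, and the exact sizes depend on your platform.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof applied to an array gives the size of the whole sequence. */
size_t array_bytes(void)
{
  int array[10];
  return sizeof(array);       /* 10 * sizeof(int) */
}

/* sizeof applied to a pointer gives the size of the address itself. */
size_t pointer_bytes(void)
{
  int  array[10];
  int *pointer_a = array;     /* the array "decays" to &array[0] */
  return sizeof(pointer_a);   /* sizeof(int *) */
}

/* Indexing, on the other hand, looks the same through either name. */
int third_element_matches(void)
{
  int  array[10] = { 0 , 1 , 2 , 3 };
  int *pointer_a = array;

  assert(pointer_a == &array[0]);
  return array[3] == pointer_a[3];
}
```

So the usage syntax is shared, but the compiler still knows that one is ten integers and the other is a single address.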

Structures and pointers are not conflated to the same degree, mainly because they have different usage syntax (which I find annoying):

struct foo
{
  int a;
  int b;
  int c;
};

struct foo  structure;
struct foo *pointer_s;

structure.a    = 3;

pointer_s->a   = 3; // assuming pointer_s points to a structure
  /* or */
(*pointer_s).a = 3; // assuming pointer_s points to a structure

The reason for the difference comes from very early C compilers where the fields in a structure declaration were considered offsets and not fields! So the following code:

struct foo
{
  int a;
  int b;
  int c;
};

struct bar
{
  char *x; // can't be a because that would conflict with above
  char *y; 
  char *z;
};

struct foo *pfoo; // assuming pfoo points somewhere valid

pfoo->a = 4;
pfoo->y = "hello";

was legal C code! Thankfully, it no longer is, but we still live with this in POSIX where field names for various structures all have a prefix, like struct stat { off_t st_size; ... } and struct timespec { time_t tv_sec; ... }.

Also, while nothing I see in the C standard seems to invalidate this assumption:

int array[3];

struct foo
{
  int a;
  int b;
  int c;
};

sizeof(array) == sizeof(struct foo);

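it still seems like it goes against the standard to cast a pointer between the two types all willy-nilly (and since a compiler is allowed to add padding between structure fields, the sizes aren't strictly guaranteed to match anyway). Just an observation.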

Digression the Second

In C, you have arrays and structures. An array is a sequence of values of the same type, stored consecutively in memory. Each element is indexed by a number indicating its position in the array (C numbers the first element as 0; other languages such as Lua or Pascal start with 1).

A structure is a portion of memory with a particular layout of types. The individual elements are indexed by a name and appear in order, although not necessarily right next to each other (in other words, there may be what's called “padding” between fields of different types due to machine architecture restrictions).

Then there's the concept known as a “tuple.” This is a cross between the array and the structure. Like the structure, the individual elements may be of different types and sizes, but they are referenced like an array, by an index giving the position within the tuple. This is a type that doesn't exist in C, but it does exist in other, more dynamic languages like Python.
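The padding mentioned above for structures can be observed with offsetof. This is a small sketch; struct mixed and the helper names are made up for illustration, and the actual numbers depend on the machine and compiler.

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical structure with fields of different sizes. */
struct mixed
{
  char tag;     /* usually 1 byte */
  int  value;   /* usually wants to sit on an int-aligned address */
};

/* The first field always sits at the very start of the structure. */
static_assert(offsetof(struct mixed, tag) == 0, "no padding before the first field");

/* Where the second field actually starts. */
size_t value_offset(void)
{
  return offsetof(struct mixed, value);
}

/* How many bytes of the structure are not occupied by either field. */
size_t padding_bytes(void)
{
  return sizeof(struct mixed) - (sizeof(char) + sizeof(int));
}
```

On a typical machine with 4-byte integers I would expect value to start at offset 4 with three bytes of padding, but that is an assumption about the platform, not something C promises.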

Digression the Third

My first real exposure to message passing as a concept was with the Amiga. It was your standard type of message passing: create or find a port, then send or receive messages. Messages themselves were a block of memory with a fixed header and a message-specific portion, and they could be sent synchronously (where the sending task was blocked waiting for a reply) or sent asynchronously (the message is sent, but the task continues to run, to possibly wait for a reply at a later time). Messages and message ports were used for all sorts of things on the Amiga: interacting with devices or the GUI, receiving signals, all sorts of stuff.

My second exposure to message passing was with QNX. Unlike the Amiga, there were no message ports; instead, you passed messages to a given process. Messages themselves had no fixed structure; a message was just a blob of memory being copied from one process space to another. And message passing was purely synchronous. You could do asynchronous message passing, but it required multiple threads to do so.

And it was here that I learned that neither one was more “primitive” than the other—you could always simulate one with the other.

Digression the Fourth

There are two orthogonal axes upon which you can implement message passing. The first axis is “synchronous/asynchronous”: is the sending task stopped, or can it continue? The other axis is “reference/value”: does the task send a reference to the data, or does it need to copy the data? In the case of QNX, it's a “synchronous, by-value” message passing paradigm. The Amiga can send either synchronously or asynchronously, but in both cases, the data is sent by reference.
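The “reference/value” axis can be sketched in a few lines of C. Everything here (struct message, struct mailbox, the send functions) is hypothetical and only meant to show the difference; it is not how QNX or the Amiga actually implement it.

```c
#include <assert.h>
#include <string.h>

struct message
{
  int  code;
  char text[16];
};

/* A made-up one-slot mailbox that can hold a message either way. */
struct mailbox
{
  struct message        copy;  /* filled in by a by-value send      */
  const struct message *ref;   /* filled in by a by-reference send  */
};

/* By-value: the data is copied, so the receiver has its own snapshot. */
void send_by_value(struct mailbox *box, const struct message *msg)
{
  assert(box != NULL && msg != NULL);
  memcpy(&box->copy, msg, sizeof(*msg));
}

/* By-reference: only the address travels; the data stays with the sender. */
void send_by_reference(struct mailbox *box, const struct message *msg)
{
  assert(box != NULL && msg != NULL);
  box->ref = msg;
}

/* The sender changes the message after a by-value send. */
int code_seen_after_value_send(void)
{
  struct mailbox box = { 0 };
  struct message msg = { .code = 1 , .text = "hello" };

  send_by_value(&box, &msg);
  msg.code = 2;              /* the receiver's copy is unaffected */
  return box.copy.code;
}

/* The sender changes the message after a by-reference send. */
int code_seen_after_reference_send(void)
{
  struct mailbox box = { 0 };
  struct message msg = { .code = 1 , .text = "hello" };

  send_by_reference(&box, &msg);
  msg.code = 2;              /* the receiver sees the change */
  return box.ref->code;
}
```

The by-reference case is cheaper, but it only works when sender and receiver can see the same memory, and the sender has to leave the message alone until the receiver is done with it.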

Digression the Fifth

Over the years I've programmed under a few windowing systems. Not much, but just enough to get a feeling for it. On the Amiga, you filled in a rather lengthy structure, then passed this to a function to open a new window.

On X Windows, you call one of two functions: one just takes a large number of parameters, the other one takes a large number of parameters, one of which is a rather lengthy structure.

I recall little of Windows and OS/2 (being way back in the early 90s) but I think they were a bit worse than X Windows—a large number of parameters, several of which were rather lengthy structures.

But for all of them, you sat in a so-called “event loop,” waiting for events from the GUI, then went off and handled each one. On the Amiga, events were received from a message port. On X, it was a function that returned a structure representing the event. On Windows and OS/2, you supplied a function that received four parameters that comprised the event; you were not in control of the main event loop.
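The shape of such an event loop, in the style where you ask for the next event yourself, can be sketched as follows. The types and functions (struct event, next_event, and so on) are made up for illustration and do not correspond to any real windowing API; a real next_event would block waiting on the GUI instead of reading from a canned list.

```c
#include <assert.h>
#include <stddef.h>

enum event_type { EVENT_KEY , EVENT_MOUSE , EVENT_QUIT };

/* An event is just a small structure, which is to say, a message. */
struct event
{
  enum event_type type;
  int             data;
};

/* A hypothetical event source backed by a fixed list of events. */
struct event_queue
{
  const struct event *events;
  size_t              count;
  size_t              next;
};

/* Return the next event, or NULL when there are none left. */
const struct event *next_event(struct event_queue *q)
{
  assert(q != NULL);
  if (q->next >= q->count)
    return NULL;
  return &q->events[q->next++];
}

/* Wait for an event, handle it, repeat; stop on EVENT_QUIT.
   Returns how many events were handled. */
size_t event_loop(struct event_queue *q)
{
  const struct event *ev;
  size_t              handled = 0;

  while ((ev = next_event(q)) != NULL)
  {
    if (ev->type == EVENT_QUIT)
      break;
    handled++;               /* a real program would dispatch on ev->type */
  }
  return handled;
}

/* Feed the loop a short script of events. */
size_t demo_event_loop(void)
{
  static const struct event script[] =
  {
    { EVENT_KEY   , 'a' },
    { EVENT_MOUSE ,  1  },
    { EVENT_QUIT  ,  0  },
    { EVENT_KEY   , 'b' },   /* never reached */
  };
  struct event_queue q = { script , sizeof(script) / sizeof(script[0]) , 0 };

  return event_loop(&q);
}
```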

Digression the Sixth

So a while back, I was studying the VAX architecture, like you do, when I came across the CALLG and CALLS instructions. Both are used to call a function. CALLG requires the address of the argument list to be passed in:

	<data section>
ARGLIST:	.LONG	2	; argument count
		.LONG	600	; first argument
		.LONG	84	; second argument

	<code section>

		CALLG	ARGLIST,FOOBAR

FOOBAR is called with two arguments. For the CALLS instruction, you push the arguments onto the stack:

	<code section>

		PUSHL	#84
		PUSHL	#600
		CALLS	#2,FOOBAR

Again, FOOBAR is called with two arguments. FOOBAR itself does not have to care how it was called—it receives a pointer to the argument list in register R12 (aka the AP register). It was then I had an epiphany.

The Epiphany

So, in the case of FOOBAR, one way of calling it could look like:

struct foobar_data
{
  int a;
  int b;
};

struct foobar_data fdata = { .a = 600 , .b = 84 };
foobar(&fdata);

But another way of calling it could look like:

foobar(600,84);

Really, all CALLS is doing is initializing a temporary structure whose fields are otherwise known as “parameters” and passing this structure to the operand, in this case FOOBAR. The parameter list to a function can be viewed as a structure. And all of the examples I've seen of message passing are just passing along data, usually structured as a structure (sorry) or a tuple (depending upon language).

And then the epiphany! Calling a function with parameters is just another form of synchronous message passing, either by-reference or by-value (this is either an unusual thought or an obvious one, but if it was obvious, it still took me a while to reach it). That nasty Windows call to create a window? Just pass a really large structure or “message.” And that's really what's happening under the hood of X Windows: a message is being passed from the X client to the X server.
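To make that concrete, here is a minimal sketch in C where the two-parameter call is nothing more than a wrapper that builds the message and hands it over. Since C won't let one name have both signatures, the structure-taking form is called foobar_msg here (a made-up name), and I'm assuming FOOBAR just adds its two arguments, since what it actually does was never the point.

```c
#include <assert.h>
#include <stddef.h>

/* The "message": the parameter list viewed as a structure. */
struct foobar_data
{
  int a;
  int b;
};

/* The CALLG-like form: receive a pointer to the argument list.
   The body is an assumption made for the sake of the example. */
int foobar_msg(const struct foobar_data *msg)
{
  assert(msg != NULL);
  return msg->a + msg->b;
}

/* The CALLS-like form: build a temporary structure out of the
   parameters, then pass that structure along. */
int foobar(int a, int b)
{
  struct foobar_data msg = { .a = a , .b = b };
  return foobar_msg(&msg);
}

/* Both ways of calling deliver the same message. */
int both_forms_agree(void)
{
  struct foobar_data fdata = { .a = 600 , .b = 84 };
  return foobar_msg(&fdata) == foobar(600, 84);
}
```

Either way, the caller stops until the result comes back, which is what makes it the synchronous flavor of message passing.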

And this does lead me to wonder at times what the semantics of an asynchronous function call would (or could) be.

But yes, we already have message-oriented programming languages—if you squint the right way …

The Boston Diaries
