Sunday, January 14, 2007
Some musings on coroutines
Over a year ago, having nothing else better to do (seeing how the power was out), I did some research into co-routines. I figure now is as good a time as any to talk about some conclusions I came to (seeing how all that programming talk yesterday is inspiring).
Subroutines are special cases of more general program components, called “coroutines.” In contrast to the unsymmetric relationship between a main routine and a subroutine, there is complete symmetry between coroutines, which call on each other.
To understand the coroutine concept, let us consider another way of thinking about subroutines. The viewpoint adopted in the previous section was that a subroutine merely was an extension of the computer hardware, introduced to save lines of coding. This may be true, but another point of view is possible: We may consider the main porogram and the subroutine as a team of programs, with each member of the team having a certain job to do. The main program, in the course of doing its job, will activate the subprogram; the subprogram performs its own function and then activates the main program. We might stretch our imagination to believe that, from the subroutine's point of view, when it exits it is calling the main routine; the main routine continues to perform its duty, then “exits” to the subroutine. The subroutine acts, then calls the main routine again.
This somewhat far-fetched philosophy actually takes place with coroutines, when it is impossible to distinguish which is a subroutine of the other. Suppose we have coroutines
A
andB
; when programmingA
, we may think ofB
as our subroutine, but when programmingB
, we may think ofA
as our subroutine. … It represents teamwork as in a relay race. Whenever a coroutine is activated, it resumes execution of its program at the point where the action was last suspended.The Art of Computer Programming, § 1.4.2
When researching just about anything dealing with computers, it's probably best to start with the source, Donald Knuth. § 1.4.2 covers the basics of coroutines and gives an example, which would look something like this (when converted from MIX assembly, used for all the examples in the book, to a C-like langauge, which is a bit easier to understand):
int input(void) { int c; while(1) { c = getchar(); if (isdigit(c)) { int count = c - '0' + 1; c = getchar(); for ( ; count ; count--) yield to output(c); } else yield to output(c); if (c == '.') return; } } void output(int c) { int count = 0; int c; while(1) { yield to c = input(); putchar(c); if (c == '.') return; count++; if (count == 3) { putchar(' '); count = 0; } } }
The thing to understand about coroutines is that it isn't a normal
subroutine call. Normally, each time input()
is called,
control is passed to the first line of the routine, but in a coroutine
“call,” control is initially passed to the first line, but each
subsequent “call” resumes where input()
left off (in this
case, marked with the keyword yield
). If we change the code a
bit (remember, this isn't C, but a C-like langauge), this can become
clearer:
void input() { while(1) { c = getchar(); if (isdigit(c)) { int count = c - '0' + 1; c = getchar(); for ( ; count ; count--) send c to output; } else send c to output; if (c == '.') return; } } void output() { int count = 0; int c; while(1) { receive c from input; putchar(c); if (c == '.') return; count++; if (count == 3) { putchar(' '); count = 0; } } }
Here, it should be a bit clearer what is going on—input()
is
getting some data, then passing it to the coroutine output()
.
I've changed the way that coroutines are “called” for two reasons. One,
it's a bit clearer what is going on, and two, there's a bit more going on
here than just very fancy goto
s. In § 1.4.2, Knuth gives
the following illustration (recreated here):
Passes: (a) a four-pass algorithm, and (b) a one-pass algorithm.
Those familiar with Unix should see a familiar pattern here:
GenericUnixPrompt> pass-a <input >tape.1 GenericUnixPrompt> pass-b <tape.1 >tape.2 GenericUnixPrmpot> pass-c <tape.2 >tape.3 GenericUnixPrompt> pass-d <tape.3 >output
Or, more succinctly:
GenericUnixPrompt> pass-a < input | pass-b | pass-c | pass-d >output
Coroutines are nothing more than routines that communicate via pipes! A type of message passing, if you will.
Not only that, but given the rise of multi-core processors, this appears to be a natural fit for multithreaded programs. A piped Unix command line uses a separate process for each component, so it appears to be a no-brainer that each component in a coroutine chain can be given its own thread. With that, and some syntactic help, we can rewrite Knuth's example and extend it a bit:
void input() receive char; send char; { char c; while(1) { receive c; if (isdigit(c)) { int count = c - '0' + 1; receive c; for ( ; count ; count --) send c; } else send c; if (c == '.') return; } } void output() receive char; send char; { int count = 0; char c; while(1) { receive c; send c; if (c == '.') return; count++; if (count == 3) { send ' '; count = 0; } } } void filter(char from,char to) receive char; send char; { while(1) { accept c; if (c == from) c = to; send c; } } int main(void) { string in; string out; in = get_input(); in => input() => filter('a','n') => filter('e','x') => output() => out; print(out); }
Coroutines make the same fringe problem trivial to implement:
void tree_leaves(Tree t) send Tree; { if (t == nil) send nil; if (is_leaf(t)) send t; send tree_leaves(t->left); send tree_leaves(t->right); } bool same_fringe(Tree t1,Tree t2) { Tree tmp1; Tree tmp2; while ( (tree_leaves(t1) => tmp1) && (tree_leaves(t2) => tmp2) ) { if (tmp1 != tmp2) return (false); } return true; }
There are even more implications here. The producer/consumer problem is one used to teach semaphores, but with coroutines, it becomes rather trivial:
void producer() send datum; { datum data; while(1) { data = generate_datum(); send data; } } void consumer() receive datum; { datum data; while(1) { receive data; crunch_datum(data); } } producer() <=> consumer();
Even with multiple consumers, I don't see a problem with this model:
producer() <=> (consumer(), consumer());
And other problems used to teach semaphores, like the multiple-reader, exclusive writer problem, become trivial:
void master_control_program() receive command; receive datum; send datum; { command cmd; datum data; while(1) { accept cmd; if (cmd == read) send data; else if (cmd == write) receive data; } } void reader(int id) { datum data; while(1) { send read; receive data; process(data,id); } } void writer(int id) { datum data; while(1) { sleep(random()); data = generate_datum(id); send write; send data; } } master_control_program() <=> ( reader(1), reader(2), reader(3), writer(1), reader(4), writer(2), writer(3), reader(5));
But in retrospect, this is nothing more than a reinvention of communicating sequential processes, and what I've managed to do is basically hide message passing within the framework of coroutines, which is why I suddenly found myself easily solving problems that normally require semaphores.
Granted, there aren't any langauges that do this sort of thing (well, there is Erlang, but it's a functional language, not imperative like C or C++), and if I were to implement a language with coroutine support like this, there are still issues of implementation and syntax to work out, but I'm still playing around with this stuff for now.