The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Thursday, March 09, 2006

RACTER, generative text and Lisp. Oh my!

It figures.

I make disparaging remarks about Lisp, and it turns out to possibly be the best language to use for a project I'm working on.


The project (or projects—there are a few things I want to work on that can use this) deals with computer generated text (or “generative text” as it's called). Textual mashups if you will, like Crow Haiku Generator or my own Quick and Dirty B-Movie Plot Generator. There are plenty more out there if you look.

A lot of these are nothing more than arrays of strings and a slew of code to sling the pieces together. Something like:

temp = Newsflash[rand(Newsflash.length)] + "\n"
       + Location[rand(Location.length)] + " was attacked at dawn by "
       + Monsters[rand(Monsters.length)] + " "
       + From[rand(From.length)] + " "
       + Lowersheol[rand(Lowersheol.length)] + ".  Our hero "
       + Hero[rand(Hero.length)] + " " 
       + Fought[rand(Fought.length)] + ".";

Which would generate output like:

******* FLASH! FLASH! FLASH! *******

Disneyland was attacked at dawn by giant green worms from the lowest levels of The IRS. Our hero Tim Conway told their mothers they were naughty.

This is probably one reason why so many generative text programs don't generate all that much different text—they're too tedious to program.

I wanted an easier way to create the data and sling the words around. I wanted a “program” I could write that could then be processed, much like INRAC, the compiler used to create RACTER (which supposedly wrote The Policeman's Beard is Half Constructed). But with INRAC, to generate something like this:

Bill Gates bit Winston Churchill, but Winston Churchill just smiled. Bill Gates snarled, “Saint Winston Churchill, I presume”. “That's a Bill Gatesesque remark” replied Winston Churchill.

You write this:

*story code section
b >HERO*person[&P] >VILLAIN*person[&N] #
c $VILLAIN #RND3 bit robbed hit $HERO , #
d but $HERO just #RND3 smiled laughed shrugged . #
' new:
e $VILLAIN snarled >X=Saint,HERO "> $X , I presume <*. #
f "That's a !Y=VILLAIN;esque remark" replied $HERO .

Um … yeah.

Thanks, but no thanks.

Building upon my own Quick and Dirty B-Movie Plot Generator, I wrote two further prototypes, one in Perl and one in C; both can use the same input files (and are not restricted to just generating B-movie plots). Why a version in C? Because the one in Perl consumes an insane amount of memory and takes almost 600 times longer to generate the output. The C version can process the input files once into an intermediate format that is quicker to use, whereas the Perl version has to re-read the input files on each invocation. Even when I include the preprocessing time, the C version is still four times faster than Perl. The format I came up with is much easier to deal with:

# declare an array of text, which, when
# referenced, will return one of the lines
# at random.  It ends with a semicolon alone
# on a line.

	I pray thou shalt
	I hope you will

	be plagued with gnats, flies and locusts
	be taunted by the king's concubines

	O thou
	O ye

	incompetent tax-collector
	lazy Babylonian

	Hear this
	Take heed

# to reference an array, use %a-arrayname% #
# one of the following lines will be used

%a-MayYou% %a-HaveBadThingsHappen%, %a-OhYou% %a-OfLittleFaith%!
%a-HearThis% %a-OhYou% %a-OfLittleFaith%, for you will %a-HaveBadThingsHappen%!

(I pulled these from a webpage that generated Biblical taunts, but I've since lost the link).
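As a rough illustration of how %a-arrayname% references like these could be expanded, here is a minimal sketch in Python (not the Perl or C of the actual prototypes). The dictionary layout, the names expand and generate, and the regular expression are all assumptions made for the example, and the pairing of array names to lines is inferred from the order in which they appear above; the real input-file parser isn't shown in this entry.

```python
import random
import re

# Hypothetical in-memory form of the arrays. Which lines belong to which
# array name is inferred from the order of the example above.
ARRAYS = {
    "MayYou": ["I pray thou shalt", "I hope you will"],
    "HaveBadThingsHappen": [
        "be plagued with gnats, flies and locusts",
        "be taunted by the king's concubines",
    ],
    "OhYou": ["O thou", "O ye"],
    "OfLittleFaith": ["incompetent tax-collector", "lazy Babylonian"],
    "HearThis": ["Hear this", "Take heed"],
}

TEMPLATES = [
    "%a-MayYou% %a-HaveBadThingsHappen%, %a-OhYou% %a-OfLittleFaith%!",
    "%a-HearThis% %a-OhYou% %a-OfLittleFaith%, for you will %a-HaveBadThingsHappen%!",
]

# Matches a reference such as %a-OhYou% and captures the array name.
REFERENCE = re.compile(r"%a-(\w+)%")


def expand(template, arrays, rng=random):
    """Replace every %a-name% with a randomly chosen line from arrays[name]."""
    return REFERENCE.sub(lambda m: rng.choice(arrays[m.group(1)]), template)


def generate(templates, arrays, rng=random):
    """Pick one template at random and expand its references."""
    return expand(rng.choice(templates), arrays, rng)


if __name__ == "__main__":
    print(generate(TEMPLATES, ARRAYS))
```

With the data in one place and a single substitution pass doing the slinging, adding a new generator is mostly a matter of typing in more arrays and templates.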

So that's the point I'm at now: I can replicate most of the generative text programs I've seen on webpages pretty quickly (about as quickly as I can copy the text out and into this format). Star Trek technobabble, fake Nostradamus quatrains, crow haikus, all easy now.

Except I'm still not at the level of INRAC. I have no way of declaring variables, and article generation (“a” or “an”) is still a bit problematic. The syntax I'm currently using, while better than anything else I've seen, is still clumsy (it's a holdover from the initial Perl version). So I've been playing around with the syntax a bit, making it easier, and adding variables and functions (say, to generate the appropriate article for a word, or a random number). So far, I like what I have for the new syntax:

# assume we're using the Bible Taunts

# declare a variable TheProphet, which
# is a randomly chosen Bible name


[MayYou] [HaveBadThingsHappen] [OhYou] [OfLittleFaith]!
[HearThis] [OhYou] [OfLittleFaith], for you will [HaveBadThingsHappen]!
{TheProphet} had a dream-you will [HaveBadThingsHappen].  So says {TheProphet}.
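The difference between the two kinds of reference can be sketched in Python: a [Array] reference gets a fresh random pick every time, while a {Variable} is picked once and then reused for the rest of that output. Everything here is an assumption for illustration: the declaration syntax for variables isn't shown above, so the VARIABLES mapping, the BibleNames array and its sample names, and the function names are all made up. The article_for helper is a deliberately naive take on the “a” or “an” problem and gets words like “hour” or “unicorn” wrong.

```python
import random
import re

# Hypothetical data; "BibleNames" and its contents are invented samples.
ARRAYS = {
    "HaveBadThingsHappen": [
        "be plagued with gnats, flies and locusts",
        "be taunted by the king's concubines",
    ],
    "BibleNames": ["Ezekiel", "Jeremiah", "Obadiah"],
}

# Assumed representation: each variable names the array it draws from.
VARIABLES = {"TheProphet": "BibleNames"}

# Matches [ArrayName] (group 1) or {VariableName} (group 2).
TOKEN = re.compile(r"\[(\w+)\]|\{(\w+)\}")


def expand(template, arrays, variables, rng=random):
    """Expand one template; variables keep their value within this call."""
    bound = {}  # variable name -> value chosen for this output

    def replace(match):
        array_name, var_name = match.groups()
        if array_name is not None:
            # Array reference: a fresh random pick on every occurrence.
            return rng.choice(arrays[array_name])
        if var_name not in bound:
            # First use of the variable: pick once and remember it.
            bound[var_name] = rng.choice(arrays[variables[var_name]])
        return bound[var_name]

    return TOKEN.sub(replace, template)


def article_for(word):
    """Naive heuristic: 'an' before a vowel letter, otherwise 'a'."""
    return "an" if word[:1].lower() in {"a", "e", "i", "o", "u"} else "a"


if __name__ == "__main__":
    line = "{TheProphet} had a dream-you will [HaveBadThingsHappen].  So says {TheProphet}."
    print(expand(line, ARRAYS, VARIABLES))
    print(article_for("incompetent tax-collector"), "incompetent tax-collector")
```

Keeping the bound values in a dictionary local to each call means the same prophet appears at both ends of a sentence, but the next generated sentence is free to pick a different one.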

But things are progressing to where I'm reaching for lex and yacc to handle the parsing and processing of the input files.

I am, in effect, writing a programming language.

Want to know something?

Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified bug-ridden slow implementation of half of Common Lisp.

Greenspun's Tenth Rule of Programming

For all my misgivings about Lisp, I'm finding myself starting to write an ad-hoc, informally-specified bug-ridden slow implementation of Lisp.



Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site, then add the date you are interested in, say 2000/08/01, to form the final URL.

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.