Monday, January 06, 2020
Adding CGI support to my gopher server
Back when I released my gopher server, the only way to generate dynamic output was to add a custom handler to the program. I noticed that other gopher servers all claimed CGI support, but when I was rewriting my gopher server, I felt that CGI as defined didn't make much sense for gopher. An email conversation changed my mind on the subject, so I thought I would go through how I support CGI in my gopher server.
On a Unix system, the “meta-variables” defined in the specification are passed in as environment variables. So going through them all, we have:
AUTH_TYPE is only required if the request requires authorization. Since gopher doesn't have that concept, this meta-variable doesn't have to be set. Good. Next.
CONTENT_LENGTH is only defined if data is being passed into the CGI script. The gopher protocol doesn't have this concept, so this meta-variable doesn't have to be set.
Since CONTENT_LENGTH isn't set, CONTENT_TYPE doesn't need to be set either.
The specification I'm following defines version 1.1 of CGI, so GATEWAY_INTERFACE is easy: it's just set to “1.1” and we're done.
PATH_INFO is tough, and I had to run a bunch of experiments on my webserver to see how this meta-variable works. As the specification states:
It identifies the resource or sub-resource to be returned by the CGI script, and is derived from the portion of the URI path hierarchy following the part that identifies the script itself.
Basically, if I reference “/script” then PATH_INFO isn't set, but if I reference “/script/data” then PATH_INFO should be “/data”. Because of this meta-variable (and a few others), I had to drastically change how requests are passed around internally, but I got it working.
One issue I had with this was leading slashes. Gopher doesn't have a concept of a “path”; it has the concept of a “selector,” which is an opaque sequence of characters that makes up a reference. That, in turn, makes gopher URLs different enough from web URLs to matter here. It also means that a gopher selector does not have to start with a leading slash, something I had to mention up front on my gopher space (none of the selectors on my gopher site start with a slash). But there are gopher sites out there with selectors that do start with a slash, and I wanted to handle both types. That was harder than it should have been.
But PATH_INFO also needs the leading portion of the selector, up to the script name, prepended. For example, if the selector is “Users:spc/script/foobar” then PATH_INFO should be “Users:spc/foobar”. And this meta-variable is only set if there's a “sub-resource” in the selector.
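The PATH_INFO rule can be sketched roughly like this. It is not my server's code, just an illustration that assumes the server has already worked out which part of the selector names the script (the hypothetical `script_selector` argument). The leading portion before the script name is glued onto whatever follows it, and nothing is returned when there is no sub-resource.

```python
def path_info(selector, script_selector):
    """Return PATH_INFO for a request, or None if it should not be set.

    selector        -- the full selector sent by the client
    script_selector -- the part of the selector naming the script
                       (assumed to be known to the server already)
    """
    if not selector.startswith(script_selector):
        raise ValueError("selector does not reference this script")
    rest = selector[len(script_selector):]
    if rest == "":
        return None  # no sub-resource, so PATH_INFO is not set
    if not rest.startswith("/"):
        raise ValueError("selector names some other resource")
    # Everything before the script name (possibly "") is prepended
    # to the sub-resource that follows the script name.
    leading, _slash, _name = script_selector.rpartition("/")
    return leading + rest
```

This covers both the “/script/data” case (no leading portion, giving “/data”) and the “Users:spc/script/foobar” case (giving “Users:spc/foobar”). For a slash-less top-level selector like “script/foobar” this sketch yields “/foobar”, which is exactly the kind of leading-slash decision described above.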
And the beat goes on.
If PATH_INFO is the selector with the script name removed (for the most part), then PATH_TRANSLATED is the underlying filesystem location with the script name removed. So, using the example of “Users:spc/script/foobar”, the resulting PATH_TRANSLATED would be “/home/spc/public_gopher/foobar”. Also, if PATH_INFO is not set, then I don't have to deal with this meta-variable at all.
Both were a bit tough to get right.
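A minimal sketch of the PATH_TRANSLATED step, assuming the script for the example above lives at “/home/spc/public_gopher/script” (an assumption for illustration) and that the server already has the part of the selector following the script name (the hypothetical `sub_resource` argument):

```python
import posixpath


def path_translated(script_filename, sub_resource):
    """Return PATH_TRANSLATED, or None when PATH_INFO is not set.

    script_filename -- where the script lives on disk
    sub_resource    -- the selector text after the script name,
                       e.g. "/foobar", or "" if there is none
    """
    if sub_resource == "":
        return None  # no PATH_INFO, so no PATH_TRANSLATED either
    # Drop the script name from the filesystem path, then append
    # the sub-resource from the selector.
    return posixpath.dirname(script_filename) + sub_resource
```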
QUERY_STRING is easy enough. Gopher does have the concept of search queries, so if a search query is supplied, it's passed in this; otherwise, it's set to the empty string.
The one kicker here is that the specification states that QUERY_STRING is URL-encoded, which is not the case in gopher. I decided against URL-encoding the search query, which goes against the standard, but there are other parts of the standard that don't fit gopher (which I'll get to in a bit).
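In a gopher search request, the client sends the selector, a TAB, then the search text, ending with CR LF. Here is a rough sketch of pulling the query out and passing it through unencoded, next to what strict URL-encoding would have produced. The function name is hypothetical, and Gopher+ fields are ignored.

```python
from urllib.parse import quote


def split_request(line):
    """Split a raw gopher request line into (selector, search query).

    The query comes back as "" when no TAB is present, which is also
    the value QUERY_STRING gets in that case.  Gopher+ is not handled.
    """
    selector, _tab, query = line.rstrip("\r\n").partition("\t")
    return selector, query


selector, query = split_request("Users:spc/script\tfoo bar\r\n")
query_string = query         # passed to the script as-is, not URL-encoded
strict_query = quote(query)  # what strict adherence would pass instead
```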
REMOTE_ADDR is the address of the remote side. Easy enough to provide. Enough said.
For REMOTE_HOST, the standard states:
The server SHOULD set this variable. If the hostname is not available for performance reasons or otherwise, the server MAY substitute the REMOTE_ADDR value.
I'm setting this to the REMOTE_ADDR value. Done! Next!
Nobody these days supports ident, and the specification only says a server MAY set REMOTE_IDENT, so I'm not. Next.
Since AUTH_TYPE doesn't apply, REMOTE_USER doesn't apply either, so it's not set.
REQUEST_METHOD was tough, and not because I had to go through contortions to generate the value. No, I had to go through mental contortions to come up with what to set it to. The specification is written for the web, and it's expected to be set to some HTTP method like GET, POST or HEAD. But none of those (or really, any of the HTTP methods) apply here. I suppose one could say the GET method applies, since that's semantically what one is doing, “getting” a resource. But the gopher protocol doesn't use any methods; you just specify the selector and it's served up. So after much deliberation, I decided to set this to the empty string.
I suppose the more technically correct value would be something like “-” (since the specification requires it to be at least one character long), but that's the problem with trying to adapt standards: sometimes they don't quite match.
SCRIPT_NAME will typically be the selector echoed back, but the meta-variables PATH_INFO and PATH_TRANSLATED complicate this somewhat. But given that I'd already calculated those, this one wasn't much of a problem.
SERVER_NAME is easy enough to pass through.
SERVER_PORT, again, is easy enough to pass through.
Unlike REQUEST_METHOD, SERVER_PROTOCOL was easy: “GOPHER”.
SERVER_SOFTWARE is, again, easy to set.
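Pulling the standard meta-variables together, a sketch of the resulting environment might look like the following. The function and its keyword arguments are hypothetical; the values follow the choices described above, including “1.1” for GATEWAY_INTERFACE (RFC 3875's own grammar writes that value as “CGI/1.1”) and the empty string for REQUEST_METHOD.

```python
def cgi_env(*, query, remote_addr, server_name, server_port,
            server_software, script_name,
            path_info=None, path_translated=None):
    """Build the standard CGI meta-variables as described in this post.

    AUTH_TYPE, CONTENT_LENGTH, CONTENT_TYPE, REMOTE_IDENT and
    REMOTE_USER are never set: gopher has no authentication and no
    request body, and no ident lookups are done.
    """
    env = {
        "GATEWAY_INTERFACE": "1.1",  # value as chosen in this post
        "QUERY_STRING": query,       # raw search text, not URL-encoded
        "REMOTE_ADDR": remote_addr,
        "REMOTE_HOST": remote_addr,  # substitute address, no DNS lookup
        "REQUEST_METHOD": "",        # gopher has no request methods
        "SCRIPT_NAME": script_name,
        "SERVER_NAME": server_name,
        "SERVER_PORT": str(server_port),
        "SERVER_PROTOCOL": "GOPHER",
        "SERVER_SOFTWARE": server_software,
    }
    # Only present when the selector names a sub-resource.
    if path_info is not None:
        env["PATH_INFO"] = path_info
    if path_translated is not None:
        env["PATH_TRANSLATED"] = path_translated
    return env


# Hypothetical values for a request with no search query or sub-resource.
example = cgi_env(query="", remote_addr="192.0.2.1",
                  server_name="gopher.example.org", server_port=70,
                  server_software="example-gopherd/0.1",
                  script_name="Users:spc/script")
```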
The specification also allows protocol-specific meta-variables to be defined, and so I defined a few:
GOPHER_DOCUMENT_ROOT is the top-level directory where the script resides, and it can change from request to request. My gopher server can serve requests from multiple different directories, so GOPHER_DOCUMENT_ROOT may change depending upon where the script is served from.
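One way a server could pick GOPHER_DOCUMENT_ROOT per request is a longest-prefix match of the selector against its configured directories. This is only a sketch; the `ROOTS` table, including the “/var/gopher” default, is a hypothetical configuration and not how my server is actually set up.

```python
def document_root(selector, roots):
    """Pick GOPHER_DOCUMENT_ROOT by the longest matching selector prefix."""
    best = None
    for prefix in roots:
        longer = best is None or len(prefix) > len(best)
        if selector.startswith(prefix) and longer:
            best = prefix
    if best is None:
        raise LookupError(f"no document root for {selector!r}")
    return roots[best]


# Hypothetical configuration: selector prefix -> directory on disk.
ROOTS = {
    "": "/var/gopher",
    "Users:spc/": "/home/spc/public_gopher",
}
```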
This one differs from SCRIPT_NAME in that it is the actual location of the script on the filesystem, whereas SCRIPT_NAME is the “name” of the script as a gopher selector.
The actual selector requested from the network.
And that pretty much covers the input side of things. The output, again, was a bit difficult to handle semantically. The standard expects the script to serve up a few headers, like “Status”, “Content-Type” and “Content-Length”, but again, gopher doesn't have those concepts. After a bit of thought, I decided that anyone writing a CGI script for a gopher site knows they're writing a CGI script for a gopher site, so such headers won't need to be generated. And while in theory one could use a CGI script meant for the web on a gopher server, I don't think that will be a common occurrence (HTML isn't common on most gopher sites). So where I broke with the standard, that's why: it doesn't make sense for gopher, and strict adherence would just mean work done only to be undone.
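On the output side, that decision means the server can just run the script and relay its standard output to the client untouched. A minimal sketch using Python's `subprocess` module; `run_cgi` and the 30-second timeout are assumptions for illustration.

```python
import subprocess
import sys


def run_cgi(argv, env, timeout=30):
    """Run a CGI script and return its output to send to the client.

    The output is relayed as-is: no Status, Content-Type or
    Content-Length headers are expected or parsed.
    """
    result = subprocess.run(
        argv,
        env=env,
        stdin=subprocess.DEVNULL,  # gopher requests carry no body
        stdout=subprocess.PIPE,
        timeout=timeout,
        check=True,                # raise if the script fails
    )
    return result.stdout


# Usage: a stand-in "script" that just echoes QUERY_STRING back.
out = run_cgi([sys.executable, "-c",
               "import os; print(os.environ['QUERY_STRING'])"],
              {"QUERY_STRING": "foo bar"})
```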
By this point, I was curious as to how other gopher servers dealt with the CGI interface, so I looked at the implementations of three popular gopher servers: Gophernicus, Motsognir and Bucktooth. Like mine, they don't expect any output headers, just the content. But unlike mine, they vary wildly in the meta-variables they define:
Defines the least number:
And the following nonstandard meta-variable:
Defines a few more:
GATEWAY_INTERFACE, which is set to “CGI/1.0” and as far as I can tell, isn't described anywhere.
And the following nonstandard meta-variables:
QUERY_STRING_URL, which appears to be the same as
Which defines the most (even more than I do):
GATEWAY_INTERFACE, which is set to “CGI/1.1”
REQUEST_METHOD, which is set to “GET”
SERVER_PROTOCOL, which is set to either “HTTP/0.9” or “RFC1436”
And the nonstandard meta-variables:
CONTENT_LENGTH, which is set to 0
Gophernicus seems the most interesting. It appears to support running gopher over TLS (which doesn't make much sense, in my opinion), and it tries to make its CGI implementation look as much like a webserver's as possible.
What this says to me is that most CGI scripts for gopher don't look at the meta-variables all that much. But at least I can say I (mostly) support the CGI standard (somewhat, if you squint).