The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, February 02, 2005


Over the past few days there's been this almost near constant whirring noise in the office. It sounds like it's just on the other side of my cubicle with that neat Zen-like emptiness to it (and yes, it still has that neat Zen-like emptiness to it, dispite sharing it with a new tech support person).

Whirr. Whirr.

Finally, curiousity got the better of me, and I decided to track down the source of the whirring.

Whirr. Whirr.

The person on the other side of my cubicle with that neat Zen-like emptiness to it has what looks like a very short cattle prod but is instead a massage unit that is currently in use.

Whirr. Whirr.


Whirr. Whirr.

Back to work.

Whirr. Whirr.

At least it has a beat you can dance to.

Oh, it stopped.

URLs, Identity, Authentication and Trust

The other day I mentioned LID and the problems I saw with the URLs they were using as part of the processing. I wrote to Johannes Ernst, who is the principle architect of LID about the problem and after an exchange, I wrote the following, which explains part of the problem:

From: <>
Subject: Re: Question about the code for LID
Date: Wed, 2 Feb 2005 21:56:15 -0500 (EST)

It was thus said that the Great Johannes Ernst once stated:

Good catch. I recall we had a discussion on that and couldn't quite figure out which way was the right way …

If you are trying to send parameters both as part of the URL and with a POST, the URL should be of the form:;xpath=...;action=...

You seem to be the person to ask: where is it defined that URL parameters and POST parameters in combination should be used that way? I recall we looked and couldn't find anything, but maybe you would know?

Well, I've checked RFC-1808, RFC-2396 and RFC-3986, and as I interpret these, the general scheme of a URL is (and skipping the definitions of parts not germain to this—stuff in [] is optional, “*” means 0 or more, “+” is 1 or more, parenthesis denote groupings, etc):

<url>     ::= <scheme> ':' [<net>] [<path>] ['?' <query>] ['#' <fragment>]
<net>     ::= '//' [<userinfo> '@'] <host> [ ':' <port> ]
<path>    ::= <segment> *( '/' <segment> )
<segment> ::= *<character> *( ';' *<param> )
<param>   ::= *<character>
<query>   ::= *<character>

No structure is defined for <param> or <query> in the RFCs. By convention, the <query> portion is defined as:

<query>      ::= <namevalue> *( '&' <namevalue> )
<namevalue>  ::= +<qchar> '=' *<qchar>
<qchar>      ::= <unreserved> | <escape>
<unreserved> ::= <alpha> | <digit> | "$" | "-" | "_" | "." | "+" | "!"
                   | "*" | "'" | "(" | ")" | ","
<escape>     ::= '%' <hexdigit> <hexdigit>

(there are also some restrictions on what can appear in a <segment> and <param>, but those are covered in the various RFCs, but note that the semicolon needs to be escaped in the query string, fancy that!)

The upshot is that you can have a URL of the form:;1/to;type=b/file.ext;v4.4?foo=a&bar=b (note)

(note—when giving a URL in an HTML document, the '&' needs to be escaped to pass HTML validation)

where the parameters “;1”, “type=b” and “v4.4” apply to “path”, “to” and “file.ext” respectively (according to RFCs 2396 and 3986—as I read RFC-1808 it says that the <param> can only appear once, and that at the end of the <path> portion—but given that RFCs 2396 and 3986 are newer, I'll take those as correct).

Now, how this all relates to LID?

It's not pretty I'm afraid.

Apache doesn't parse the path params correctly [as I found out today in playing around with this stuff –Sean]—or to be more precise, it doesn't parse them at all and passes them directly through to the filesystem. So while what you have is a decent workaround, it will probably only work for the perl CGI:: module; I can't say for any CGI modules in other languages (and I know the one I wrote would have to be modified to support LID URLs).

I haven't had time to really look through the LID code to see all what it does, but as I have time, I'll do that. But feel free to keep asking questions—I'd like to help.

Basically, what they're trying to do is pass parameters as part of the URL, as well as pass parameters as part of a form (primarily using the POST method for certain actions). And what they're trying to do isn't so much illegal (in the “this type of stuff is not allowed in the protocol” sense) as it is unspecified. I created a simple test page of forms with various combinations of path parameters and query parameters using both GET and POST methods to see what information is passed through Apache to the simple script I was referencing. The results are mixed:

Results of URL parameters with <FORM> parameters
Method Script URL Paramters obtained from URL Paramters obtained from <FORM>
Method Script URL Paramters obtained from URL Paramters obtained from <FORM>
GET script.cgi;n1=v1;n2=v2 Requested URL script.cgi;n1=v1;n2=v2 was not found on this server
GET script.cgi/;n1=v1;n2=v2 YES, in $PATH_INFO as “/;n1=v1;n2=v2” YES
GET script.cgi?n1=v1;n2=v2 NO YES
GET script.cgi/?n1=v1;n2=v2 NO YES
GET script?n1=v1&n2=v2 NO YES
POST script.cgi;n1=v2;n2=v2 Requested URL script.cgi;n1=v1;n2=v2 was not found on this server
POST script.cgi/;n1=v1;n2=v2 YES, in $PATH_INFO as “/;n1=v1;n2=v2” YES
POST script.cgi?n1=v1;n2=v2 YES, in $QUERY_STRING as “n1=v1;n2=v2” YES
POST script.cgi/?n1=v1;n2=v2 YES, in $REQUEST_URI as “script.cgi/?n1=v1;n2=v2” YES
POST script.cgi?n1=v1&n2=v2 YES YES

I'm beginning to think that the way LID works is about the only way for it to realistically work across webservers, seeing how path parameters are rarely used and that mixing parameters from the URL and <FORM> is unspecified for the most part.


Obligatory Picture

[It's the most wonderful time of the year!]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site:, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2023 by Sean Conner. All Rights Reserved.