The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Sunday, October 13, 2019

How many redirects does your browser follow?

An observation on the Gemini mailing list led me down a very small rabbit hole. I recalled that, at one time, a web browser was only supposed to follow five consecutive redirects, and sure enough, in RFC-2068:

10.3 Redirection 3xx

This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD. A user agent SHOULD NOT automatically redirect a request more than 5 times, since such redirections usually indicate an infinite loop.

Hypertext Transfer Protocol -- HTTP/1.1

But that's an old standard from 1997. In fact, the next revision, RFC-2616, updated this section:

10.3 Redirection 3xx

This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD. A client SHOULD detect infinite redirection loops, since such loops generate network traffic for each redirection.

Note: previous versions of this specification recommended a maximum of five redirections. Content developers should be aware that there might be clients that implement such a fixed limitation.

Hypertext Transfer Protocol -- HTTP/1.1

And subsequent updates have kept that language. So it appears that clients are no longer told (in RFC-2119 language) to stop after just five redirections, but they still SHOULD detect redirection loops. It seems this was changed due to market pressure from various companies, and I think the practical limit has gone up over the years.

I know the browser I use, Firefox, is highly configurable, so I decided to see if its configuration included a way to limit redirections. And lo, it does! The option network.http.redirection-limit exists, and the current default value is “20”. I'm curious to see what happens if I set it to “5”. I wonder how many sites will break?
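
For what it's worth, the logic a client needs here is tiny. Below is a minimal sketch in Lua, assuming LuaSocket is available; the function name, the URL, and the five-hop cap are placeholders for the example, not anything Firefox actually does internally.

local http  = require "socket.http"   -- LuaSocket
local ltn12 = require "ltn12"

-- Follow redirects by hand, giving up after 'limit' hops (five, per the
-- original RFC-2068 recommendation).  The names here are made up.
local function fetch(url,limit)
  for _ = 1,limit do
    local acc = {}
    local okay,code,headers = http.request {
      url      = url,
      redirect = false,               -- count the hops ourselves
      sink     = ltn12.sink.table(acc),
    }
    if not okay then
      return nil,code
    elseif code >= 300 and code < 400 and headers.location then
      url = headers.location          -- take the Location header and loop
    else
      return table.concat(acc),code,url
    end
  end
  return nil,"too many redirects"
end

local body,code = fetch("http://example.com/",5)
print(code,body and #body)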

Monday, October 07, 2019

Tool selection

So if I needed to parse data in C, why did I not use lex? It's pretty much standard on all Unix systems, right? Yes, but all it does is lexical analysis. The job of parsing requires the use of yacc. So why didn't I use yacc? Because it doesn't do lexical analysis. If I use lex, I also need to use yacc. Why use two tools when one will suffice? They are also both a pain to use, so it's not like I immediately think to use them (that, and the last time I used lex in anger was over twenty years ago …)


I was working harder, not smarter

Another department at the Ft. Lauderdale Office of the Corporation is refactoring their code. Normally this wouldn't affect other groups, but this particular code requires some executables we produce, and to make it easier to install, we, or rather, I, needed to create a new repository with just these executables.

Easier said than done.

There's about a dozen small utilities, each a single C file, but unfortunately, to get the banana (the single C file) you also need the 800-pound gorilla (its dependencies). Also, these executables are spread through most of our projects—there's a few for “Project: Wolowizard” (which is also used for “Project: Sippy-Cup”), multiple ones for “Project: Lumbergh,” a few for “Project: Cleese” and … oh, I never even talked about this other project, so let's just call it “Project: Clean-Socks.”

Ugh.

So that's how I spent my time last week, working on “Project: Seymore,” rewriting a dozen small utilities to remove the 800 pounds of gorilla normally required to compile these tools. All these utilities do is transform data from format A to format B. The critical ones take a text file of lines usually in the form of “A = B” but there was one that took over a day to complete because of the input format:

A = B:foo,bar,... name1="value" name2="value" ...
A = B:none

Oh, writing parsing code in C is so much fun! And as I was stuck writing this, I kept thinking just how much easier this would be with LPEG. But alas, I wanted to keep the dependencies to a minimum, so it was just grind, grind, grind until it was done.
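
For the curious, here's roughly what an LPEG version might look like: a sketch using LPEG's re module against the input format above. The capture names are made up, and the error handling a real tool would need is missing.

local re = require "re"   -- the re module that ships with LPEG

-- A rough grammar for lines of the form
--   A = B:foo,bar name1="value" name2="value"
--   A = B:none
-- The capture names (lhs, rhs, flags) are invented for this sketch.
local line = re.compile [[
  line  <- {| {:lhs: name :} %s* '=' %s* {:rhs: name :}
              ':' ('none' / {:flags: flags :}) (%s+ pair)* |} !.
  flags <- {| {name} (',' {name})* |}
  pair  <- {| {name} '=' '"' { [^"]* } '"' |}
  name  <- [%w_]+
]]

local t = line:match 'alpha = beta:foo,bar name1="first value" name2="second"'
-- t.lhs   == "alpha"
-- t.rhs   == "beta"
-- t.flags == { "foo" , "bar" }
-- t[1]    == { "name1" , "first value" },  t[2] == { "name2" , "second" }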

Then today, I found that I had installed peg/leg, the recursive-descent parser generator for C, on my work machine eight years ago.

Eight years ago!

Head, meet desk.

Including the time to upgrade peg/leg, rewriting the utility that had taken me nearly two days took only two hours (most of the code is the same across the utilities—check options, open files, sort the data, remove duplicates, write the data; it's only the portion that reads and converts the data that differs). The result is also shorter, and I think easier to modify.

So memo to self: before diving into a project, check to see if I already have the right tools installed.

Sigh.

Saturday, October 05, 2019

More stupid benchmarks about compiling a million lines of code

I'm looking at the code GCC produced for the 32-bit system (I cut down the number of lines of code):

 804836b:       68 ac 8e 04 08          push   0x8048eac
 8048370:       e8 2b ff ff ff          call   80482a0 <puts@plt>
 8048375:       68 ac 8e 04 08          push   0x8048eac
 804837a:       e8 21 ff ff ff          call   80482a0 <puts@plt>
 804837f:       68 ac 8e 04 08          push   0x8048eac
 8048384:       e8 17 ff ff ff          call   80482a0 <puts@plt>
 8048389:       68 ac 8e 04 08          push   0x8048eac
 804838e:       e8 0d ff ff ff          call   80482a0 <puts@plt>
 8048393:       68 ac 8e 04 08          push   0x8048eac
 8048398:       e8 03 ff ff ff          call   80482a0 <puts@plt>
 804839d:       68 ac 8e 04 08          push   0x8048eac
 80483a2:       e8 f9 fe ff ff          call   80482a0 <puts@plt>
 80483a7:       68 ac 8e 04 08          push   0x8048eac
 80483ac:       e8 ef fe ff ff          call   80482a0 <puts@plt>
 80483b1:       68 ac 8e 04 08          push   0x8048eac
 80483b6:       e8 e5 fe ff ff          call   80482a0 <puts@plt>
 80483bb:       83 c4 20                add    esp,0x20

My initial thought was Why doesn't GCC just push the address once? but then I remembered that in C, function parameters can be modified. But that led me down a slight rabbit hole of seeing if printf() (with my particular version of GCC) even changes the parameters. It turns out that no, they don't change (your mileage may vary though). So with that in mind, I wrote the following assembly code:

        bits    32
        global  main
        extern  printf

        section .rodata
msg:
                db      'Hello, world!',10,0

        section .text
main:
                push    msg
                call    printf
        ;; 1,199,998 more calls to printf
                call    printf
                pop     eax
                xor     eax,eax
                ret

Yes, I cheated a bit by not repeatedly pushing and popping the stack. But I was also interested in seeing how well nasm fares compiling 1.2 million lines of code. Not too badly, compared to GCC:

[spc]lucy:/tmp>time nasm -f elf32 -o pg.o pg.a

real    0m38.018s
user    0m37.821s
sys     0m0.199s
[spc]lucy:/tmp>

I don't even need to generate a 17M assembly file, though; nasm can do the repetition for me:

        bits    32
        global  main
        extern  printf

        section .rodata

msg:            db      'Hello, world!',10,0

        section .text

main:           push    msg
        %rep 1200000
                call    printf
        %endrep

                pop     eax
                xor     eax,eax
                ret

It can skip reading 16,799,971 bytes and assemble the entire thing in 25 seconds:

[spc]lucy:/tmp>time nasm -f elf32 -o pf.o pf.a

real    0m24.830s
user    0m24.677s
sys     0m0.144s
[spc]lucy:/tmp>

Nice. But then I was curious about Lua. So I generated 1.2 million lines of Lua:

print("Hello, world!")
-- 1,199,998 more calls to print()
print("Hello, world!")

And timed how long it took Lua to load (but not run) the 1.2 million lines of code:

[spc]lucy:/tmp>time lua zz.lua
function: 0x9c36838

real    0m1.666s
user    0m1.614s
sys     0m0.053s
[spc]lucy:/tmp>

Sweet!
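
For the record, the “function: 0x9c36838” in the output above is the compiled chunk being printed: Lua's loadfile() compiles a file and returns the result as a function without calling it, which is how you load 1.2 million lines without also running them. The timing script was presumably little more than the following (the file name here is made up):

-- zz.lua: compile the generated file without executing it.
-- "big.lua" stands in for the 1.2-million-line file; the real name wasn't given.
print(loadfile "big.lua")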

Friday, October 04, 2019

It's a stupid benchmark about compiling a million lines of code, what else did I expect?

I came across a claim that the V programming language can compile 1.2 million lines of code per second. Then I found out that the code was pretty much just 1,200,000 calls to println('hello world'). Still, I was interested in seeing how GCC would fare. So I coded up this:

#include <stdio.h>

int main(void)
{
  printf("Hello world!\n");
  /* 1,199,998 more calls to printf() */
  printf("Hello world!\n");
  return 0;
}

which ends up being 33M, and …

[spc]lucy:/tmp>time gcc h.c
gcc: Internal error: Segmentation fault (program cc1)
Please submit a full bug report.
See <URL:http://bugzilla.redhat.com/bugzilla> for instructions.

real    14m36.527s
user    0m40.282s
sys     0m17.497s
[spc]lucy:/tmp>

Fourteen minutes for GCC to figure out I didn't have enough memory on the 32-bit system to compile it (and the resulting core file exceeded physical memory by three times). I then tried on a 64-bit system with a bit more memory, and I fared a bit better:

[spc]saltmine:/tmp>time gcc h.c

real    7m37.555s
user    2m3.000s
sys     1m23.353s
[spc]saltmine:/tmp>

This time I got a 12M executable in 7½ minutes, which seems a bit long to me for such a simple (but large) program. I mean, Lua was able to compile an 83M script in 6 minutes, on the same 32-bit system as above, and that was considered a bug!

But I used GCC, which does some optimizations by default. Perhaps if I try no optimization?

[spc]saltmine:/tmp>time gcc -O0 h.c

real    7m6.939s
user    2m2.972s
sys     1m27.237s
[spc]saltmine:/tmp>

Wow. A whole 30 seconds faster. Way to go, GCC! Woot!


Back when I was a kid, all I had to worry about was the mass extinction of the human race due to global thermonuclear war

Bunny and I are out eating dinner at T. B. O. McFlynnagin's and, out of the corner of my eye on one of the ubiquitous televisions dotting the place, I saw what appeared to be a “back to school” type commercial, but one that turned … dark. I'm normally not one for trigger warnings, but this commercial, which did air because I saw it, is quite graphic. So … you have been warned!

It reminds me of the “Daisy” commercial, although it's hard to say which one is worse. Perhaps both of them are.

Wednesday, October 02, 2019

“Night of the Lepus” was based on a book‽

I'm going to lunch with a few cow-orkers and ST is driving. While driving, we're subject to his music listening choices, which tend towards movie and video game scores. As a joke, I mention that he's playing the score to “Night of the Lepus” and to my total surprise, no one else in the vehicle had ever heard of the movie.

So of course I start reading off the plot synopsis from Wikipedia and I'm amazed to learn that it's based on a book! “Night of the Lepus” was originally a book! I then switch to reading the plot synopsis of The Year of the Angry Rabbit and … it sounds amazing! An attempt to eradicate rabbits in Australia leads to world peace through an inadvertent doomsday weapon, with occasional outbreaks of killer rabbits.

Wow!

Why wasn't that movie made?

Tuesday, October 01, 2019

It only took 35 years …

The first accurate portrayal of a black hole in Hollywood was in the 2014 movie “Interstellar,” with help from theoretical physicist Kip Thorne, and the images from that movie do appear to match reality. But I find it fascinating that astrophysicist Jean-Pierre Luminet generated an image of a black hole in April of 1979!

It's sad to think that Disney's “The Black Hole,” which came out in December of 1979, could not only have been the first Hollywood portrayal of a black hole (which it appears it was), but an accurate portrayal as well. Ah well …

Monday, September 30, 2019

Adding redirection to the gopher protocol

The primary gopher protocol specification is totally mum on the topic of errors. The word “error” only occurs once, and then only to note that gopher type “3” is an error. So given the lack of specification, I thought I might do an experiment and see if I can't introduce the concept of “redirection” to the gopher protocol. It can hardly be thought of as violating both the spirit and the letter of the spec if there's nothing in the spec to figuratively or literally violate.

Upon encountering some form of error, say, a nonexistent selector, a gopher server is supposed to return an “error selector” that looks something like:

3'/Phlog:' doesn't exist!HTHTerror.hostHT1CRLF

What I'm doing is giving some structure to the “error selector.” The text portion (the bit right after the “3” and before the first tab character) will be a fixed string giving the actual error. So for a nonexistent selector, my gopher server will return:

3Not foundHT/Phlog:HTgopher.conman.orgHT70CRLF

The text portion will always be “Not found” with the nonexistent selector being returned, along with the hostname and port. Now, for a redirection, it will return:

3Permanent redirectHTPhlog:HTgopher.conman.orgHT70CRLF

The text portion will always be “Permanent redirect” with the new selector being given, along with the host and port number. Doing this will allow me to even redirect a request to another gopher server. Well, as long as the gopher client understands the error text.

Using literal text strings like this isn't ideal, but it doesn't break existing clients, and it does give enough information if a person sees the error (assuming they speak English—which is why this isn't ideal). Also, if the number of possible errors is kept small, then explicitly checking each string isn't that much of an issue.
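
To make that concrete, here's a sketch of the client side in Lua using LuaSocket; the function names are mine, and a real client would obviously handle more than these two strings.

local socket = require "socket"   -- LuaSocket

-- Send a selector and return the raw reply.  This is a sketch, not code
-- from any existing client.
local function gopher_request(host,port,selector)
  local conn = assert(socket.connect(host,port))
  conn:send(selector .. "\r\n")
  local reply = conn:receive("*a")
  conn:close()
  return reply
end

-- Pull apart an error selector of the form
--   3<error text>HT<selector>HT<host>HT<port>CRLF
local function parse_error(line)
  return line:match "^3([^\t]*)\t([^\t]*)\t([^\t]*)\t(%d+)"
end

local reply = gopher_request("gopher.conman.org",70,"/Phlog:")
local text,selector,host,port = parse_error(reply)

if text == "Permanent redirect" then
  reply = gopher_request(host,tonumber(port),selector)  -- retry at the new location
elseif text == "Not found" then
  error(string.format("%s does not exist on %s",selector,host))
end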

I can only hope other gopher servers pick up on the idea and make gopher space a little bit less annoying to use.


An annoying aspect of the gopher protocol

In the nearly two years of running a gopher server, the most annoying aspect of the gopher protocol, in my opinion, is the inability to redirect client requests. It's painful to see the same request for /Phlog: over and over again due to an unwarranted assumption about how things are organized on my blog. As I stated on the top page of my gopher server:

Welcome to Conman Laboratories

NOTE: RFC-1436 says this about selectors:

… an OPAQUE selector string … The selector string should MEAN NOTHING to the client software; it should never be modified by the client.

(emphasis added)

The selectors on this server *ARE OPAQUE* and *MUST* be sent *AS IS* to the server. Please note that the selectors here rarely start with a '/' character. Particularly, phlog entries start with a selector of "Phlog:"—note the lack of '/' and the ending ':'.

Thank you.

  -- The Management

And yet—people assume I'm serving content up from a filesystem and therefore, a leading “/” is required.

Aaaaaaaaaaaaaaaaarg!

If only gopher had a way to redirect the request, but alas …

I mean, you can kind of work around it, but that leads to the second most annoying aspect of the gopher protocol, in my opinion—the document type is an inherent part of the request! The client is told beforehand the type of data it will be requesting, unlike an HTTP request, where the server tells the client the type of data being sent. Redirecting a gopher “directory” is easy—just serve up a “directory” type with the correct link. And while not ideal, redirecting a text file could also be done by sending a text file with the updated URL, but this doesn't help with automated clients (as I found out the hard way). And this won't work at all for any other non-text media types, like images.
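
For the “directory” case, the redirect really is that cheap: the old selector just serves up a one-item menu pointing at the new location, something like this (same HT/CRLF notation as above, and the display text is whatever you want):

1This has moved; please update your linksHTPhlog:HTgopher.conman.orgHT70CRLF
.CRLF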

I suppose you could overload the “gopher error type,” which has to be checked for anyway (one hopes), but again, that won't help with automated agents. Unless, perhaps, the “gopher error type” were standardized a bit more, but good luck with that (although I could try it) …

At least I got the webbots to stop making requests.


Don't mind me, I'm just a gopher pretending to be a teapot

Almost two months ago I modified my gopher server to respond to HTTP requests with “418 I'm a teapot” and it appears to have worked! The gopher server is no longer receiving any HTTP requests.
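
The check itself is nearly trivial: a gopher request is just a selector followed by CRLF, so anything that looks like an HTTP request line can be answered with a canned 418 and dropped. Here's a toy sketch of the idea in Lua with LuaSocket (this is not the actual code running in my server):

local socket = require "socket"   -- LuaSocket

-- A toy version of the idea; port70 does not actually look like this.
local server = assert(socket.bind("*",70))

while true do
  local client  = server:accept()
  local request = client:receive("*l") or ""

  if request:match "^%u+ %S+ HTTP/%d" then
    -- An HTTP request snuck in on the gopher port: answer as a teapot.
    client:send("HTTP/1.1 418 I'm a teapot\r\nContent-Length: 0\r\n\r\n")
  else
    -- ... normal gopher handling of the selector would go here ...
  end

  client:close()
end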

I'm also glad that the movement to remove the 418 response code failed. I don't find it useless, as it was probably odd enough that the authors of the agents making the inappropriate requests were forced to look into the response and just skip my server entirely.

So yea!


I finally decided to release my gopher server software

So when I originally wrote my gopher server back in February/March of 2017, it was a hack job to just more or less serve up my blog over gopher. Everything was hard-coded into the codebase and making changes was annoying. So earlier this month I decided to start over and make a gopher server that someone else could potentially use. Another goal was to keep the site functioning as is.

The hardest part was naming the darned thing, and in the end, I decided upon the rather plain name of port70. I've been running it now for a few weeks, and not only is it stable, it's much easier to configure, modify, and serve up content with.

Obligatory Picture

[It's the most wonderful time of the year!]

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site: http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2019 by Sean Conner. All Rights Reserved.