The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Tuesday, October 01, 2019

It only took 35 years …

The first accurate portrayal of a black hole in Hollywood was in the 2014 movie “Interstellar” with help from theoretical physicist Kip Thorne, and the images from that movie do appear to match reality. But I find it fascinating that astrophysicist Jean-Pierre Luminet generated an image of a black hole in April of 1979!

It's sad to think that Disney's “The Black Hole,” which came out in December of 1979, could have not only been the first Hollywood portrayal of a black hole (which it appears it was), but it could have been an accurate portrayal of a black hole. Ah well …

Wednesday, October 02, 2019

“Night of the Lepus” was based on a book‽

I'm going to lunch with a few cow-orkers and ST is driving. While driving, we're subject to his music listening choices, which tend towards movie and video game scores. As a joke, I mention that he's playing the score to “Night of the Lepus” and to my total surprise, no one else in the vehicle had ever heard of the movie.

So of course I start reading off the plot synopsis from Wikipedia and I'm amazed to learn that it's based on a book! “Night of the Lepus” was originally a book! I then switch to reading the plot synopsis of The Year of the Angry Rabbit and … it sounds amazing! An attempt to eradicate rabbits in Australia leads to world peace through an inadvertent doomsday weapon with occasional outbreaks of killer rabbits.

Wow!

Why wasn't that movie made?

Friday, October 04, 2019

Back when I was a kid, all I had to worry about was the mass extinction of the human race due to global thermonuclear war

Bunny and I are out eating dinner at T. B. O. McFlynnagin's and out of the corner of my eye on one of the ubiquitous televisions dotting the place, I saw what appeared to be a “back to school” type commercial, but one that turned … dark. I'm normally not one for trigger warnings, but this commercial, which did air because I saw it, is quite graphic. So … you have been warned!

It reminds me of the “Daisy” commercial, although it's hard to say which one is worse. Perhaps both of them are.


It's a stupid benchmark about compiling a million lines of code, what else did I expect?

I came across a claim that the V programming language can compile 1.2 million lines of code per second. Then I found out that the code was pretty much just 1,200,000 calls to println('hello world'). Still, I was interested in seeing how GCC would fare. So I coded up this:

#include <stdio.h>

int main(void)
{
  printf("Hello world!\n");
  /* 1,199,998 more calls to printf() */
  printf("Hello world!\n");
  return 0;
}

which ends up being 33M, and …

[spc]lucy:/tmp>time gcc h.c
gcc: Internal error: Segmentation fault (program cc1)
Please submit a full bug report.
See <URL:http://bugzilla.redhat.com/bugzilla> for instructions.

real    14m36.527s
user    0m40.282s
sys     0m17.497s
[spc]lucy:/tmp>

Fourteen minutes for GCC to figure out I didn't have enough memory on the 32-bit system to compile it (and the resulting core file exceeded physical memory by three times). I then tried on a 64-bit system with a bit more memory, and I fared a bit better:

[spc]saltmine:/tmp>time gcc h.c

real    7m37.555s
user    2m3.000s
sys     1m23.353s
[spc]saltmine:/tmp>

This time I got a 12M executable in 7½ minutes, which seems a bit long to me for such a simple (but large) program. I mean, Lua was able to compile an 83M script in 6 minutes, on the same 32-bit system as above, and that was considered a bug!

But I used GCC, which does some optimizations by default. Perhaps if I try no optimization?

[spc]saltmine:/tmp>time gcc -O0 h.c

real    7m6.939s
user    2m2.972s
sys     1m27.237s
[spc]saltmine:/tmp>

Wow. A whole 30 seconds faster. Way to go, GCC! Woot!

Saturday, October 05, 2019

More stupid benchmarks about compiling a million lines of code

I'm looking at the code GCC produced for the 32-bit system (I cut down the number of lines of code):

 804836b:       68 ac 8e 04 08          push   0x8048eac
 8048370:       e8 2b ff ff ff          call   80482a0 <puts@plt>
 8048375:       68 ac 8e 04 08          push   0x8048eac
 804837a:       e8 21 ff ff ff          call   80482a0 <puts@plt>
 804837f:       68 ac 8e 04 08          push   0x8048eac
 8048384:       e8 17 ff ff ff          call   80482a0 <puts@plt>
 8048389:       68 ac 8e 04 08          push   0x8048eac
 804838e:       e8 0d ff ff ff          call   80482a0 <puts@plt>
 8048393:       68 ac 8e 04 08          push   0x8048eac
 8048398:       e8 03 ff ff ff          call   80482a0 <puts@plt>
 804839d:       68 ac 8e 04 08          push   0x8048eac
 80483a2:       e8 f9 fe ff ff          call   80482a0 <puts@plt>
 80483a7:       68 ac 8e 04 08          push   0x8048eac
 80483ac:       e8 ef fe ff ff          call   80482a0 <puts@plt>
 80483b1:       68 ac 8e 04 08          push   0x8048eac
 80483b6:       e8 e5 fe ff ff          call   80482a0 <puts@plt>
 80483bb:       83 c4 20                add    esp,0x20

My initial thought was Why doesn't GCC just push the address once? but then I remembered that in C, function parameters can be modified. But that led me down a slight rabbit hole of seeing whether printf() (with my particular version of GCC) even changes its parameters. It turns out that no, it doesn't (your mileage may vary though). So with that in mind, I wrote the following assembly code:

        bits    32
        global  main
        extern  printf

        section .rodata
msg:
                db      'Hello, world!',10,0

        section .text
main:
                push    msg
                call    printf
        ;; 1,199,998 more calls to printf
                call    printf
                pop     eax
                xor     eax,eax
                ret

Yes, I cheated a bit by not repeatedly pushing and popping the stack. But I was also interested in seeing how well nasm fares compiling 1.2 million lines of code. Not too badly, compared to GCC:

[spc]lucy:/tmp>time nasm -f elf32 -o pg.o pg.a

real    0m38.018s
user    0m37.821s
sys     0m0.199s
[spc]lucy:/tmp>

I don't even need to generate a 17M assembly file though; nasm can do the repetition for me:

        bits    32
        global  main
        extern  printf

        section .rodata

msg:            db      'Hello, world!',10,0

        section .text

main:           push    msg
        %rep 1200000
                call    printf
        %endrep

                pop     eax
                xor     eax,eax
                ret

It can skip reading 16,799,971 bytes and assemble the entire thing in 25 seconds:

[spc]lucy:/tmp>time nasm -f elf32 -o pf.o pf.a

real    0m24.830s
user    0m24.677s
sys     0m0.144s
[spc]lucy:/tmp>

Nice. But then I was curious about Lua. So I generated 1.2 million lines of Lua:

print("Hello, world!")
-- 1,199,998 more calls to print()
print("hello, world!")

And timed how long it took Lua to load (but not run) the 1.2 million lines of code:

[spc]lucy:/tmp>time lua zz.lua
function: 0x9c36838

real    0m1.666s
user    0m1.614s
sys     0m0.053s
[spc]lucy:/tmp>

Sweet!

Monday, October 07, 2019

I was working harder, not smarter

Another department at the Ft. Lauderdale Office of the Corporation is refactoring their code. Normally this wouldn't affect other groups, but this particular code requires some executables we produce, and to make it easier to install, we, or rather, I, needed to create a new repository with just these executables.

Easier said than done.

There's about a dozen small utilities, each a single C file, but unfortunately, to get the banana (the single C file) you also need the 800 pound gorilla (its dependencies). Also, these executables are spread through most of our projects—there's a few for “Project: Wolowizard” (which is also used for “Project: Sippy-Cup”), multiple ones for “Project: Lumbergh,” a few for “Project: Cleese” and … oh, I never even talked about this other project, so let's just call it “Project: Clean-Socks.”

Ugh.

So that's how I spent my time last week, working on “Project: Seymore,” rewriting a dozen small utilities to remove the 800 pounds of gorilla normally required to compile these tools. All these utilities do is transform data from format A to format B. The critical ones take a text file of lines usually in the form of “A = B” but there was one that took over a day to complete because of the input format:

A = B:foo,bar,... name1="value" name2="value" ...
A = B:none

Oh, writing parsing code in C is so much fun! And as I was stuck writing this, I kept thinking just how much easier this would be with LPEG. But alas, I wanted to keep the dependencies to a minimum, so it was just grind, grind, grind until it was done.

Then today, I found that I had installed peg/leg, the recursive-descent parser generator for C, on my work machine eight years ago.

Eight years ago!

Head, meet desk.

Including the time to upgrade peg/leg, rewriting the utility that had taken me nearly two days only took two hours (most of the code is shared among the utilities—check options, open files, sort the data, remove duplicates, write the data; it's only the portion that reads and converts the data that differs). It's also shorter, and I think easier to modify.

So memo to self: before diving into a project, check to see if I already have the right tools installed.

Sigh.


Tool selection

So if I needed to parse data in C, why did I not use lex? It's pretty much standard on all Unix systems, right? Yes, but all it does is lexical analysis. The job of parsing requires the use of yacc. So why didn't I use yacc? Because it doesn't do lexical analysis. If I use lex, I also need to use yacc. Why use two tools when one will suffice? They are also both a pain to use, so it's not like I immediately think to use them (that, and the last time I used lex in anger was over twenty years ago …)

Sunday, October 13, 2019

How many redirects does your browser follow?

An observation on the Gemini mailing list led me down a very small rabbit hole. I recalled at one time that a web browser was only supposed to follow five consecutive redirects, and sure enough, in RFC-2068:

10.3 Redirection 3xx

This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD. A user agent SHOULD NOT automatically redirect a request more than 5 times, since such redirections usually indicate an infinite loop.

Hypertext Transfer Protocol -- HTTP/1.1

But that's an old standard from 1997. In fact, the next revision, RFC-2616, updated this section:

10.3 Redirection 3xx

This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD. A client SHOULD detect infinite redirection loops, since such loops generate network traffic for each redirection.

Note: previous versions of this specification recommended a maximum of five redirections. Content developers should be aware that there might be clients that implement such a fixed limitation.

Hypertext Transfer Protocol -- HTTP/1.1

And subsequent updates have kept that language. So it appears that clients need not limit themselves to just five redirects (using language from RFC-2119, the SHOULD NOT is gone), but they still SHOULD detect loops. It seems like this was changed due to market pressure from various companies, and I think the practical limit has gone up over the years.

I know the browser I use, Firefox, is highly configurable and decided to see if its configuration included a way to limit redirections. And lo', it does! The option network.http.redirection-limit exists, and the current default value is “20”. I'm curious to see what happens if I set that to “5”. I wonder how many sites will break?

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site: http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2019 by Sean Conner. All Rights Reserved.