The Boston Diaries

Tuesday, January 01, 2019

The upside is that this is not an election year—the downside is that we only have 364 days until it is

I've been looking over the past decade's worth of New Year's Day entries so I don't inadvertently repeat myself, and boy, do I bitch about the fireworks. Thankfully this holiday season has been a bit low key, and that includes our neighbor's propensity towards blowing things up. Yes, there were fireworks tonight, but not nearly at the war sounding levels of previous years.

I'll take solace when I can when it comes to fireworks.

Hopefully, this means that this year will be low key. All I can say is thank God it isn't an election year! Only 364 more days until that madness starts.

Anyway …

HAPPY NEW YEAR!

Yes, we have no copyright

I just saw a commercial using the Prince song “Let's Go Crazy”. It was something I wasn't expecting because Prince had refused all requests to use his work for commercials (as well as turning down all requests from Weird Al Yankovic to parody his songs). But given that Prince died back in 2016 it seems his estate has waited long enough and is now enjoying the licensing fees.

Then it hit me—it won't be until 2086 that the works of Prince will fall into the public domain. Nearly a hundred years since some of his most iconic hits.

In other copyright-public-domain news, today is the first day in 21 years that works have fallen into the public domain. It's weird to think that up until yesterday, “Yes, We Have No Bananas” was still in copyright.

Monday, January 07, 2019

Ignorance is Bliss

So I'm catching up on The Transylvania Times (I'm a bit behind, and they're piling up) when I come across this rather disturbing headline: “Wednesday Morning Earthquake Felt In Transylvania.”

Wait … what?

Yes, Virginia, an earthquake in the southeast United States.

I know earthquakes happen along the Pacific Coast (like California, the land of Shake and Bake). I also know they happened in Missouri (although rare, when it happens, it happens). But in the East? The East is supposed to be stable. Rock solid (ahem). Not shifting underneath our very feet. I am disquieted by this news.

As I fall deeper into this whole “East Coast Earthquake Zone,” it appears to be all too true. There's a fault line that runs from Alabama northeast to Newfoundland, Canada, and it runs about six miles east of Brevard.

I … I don't know how I feel about this. I admit, I have an irrational fear of earthquakes. I don't know why, I just do. Hurricanes? Please … don't bother me unless it's a category 4. An earthquake? Even a relatively minor 2 on the Richter scale (and this one was a 4.4)? Aieeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee! Run away!

And the kicker in all this? This fault line has a name. Its name is the Brevard Fault! No, really.

Now I really have no idea how I feel about this.

Tuesday, January 08, 2019

A reason to celebrate today

The book Atlas Obscura: An Explorer's Guide to the World's Hidden Wonders appeared on the doorstep, courtesy of Bunny. I do have to wonder why she felt it necessary to give me that book to celebrate the only time in the history of the United States the country was not in debt, back in 1835 thanks to President Andrew “I regret not killing that bastard Vice President Calhoun” Jackson. It could be for other reasons, but I can't be sure.

In any case, it's a very cool book. Where else could one learn about the Skunk Ape Research Headquarters in Ochopee, Florida? Or the Lost Subway of Cincinnati, Ohio? Lots of places to visit—oh!

Saturday, January 12, 2019

It's no longer possible to write a web browser from scratch, but it is possible to write a gopher browser from scratch

As I mentioned two months ago, I've been browsing gopherspace. At the time, I was using an extension to Firefox to browse gopherspace, but a recent upgrade to Firefox left it non-working. I could use Lynx but it includes unnecessary pauses that make it feel slower than it really should be. I also don't care for how it lays out the page.

So I've been writing my own CLI gopher client.

And it's not like the protocol is all that difficult to handle and everything is plain text.

How hard could it be to download a bunch of text files and display them?

The protocol? Trivial.

Displaying the pages? Er … not so trivial.

The first major problem—dealing with UTF-8. Problem—the terminal window can only display so many characters per line (default of 80 usually). There are two ways of dealing with that—one is to wrap the text to the following line(s), and the other is to “pan-and-scan”—let the text disappear off the screen and pan left-and-right to show the longer lines. Each method requires chopping text to fit though. With ASCII, this is trivial—if the terminal is N columns wide, just chop the line every N bytes. This works because each character in ASCII is one byte in size. But characters in UTF-8 take a variable number of bytes, so chopping at arbitrary byte boundaries can split a character in the middle and garble the display.

The solution may look simple:

-- ************************************************************************
-- usage:       writeslice(left,right,s)
-- descr:       Write a portion of a UTF-8 encoded string
-- input:       left (integer) starting column (left edge) of display screen
--              right (integer) ending column (right edge) of display screen
--              s (string) string to display
-- ************************************************************************

local function writeslice(left,right,s)
  local l = utf8.offset(s,left)      or #s + 1
  local r = utf8.offset(s,right + 1) or #s + 1
  tty.write(s:sub(l,r - 1))
end

but simple is never easy to achieve. It took a rather surprising amount of time to come up with that solution.
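The same utf8.offset() approach extends to the wrapping case. What follows is only a sketch, not what the client actually does: wrapslices is a made-up name, it needs Lua 5.3 or later for the utf8 library, and it assumes one code point takes one terminal column (which is not true for combining marks or double-width characters).

```lua
-- Split a UTF-8 string into pieces of at most `width` code points each.
-- Assumption: one code point == one terminal column.
-- Returns nil if s is not valid UTF-8.
local function wrapslices(s,width)
  local pieces = {}
  local total  = utf8.len(s)          -- nil if s is malformed
  if not total then return nil end

  for left = 1 , total , width do
    local l = utf8.offset(s,left)                    -- first byte of piece
    local r = utf8.offset(s,left + width) or #s + 1  -- first byte after piece
    pieces[#pieces + 1] = s:sub(l,r - 1)
  end

  return pieces
end

-- Usage: wrap a line for a (very) narrow four-column display
for _,piece in ipairs(wrapslices("h\u{E9}llo w\u{F6}rld",4)) do
  print(piece)
end
```

Each piece is cut at a character boundary, so no multi-byte sequence is ever split in half.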

The other major problem was dealing with the gopher index files. Yes, they are easy to parse (once you wrap your head around some of the crap that presents itself as a “gopher index” file) but displaying it was an even harder problem.

Upon loading a gopher index, I wanted the first link to be highlighted, and use the Up and Down keys to select the link and then the enter key to select a page to download and view. Okay, but not all lines in a gopher index are actual links. In fact, there are gopher index files that have no actual links (that surprised me!). And how do I deal with lines longer than can be displayed? Wrap the text? Let the text run off the screen?

At first, I wanted to wrap long lines, but then trying to manage highlighting a link that spans several lines when it might not all be visible on the screen (the following lines might be off the bottom, for instance) just proved too troublesome to deal with. I finally just decided to let long lines of text run off the end of the screen just to make it easier to highlight the “current selection.” Also, most gopher index pages I've come across in the wild generally contain short lines, so it's not that much of a real issue (and I can “pan-and-scan” such a page anyway).
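For the curious, here is a rough sketch of the "skip over lines that aren't links" part. It is not the code from my client; parseline and nextlink are names invented for this example, and treating item types 'i' (the informational type, a common extension) and '3' (error) as non-links is an assumption on my part.

```lua
-- Parse one line of a gopher index: a type character and display string,
-- then selector, host and port, all separated by horizontal tabs.
-- Returns nil for a line that doesn't fit that shape.
local function parseline(line)
  local kind,display,selector,host,port =
        line:match("^(.)([^\t]*)\t([^\t]*)\t([^\t]*)\t(%d+)")
  if not kind then return nil end
  return {
    kind     = kind,
    display  = display,
    selector = selector,
    host     = host,
    port     = tonumber(port),
    link     = kind ~= 'i' and kind ~= '3', -- assumption: not selectable
  }
end

-- Return the index of the next link from `current` in direction `dir`
-- (1 for the Down key, -1 for the Up key); stay put if there is none.
local function nextlink(items,current,dir)
  local i = current + dir
  while items[i] do
    if items[i].link then return i end
    i = i + dir
  end
  return current
end
```

Starting with nextlink(items,0,1) highlights the first real link, and a page with no links at all just leaves the selection at 0.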

For non-text related files, I farm that out to other programs via the mailcap facility found on Unix systems. That was an interesting challenge I will probably address at some point.

There are still a few issues I need to address, but what I do have works. And even though it's written in Lua it's fast. More important, I have features that make sense for me and I don't have to slog through some other codebase trying to add an esoteric feature.

And frankly, I find it fun.

The technical differences between HTTP and gopher

… The point is to attempt as full a sketch as possible of the actual differences and similarities between the HTTP and GOPHER protocols.

…

From what I gather, these are the similaries:

Both gopher and http start with a TCP connection on an IANA registerd port number.

Both servers wait for text (the request) terminating in a CRLF

Both servers expect the request (if there is one) to be formatted in a particular way.

Both servers return plain text in response, and close the TCP connection.

And these are the differences that I understand:

Gopher will accept and respond to a blank request, with a default set of information, http will not.

Gophper [sic] sends a single "." on a line by itself to tell the client it is done, http does nothing similar prior to closing the connection.

Http has things like frames, multiplexing, compression, and security; gopher does not.

Http has rich, well-developed semantics, gopher has basic, minimalist semantics

Http requests are more resource intensive than gopher requests.

Http is highly commercialized, gopher is barely commercialized.

Http is heavily used and highly targeted by malicious users, gopher is neither.

Http is largely public, gopher is largely private (de facto privacy through obscurity.)

Http is used by everyone, their children, their pets, their appliances, their phones, and their wristwatches; gopher is used primarily by technical folk and other patient people.

Http all but guarantees a loss of privacy; gopher doesn't

Yeah, I know, it's not much, but that's all that is coming to mind presently. What are your thoughts?

Technology/Gopher (I'm quoting for the benefit of those that cannot view gopher based sites).

I don't want to say that tfurrows is wrong, but there is quite a bit that needs some clarification, and as someone who has worked with HTTP for over twenty years, and has recently dived back into gopher (I used it for several years in the early 90s—in fact, I recall Time Magazine having a gopher server back then) I think I can answer this.

First, the protocol. The gopher protocol is simple—you make a TCP connection to the given port (defaults to 70). Upon connection, the client then sends the request which can be one of three formats:

CRLF

The simplest request—just a carriage return and line feed character. This will return the main page for the gopher server.

selector-to-viewCRLF

This will return the requested data from the gopher server. The specification calls this a “selector.” And yes, it can contain any non-control character, including space. It's terminated by carriage return and line feed characters.

selector-for-searchHTsearch terms to useCRLF

The last one—this sends a search query to a gopher server. It's the “selector” that initiates a search, followed by a horizontal tab character, then the text making up the query, followed by a carriage return and line feed.

In all three cases, the gopher server will immediately start serving up the data. Text files and gopher indexes will usually end with a period on a line by itself; other file transfers will end with the server closing the connection.

That's pretty much the gopher protocol.
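As a sketch of just how little there is to it, the three request formats and the end-of-text marker fit in a few lines of Lua. The function names are made up for this example, and the networking itself (opening the TCP connection, reading the reply) is left out.

```lua
-- Build a gopher request.  No selector gives the bare CRLF request for
-- the main page; a query adds the horizontal tab and the search terms.
local function gopher_request(selector,query)
  selector = selector or ""
  if query then
    return selector .. "\t" .. query .. "\r\n"
  else
    return selector .. "\r\n"
  end
end

-- Remove the "." on a line by itself that usually ends a text file or
-- gopher index.  Data without the marker is returned unchanged.
local function strip_terminator(data)
  if data:match("^%.\r?\n$") then return "" end
  return (data:gsub("\n%.\r?\n$","\n"))
end
```

The "usually" matters here: since not every server sends the final period, a client can't rely on it and still has to treat the connection closing as the end of the data.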

The HTTP protocol that works the closest to gopher is the so called HTTP/0.9 version, and it was pretty much the same. So here are the same three requests above as HTTP requests.

GET /CRLF

The minimum request for HTTP. As you can see, it's only an extra four characters, but the initial text, GET in this case, was useful later when the types of requests increased (but I'm getting ahead of myself here). This will return the main page for the HTTP server.

GET /resource_to_viewCRLF

The usual request, but instead of a “selector” you request a “resource” (different name, same concept) but it cannot contain bare spaces—they have to be encoded as %20 (and a bare “%” sign is encoded as %25). Like gopher, the contents are immediately sent, but there is no special “end-of-file” marker—the server will just close the connection.

GET /resource_for_search?search%20terms%20to%20useCRLF

And a search query, where you can see the spaces being replaced with %20. Also note that the search query is separated from the “resource” with a “?”.

So not much difference between gopher and HTTP/0.9. In fact, during the early to mid-90s, you could get gopher servers that responded to HTTP/0.9 style requests as the difference between the two was easy to distinguish.
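To make the comparison concrete, here is a sketch of building those HTTP/0.9 requests. The names escape and http09_request are invented for this example, and the set of characters left unencoded (letters, digits, "-", ".", "_", "~" and "/") is a simplification of the real URL rules.

```lua
-- Percent-encode everything except a conservative set of safe characters.
-- A space becomes %20 and a bare percent sign becomes %25.
local function escape(s)
  return (s:gsub("[^%w%-%._~/]",function(c)
    return string.format("%%%02X",c:byte())
  end))
end

-- Build an HTTP/0.9 request: GET, the resource, an optional "?" plus the
-- search terms, and the terminating CRLF.
local function http09_request(path,query)
  local target = escape(path or "/")
  if query then
    target = target .. "?" .. escape(query)
  end
  return "GET " .. target .. "\r\n"
end
```

Compared with a gopher request, the only extra work is the GET prefix and the escaping.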

The next version of HTTP, HTTP/1.0, expanded the protocol. Now, the client was expected to send a bit more information in the form of headers after the request line. And in order to help distinguish between HTTP/0.9 and HTTP/1.0, the request line was slightly expanded. So now the request would look like:

GET /resource_to_view HTTP/1.0CRLF
User-Agent: Foobar/1.0 (could be a web browser, could be a web crawler)CRLF
Accept: text/*, image/*CRLF
Accept-Language: en-US;q=1.0, en;q=0.7, de;q=0.2, se;q=0.1CRLF
Referer: http://www.example.net/search?for%20blahCRLF
CRLF

(Yes, “Referer” is the proper name of that header, and yes, it's misspelled.)

I won't go too much into the protocol here, but note that the client can now send a bunch more information about the request. The Accept header now allows for so-called “content negotiation” where the client informs the server about what type of data it can deal with; the Accept-Language header tells the server the preferred languages (the example above says I can deal with German, but only if English isn't available, but if English is available, American is preferred). There are other headers; check the specification for details.
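To show how a server might read those preferences, here is a rough sketch that parses an Accept-Language value and orders the languages by their q value. The function name is made up for this example, and it skips a lot of what the specification allows (wildcard matching, for one).

```lua
-- Parse an Accept-Language value into a list of { tag = ..., q = ... }
-- tables, most preferred first.  A language without a q value defaults
-- to 1.0; ties keep the order they appeared in the header.
local function parse_accept_language(header)
  local langs = {}

  for item in header:gmatch("[^,]+") do
    local tag   = item:match("^%s*([%a%*][%w%-]*)")
    local qtext = item:match(";%s*q%s*=%s*([%d%.]+)")
    local q     = qtext and tonumber(qtext) or 1.0
    if tag then
      langs[#langs + 1] = { tag = tag , q = q , order = #langs + 1 }
    end
  end

  table.sort(langs,function(a,b)
    if a.q ~= b.q then return a.q > b.q end
    return a.order < b.order
  end)

  return langs
end

-- Usage: the preferences from the request above
for _,lang in ipairs(parse_accept_language("en-US;q=1.0, en;q=0.7, de;q=0.2, se;q=0.1")) do
  print(lang.tag,lang.q)
end
```

A server would then walk the list and return the first language it actually has.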

The server now returns more information as well:

HTTP/1.0 200 OkayCRLF
Date: Sat, 12 Jan 2019 13:39:07 GMTCRLF
Server: Barfoo/1.0 (on some operating system, on some computer, somewhere)CRLF
Last-Modified: Tue, 05 Sep 2017 02:59:41 GMTCRLF
Content-Type: text/html; charset=UTF-8CRLF
Content-Length: 3351CRLF
CRLF
content for another 3,351 bytes

The first line is the status, and it informs the client if the “resource” exists (in this case, a 200 indicates that it does), or if it can't be found (the dreaded 404) or if it has explicitly been removed (410) or it's been censored due to laws (451), or even moved elsewhere.
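Picking that first line apart is straightforward. This is a sketch only, with parse_status being a name invented for this example; the class names simply follow the first digit of the status code.

```lua
local classes =
{
  [1] = "informational",
  [2] = "success",
  [3] = "redirection",
  [4] = "client error",
  [5] = "server error",
}

-- Split a status line into protocol version, numeric code and reason
-- text.  Returns nil if the line doesn't look like a status line.
local function parse_status(line)
  local version,code,reason =
        line:match("^HTTP/(%d+%.%d+) (%d%d%d) ?(.-)\r?\n?$")
  if not version then return nil end
  code = tonumber(code)
  return {
    version = version,
    code    = code,
    reason  = reason,
    class   = classes[math.floor(code / 100)],
  }
end
```

Note that the client only needs the number; the text after it is for humans.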

Also added were a few more commands in addition to GET, like POST (which is used to send data from the client to the server) and HEAD (which is like GET but doesn't return any content—this can be used to see if a resource has changed).

HTTP/1.1 is just more of the same, only now you can make multiple requests per connection, a few more commands were added, and you can request portions of a file (say, to resume a download that was cut off for some reason).
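Resuming a download, for example, is just one more header. Here is a sketch of such a request, with resume_request being a name invented for this example; it assumes the server honors the Range header (it answers with a 206 status if it does) and it closes the connection afterwards to keep things simple.

```lua
-- Build an HTTP/1.1 request for everything from byte `offset` onward.
-- HTTP/1.1 requires the Host header; the empty strings at the end
-- produce the blank line that terminates the headers.
local function resume_request(host,path,offset)
  return table.concat({
    "GET " .. path .. " HTTP/1.1",
    "Host: " .. host,
    string.format("Range: bytes=%d-",offset),
    "Connection: close",
    "",
    "",
  },"\r\n")
end
```

A server that ignores the header will just send the whole file with a 200 status, so a client has to check which one it got.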

HTTP/2.0 changes the protocol from text-based to binary (and attempts to do TCP-over-TCP but that's a rant for another time) but again, it's not much different, conceptually, than HTTP/1.1.

Security, as in https: type of security, isn't inherently part of HTTP. TLS is basically inserted between the TCP and HTTP layers. So the same could be done for gopher—just insert TLS between TCP and gopher and there you go—gophers:. Of course, that now means dealing with CAs and certificates and revocation lists and all that crap, but it's largely orthogonal to the protocols themselves.

HTTP/1.0 allows compression but that falls out of the content negotiation. The bit about frames and multiplexing is more an HTTP/2.0 issue which is a lot of crap that the server has to handle instead of the operating system (must not rant …).

Are HTTP requests more resource intensive? They can be, but they don't have to be. But that leads right into the commercialization of HTTP. Or rather, the web. HTTP is the conduit. And conduits can carry both water and waste. HTTP became commercialized because it became popular. Why did HTTP become popular and gopher withered? Personally, I think it has to do with HTML. Once you could inline images inside an HTML document, it was all over for gopher. The ability to include cat pictures killed gopher.

But in an alternative universe, where HTML had no image support, I think you would have seen gopher expand much like HTTP has. Work was started in 1993 to expand the gopher protocol (alternative link) where the protocol gets a bit more complex and HTTP-like. As mentioned, a secure gophers: is “easy” to add ~~in that it doesn't change the core protocol~~ (update—it's not as easy as I thought). And as such, I could see it getting more commercialized. Advertising can be inserted

TYPEWRITERS

For SALE, HIRE, or EXCHANGE,

at HALF the USUAL PRICES.

MS. Typewritten from
10d. per 1,000 words. 100 Circulars for 4s

TAYLOR'S,
74, Chancery Lane, London.
(Est. 1884.)
Telegrams: "Glossator," London.
Telephone No. 690, Holborn.

even in a text file. Yes, it might look a bit strange, but it can be done. The only reason it hasn't is that gopher lost out to HTTP.

So those are the differences between HTTP and gopher. HTTP is more flexible but more complex to implement. Had history played out differently, perhaps gopher would have become more flexible and complex.

Who knows?

Monday, January 21, 2019

Oblivious spammers looking to game Google for “organic” links

Two weeks ago I received an email from A informing me of a broken link on a 14-year-old post and suggesting I replace said broken link with a link to a general purpose site that has no relation to the post or to the specific information I originally linked to with the now broken link. It was obvious to me that A just searched for links to the defunct site and spammed anyone that had linked to said site in order to divert some Google Page Rank to some obscure site they've been paid to hawk. Then a week later, A emails me again to remind me of the email of the previous week. What sent me over the edge on this one was the following at the bottom of the second message: “P.S. If you don't want to hear from me anymore you can unsubscribe here.”

I'm sorry, but when did I “subscribe” to your emails? I replied in a rather harsh tone, but I suspect A just sent my reply to the bit bucket.

Then two days after that, I received an email from C informing me of a broken link on the same 14-year-old post. C had a different email address from A, and appeared to work for a different company than A, and offered a different link than the one A had offered. And the link showed that C had no clue how the original broken link was used in context of the page and was just hawking some obscure site to obtain some Google Page Rank. So I ignored it.

Today, I received an email from C, reminding me of the email sent previously.

So I replied with a very harsh message informing C that not only was I aware of his previous email, but I was also aware of the previous two emails from A and the links A was spamming. I ended my reply to C with “I've decided to remove the damn link to XXXXXXXXXXX entirely! It's obvious both of you never read the page it was on and are just looking to game Google with ‘organic’ links.”

C surprised me with a reply a few hours later, apologizing for sending the emails.

Wow! Complaining actually worked. At least, I hope it worked and I was removed from at least one spam list. A guy can hope.

Tuesday, January 22, 2019

The Process über alles

It took a month and a half but my “self-review from hell” was rejected (and eight hours of my time wasted). This did not surprise me. But The Process is important because it's The Process, and thus, I will find myself spending another eight hours appeasing The Corporate Overlords Of The Corporation.

“One cannot simply walk into Mordor”

A manager went to the master programmer and showed him the requirements document for a new application. The manager asked the master: “How long will it take to design this system if I assign five programmers to it?”

“It will take one year,” said the master promptly.

“But we need this system immediately or even sooner! How long will it take if I assign ten programmers to it?”

The master programmer frowned. “In that case, it will take two years.”

“And if I assign a hundred programmers to it?”

The master programmer shrugged. “Then the design will never be completed,” he said.

The Tao of Programming

They always shoot the messengers, don't they?

Thus spake the master programmer:

“Let the programmers be many and the managers few—then all will be productive.”

The Tao of Programming

But incentivize finding bugs, and programmers will be buying minivans

A group of programmers were presenting a report to the Emperor. “What was the greatest achievement of the year?” the Emperor asked.

The programmers spoke among themselves and then replied, “We fixed 50% more bugs this year than we fixed last year.”

The Emperor looked on them in confusion. It was clear that he did not know what a “bug” was. After conferring in low undertones with his chief minister, he turned to the programmers, his face red with anger. “You are guilty of poor quality control. Next year there will be no ‘bugs’!” he demanded.

And sure enough, when the programmers presented their report to the Emperor the next year, there was no mention of bugs.

The Zen of Programming

Wednesday, January 23, 2019

Gold Handcuffs

It was not a fun day at The Ft. Lauderdale Office Of The Corporation. There was much I wanted to say and do in response to The Process, but I was dissuaded from all of it by several persons, all of whom informed me that doing so was not in my best self-interest. It's all the more frustrating because they're right!

You have no idea how much I censored myself when writing this post. I've already written and deleted over a dozen revisions of this post, and I hate that I've had to do that. I may have the same global reach as a corporation, but I don't have the same amount of money to fight.

Thursday, January 24, 2019

The various principles of management

Today was much nicer than yesterday. As I did actual work, you know, the work that I was hired to perform—to generate software that we charge our customers to use, I reached the que sera sera state with my “self-review” and have decided to let it go (I can only hope that the Corporate Overlords of the Corporation will finally accept it and not kick it back down for another wasted day of make-busy work).

But The Process has me thinking. There's the Peter Principle, which states that people are promoted to their level of incompetence, and the Dilbert Principle, which states that incompetent people are promoted to management to limit the damage they can do. And then there's the Gervais Principle, which is a lot harder to summarize but at first glance (of over 30,000 words) appears to explain management machinations, but I suspect I'm going to have to read the thing several times before I understand the actual principle, especially given the terms used to describe three groups of people—sociopaths, clueless, and losers—are loaded with negative connotations (after an initial reading, I would personally use the terms realists, idealists and cynics).

In the meantime, as far as I'm concerned, The Process is over, and I shall not talk about it again.