The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Friday, April 22, 2022

Notes on some extreme lawn ornaments, Brevard edition

Eight years ago (wow! Has it been that long? [Yes. —Editor] [Who asked you? —Sean]) while in Brevard, I took a picture of some extreme lawn ornaments: life-sized plastic cows. I wrote the “eat moar chikin” image caption (if you hold your mouse over the image, it should pop up) because the cows reminded me of the cows used by Chick-fil-A.

I'm reading the Transylvania Times when I come across the article “Transylvanian of the Week: John Taylor.” He owns O.P. Taylor's, a well-known toy store in the area, and he's the one with the life-sized plastic cows in his front yard. Not only that, but he bought them from the person who made them for Chick-fil-A. Little did I know how accurate my caption was.


Play stupid games, win stupid prizes

It's not only Gemini bots having issues with redirects. I'm poking around the logs from my webserver, and I decide to scan all of them to see the breakdown of response codes my server is sending (for this month). And well … it's rather surprising:

Breakdown of HTTP response codes from all the sites I host
Status Meaning Count
302 Found (moved temporarily) 253773
200 OK 178414
304 Not Modified 25552
404 Not Found 8214
301 Moved Permanently 6358
405 Method Not Allowed 1453
410 Gone 685
400 Bad Request 255
206 Partial Content 151
401 Unauthorized 48
500 Internal Server Error 24
403 Forbidden 4
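A breakdown like the one above can be produced with a few lines of Python. This is a minimal sketch, not the script I actually used: it assumes log lines shaped like the one shown below (a quoted request followed by the status code), and the function name `tally_status` is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# The status code is the three-digit field right after the quoted request,
# e.g. ... "GET / HTTP/1.0" 302 284 "referer" "user agent"
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')


def tally_status(lines: Iterable[str]) -> Counter:
    """Count HTTP status codes across log lines; lines that don't parse are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python tally.py site1.log site2.log ...
    totals: Counter = Counter()
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            totals += tally_status(f)
    # Most frequent status codes first, like the table above.
    for status, count in totals.most_common():
        print(f"{status} {count:>8}")
```

Running it over every site's log at once gives the overall picture; running it per log file is what points at the one site responsible.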

I was not expecting that many temporary redirects. Was it some massive issue across all the sites? Or just a few? Well, it turned out that all of the temporary redirects were from one site: http://www.flummux.org/ (and no, I'm not linking to it, for reasons that will become clear). I registered the domain way back in 2000 just as a place to play around with web stuff, or to temporarily make files available without cluttering up my main websites. The site isn't meant to be at all serious.

Scanning the log file manually, I was seeing endless log entries like:

XXXXXXXXXXXXXXX - - [10/Apr/2022:20:55:05 -0400] "GET / HTTP/1.0" 302 284 "http://flummux.org/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; MRA 4.6 (build 01425); .NET CLR 1.0.3705; .NET CLR 2.0.50727)" -/- (-%)

That log entry indicates that a “browser” from IP address XXXXXXXXXXXXXXX, identifying itself as “Mozilla (yada yada),” attempted to get the main page on the 10th of April, as referred by http://flummux.org/. And here's how many times this happened, broken down by user agent:

Top five user agents making the troublesome requests
Count User agent
127100 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; MRA 4.6 (build 01425); .NET CLR 1.0.3705; .NET CLR 2.0.50727)
126495 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E)
42 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
36 CATExplorador/1.0beta (sistemes at domini dot cat; https://domini.cat/catexplorador/)
15 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0
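The user agent breakdown is the same idea with one more field pulled out of each line. Again a sketch under assumptions: the regex expects the quoted request, status, size, quoted referer and quoted user agent in that order (as in the log line above), and `agents_for_status` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable

# Quoted request, status, size, quoted referer, quoted user agent.
LOG_RE = re.compile(r'"[^"]*" (\d{3}) \S+ "([^"]*)" "([^"]*)"')


def agents_for_status(lines: Iterable[str], status: str = "302") -> Counter:
    """Count user agents, but only for requests answered with the given status."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match and match.group(1) == status:
            counts[match.group(3)] += 1
    return counts
```

Calling `agents_for_status(open("access.log")).most_common(5)` on a log file would give a top five list like the one above.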

Ah, two “browsers” that don't limit the number of redirects they follow. And amusingly enough, both agents came from the same IP address. Or maybe it's the same agent, just lying about what it is. Who knows? Well, aside from the author(s) of said “browser.”

But what was horribly confusing to me was why the server was issuing a temporary redirect at all. Yes, if you try to go to http://flummux.org/ the server will respond with a permanent redirect (status 301) to http://www.flummux.org/ (the reason for that is to canonicalize the URLs and avoid the “duplicate content penalty” from Google; I set this all up years ago). But the site shouldn't redirect again. I can bring the site up in my browser without issue (which is a visual … pun? Commentary? Joke? on the line “The sky above the port was the color of television, tuned to a dead channel.”).

And then I remembered: back in 2016, I set things up such that if the browser sent in a referring link, the page would temporarily redirect back to the referring link (which is why I'm not linking to it; you would just be redirected right back to this page). I set that up on a lark, for some reason that now escapes me. So the above “browsers” kept bouncing back and forth between flummux.org and www.flummux.org. For a quarter of a million requests.
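To see how the loop plays out, here is a toy model of the setup, with no actual network involved. It's an assumption-laden sketch, not my real server configuration: `handle`, `fetch` and `TooManyRedirects` are hypothetical names, the referer check is assumed to run before the canonicalizing 301 on both hostnames (one way to end up with far more 302s than 301s), and the client is assumed to send the URL it was redirected from as the Referer, as the log entry above suggests.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

FLUMMUX_HOSTS = {"flummux.org", "www.flummux.org"}


def handle(url: str, referer: Optional[str]) -> Tuple[int, Optional[str]]:
    """Toy model of the server: returns (status, Location header or None)."""
    host = urlsplit(url).hostname
    if host not in FLUMMUX_HOSTS:
        return 200, None  # some other site; assume it just serves the page
    if referer:
        return 302, referer  # the 2016 lark: go back where you came from
    if host == "flummux.org":
        return 301, "http://www.flummux.org/"  # canonicalize to www
    return 200, None


class TooManyRedirects(Exception):
    pass


def fetch(url: str, referer: Optional[str] = None, max_redirects: int = 20) -> Tuple[str, int]:
    """Follow at most max_redirects redirects; return (final URL, redirects followed)."""
    for hops in range(max_redirects + 1):
        status, location = handle(url, referer)
        if status not in (301, 302):
            return url, hops
        # Send the URL we were redirected from as the next Referer.
        referer, url = url, location
    raise TooManyRedirects(f"gave up after {max_redirects} redirects")
```

Following a link from here, `fetch("http://www.flummux.org/", referer="https://boston.conman.org/")` lands right back on this site after one redirect, while `fetch("http://flummux.org/")` ping-pongs between the two hostnames until the cap kicks in. The cap is the whole point: a client without one makes the quarter of a million requests instead.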

Sigh.

In other news, bugs are nothing more than an inattention to detail.


I have now wrapped my brain around how it got that link

Martin Chang replied to my post about Gemini crawlers, saying that it was his crawler that had sent links like gemini://gemini.conman.org/boston/2008/04/30/2008/04/30.1, and that he would look into the issue. Well, he did, and he found it wasn't his bug, but mine.

Oh my.

Okay, so how did I end up generating links like gemini://gemini.conman.org/boston/2008/04/30/2008/04/30.1?

This is, first and foremost, a blog on the web. Each entry is stored as HTML, and when a request is made via gopher or Gemini, the entries making up the request are retrieved and converted to the appropriate format. As part of that conversion, links to the blog itself have to be translated appropriately, and that's where the error happened.

So, for example, the links for the above entry are collected:

  1. http://www.cisco.com/
  2. http://it.slashdot.org/article.pl?sid=08/04/29/2254242
  3. http://www.arin.net/
  4. 2008/04/30.1#fn-2008-04-30-1-1
  5. http://www.barracudanetworks.com/
  6. http://answers.yahoo.com/question/index?qid=20080219010714AAnF91Q

Those links with a URL scheme are passed through as is, but #4 is special: not only is it a relative link to my blog, it also contains a URL fragment, and that's where things went pear-shaped. The code to do the URL translations parsed each link as a URL, but for relative links, I used the raw string, not the parsed URL structure. As such, the code didn't handle URL fragments well, and I ended up with links like gemini://gemini.conman.org/boston/2008/04/30/2008/04/30.1 (for the record, the same bug was in the gopher translation code as well).
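The general fix is to work from the parsed URL everywhere, so the path and the fragment stay separate. Here's a minimal sketch of that idea in Python. It's not my actual code; the base URLs are assumptions based on the links in this post, and `translate_link` is a hypothetical name.

```python
from urllib.parse import urljoin, urlsplit

# Assumed bases; the real blog's configuration may differ.
WEB_BASE = "https://boston.conman.org/"
GEMINI_BASE = "gemini://gemini.conman.org/boston/"


def translate_link(link: str) -> str:
    """Translate a link found in an entry into its Gemini equivalent."""
    if urlsplit(link).scheme:
        # Links with a URL scheme (http://www.cisco.com/ and friends) pass through.
        return link
    # Resolve the relative link against the web base, then use the *parsed*
    # path, so a fragment like #fn-2008-04-30-1-1 never ends up in the path.
    resolved = urlsplit(urljoin(WEB_BASE, link))
    gemini_url = GEMINI_BASE + resolved.path.lstrip("/")
    if resolved.query:
        gemini_url += "?" + resolved.query
    if resolved.fragment:
        gemini_url += "#" + resolved.fragment
    return gemini_url
```

With this, link #4 above becomes gemini://gemini.conman.org/boston/2008/04/30.1#fn-2008-04-30-1-1, and the absolute links come through untouched. A gopher version would differ only in how the final URL is assembled.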

The fix, as for most bugs, was easy once the core issue was identified. The other issues I talked about are, as far as I can tell, not stuff I can fix.


Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site, https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.
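If you're generating these links from a program, something like the following should do. The helper names are hypothetical, the zero padding is inferred from the 2000/08/01 example, and the format for an arbitrary span of time isn't covered here.

```python
from datetime import date

BASE = "https://boston.conman.org/"


def day_link(d: date) -> str:
    """Permanent link to one day's entries, e.g. .../2000/08/01."""
    return f"{BASE}{d.year:04d}/{d.month:02d}/{d.day:02d}"


def month_link(year: int, month: int) -> str:
    """Link to a whole month: the same format with the day left off."""
    return f"{BASE}{year:04d}/{month:02d}"
```

So `day_link(date(2000, 8, 1))` yields the URL shown above, and `month_link(2000, 8)` yields https://boston.conman.org/2000/08.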

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.