The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Thursday, Debtember 01, 2022

Minimum support for webmentions

I just now realized I've released a version of mod_blog during the holiday season going back as far as 2016. With that in mind, and with the fact that I finally received my first webmention on my blog a couple of days ago, I have just released the latest version for this Christmas season. The big change in this release is that I now show webmentions per post, even though I've only received one so far.

Hey, it's a start with the webmentions.

You can also see from the sidebar list I have that I changed versioning schemes a few years back. I used to use semantic versioning, but upon reflection, I didn't feel it was really a good fit for applications, and instead switched to a monotonic version number. While the code has changed dramatically over the past 23 years (come this Debtember 4th), the data format has not changed one bit. It's still the “one HTML file per entry, using the file system as database” scheme, which has worked quite well for me over the years.



Friday, Debtember 02, 2022

You too, can make objectively the world's best pizza at home

I have a thing for Detroit style pizza from Buddy's. If it wasn't so expensive to ship from Detroit, I would definitely have it more often. So it was with great joy that a few weeks ago Adam Ragusea released a video about Detroit style pizza. I had even more joy when I saw him make it from scratch. It's simple: just dough (which you have to make because it's not your standard pizza dough), pepperoni, Wisconsin brick cheese (which looks like it's only available via the Intarwebs if you aren't in Wisconsin), tomato sauce, and several hours (to let the dough proof, and to heat the oven to its highest setting, which technically isn't hot enough, but it will do).

Easy. [Yeah, and if you want to spend the money on the ingredients and a full day to make it, be my guest. I won't be doing it. —Bunny]


The vultures that are private equity firms

Adam Conover's video on private equity firms was interesting, but I would have liked a better explanation of how they make money from bankrupting the firms they buy (aside from the fees they apparently charge for their “services”). I would think that would be rather counter-productive over the longer term.

And yes, I have some experience with private equity firms. When I worked at The Corporation, we were initially bought out by a larger company (but were left alone for years for … um … reasons), then that company was bought out by a private equity firm. It was then that we sold access to some critical databases we relied on to a competitor and leased the data back from them, which I'm sure brought in a ton of money for the private equity firm directly. Indirectly, it most likely shifted expenses around for tax advantages for the next few years (like shifting capital expenses to operating expenses or something like that—I'm not an accountant though) until our contract with our competitor expired in a few years and it would become Somebody Else's Problem to deal with (I think the hope of the private equity firm was that they would no longer own us by then). We also suffered hiring freezes because we “never had enough money to hire anyone” (odd, that, because we made millions per month from our customer, the Oligarchic Cell Phone Company).

Eventually, we did become Someone Else's Problem when we were sold to a much larger firm (which I don't think had any influence on the large push for Enterprise Agile—that's entirely the fault of the original company that bought out the Corporation), so at the very least, we avoided the “bankruptcy outcome.” But I can't say it was a pleasant experience at the time.

Sunday, Debtember 04, 2022

Late to the party

I've been blogging for 23 years as of today. This is also the first day this blog is being served up via https:. All I had to do was just install the latest version of Apache on my server.

It took several days, but I got the latest version of Apache compiled and installed on my server. Yes, I did it the hard way. What better way to learn how things work than doing it the hard way? I then spent Saturday updating the configuration. There were a few changes, like NameVirtualHost being deprecated, and having to add “Protocols h2 h2c http/1.1” and “Require all granted”.

Once that was done and the new server was up and running, I dove into the whole “Encrypt All The Things!” rabbit hole (I know, I know, 2015 called and said I was late to the party). A recent post of mine made it to The Orange Site and fully half of the comments were about my disturbing lack of faith in TLS. Of course. Fortunately, Apache has a module to handle certificates from Let's Encrypt (or other places that support the “certificate update dance” protocol). Unfortunately, there are subtleties not mentioned in the documentation, like the MDCACertificateFile directive (which I need for my setup—don't ask) not being documented, or the fact that if you make any type of mistake (like using the wrong domain name because you cut-n-paste the configuration from one host into another and forgot to make the domain name change, or using “SSLEngine on” in the wrong place, or forgetting to add acme-tls/1 to the Protocols directive) everything goes pear shaped and Let's Encrypt will rate limit you and … ugh. I'm just lucky I have a few domains to practice on before enabling it for my main sites.

But I was able to finish in time for the 23rd anniversary of my blog and get that stupid little lock on my site.

You're welcome.

Monday, Debtember 05, 2022

“Until I'm rescued from this Chinese fortune cookie factory, I might as well make the best of it!”

I appear to have this knack for getting odd Chinese fortune cookie fortunes. I've yet to get the “Help! I'm trapped in a Chinese fortune cookie factory!” but this is darned close:

[Fortune cookie says: “Come back later… I am sleeping.  (yes, cookies need their sleep, too)”] “Philip, we need to talk about your sleeping on the job here at the Fortune Cookie Factory … ”

At least this time it wasn't in French.

Wednesday, Debtember 07, 2022

Notes on an overheard conversation about locking the keys in the car

“Finally! I'm home!”

“Yes you are!”

“And you didn't answer your phone.”

“You didn't call!”

“Yes I did.”

“Oh! I see I did receive a call, but it was from a number not on my contact list. You know I don't answer those.”

“I was hoping you'd make an exception.”

“It's hard to make an exception when I don't know who is calling.”

“Sigh. I locked my keys in my car, and I had to walk home from Panera Bread.”

“Oh dear … ”

“Now I have to locate my spare key.”

“Oh, you mean this key.”

“Yes. Could you please drive me to my car now?”


Notes on configuring Apache mod_md

I've been tweaking my Apache configuration for the past two days, trying to figure out what I need and don't need, and these are just some notes I've collected on the process. I'm using mod_md for managing the secure certificates, and there isn't much out on the Intarwebs about what a configuration for a website should look like. I can find plenty of pages that basically regurgitate the Apache documentation for mod_md, but nothing on how it all fits together. So here's an annotated version of the configuration for one of my less important sites:

<MDomainSet www.flummux.org>
	MDCertificateAgreement	accepted
	MDContactEmail		sean@conman.org
	MDMember		flummux.org
	MDRequireHttps		temporary
</MDomainSet>

The required stuff. I've found that using MDomainSet is much cleaner than MDomain as I have multiple sites that I want to keep separated, certificate wise. I'm old-school when it comes to naming, so I like using the “www” prefix and prefer that to be part of the canonical name for my domains. I also support the plain domain name, but only to redirect to the “www” version of the site. If you are more hipster than I, then just reverse the domain names. I won't judge.

Given the push that “Encrypt All The Things!” has had, especially from Google, I expect that any month now Google Chrome (which has, what? An 85% usage rate on the Internet?) will enable the Big Scary Error Messages on non-encrypted web requests, so I might as well go ahead and start pushing the secure versions of my sites (sigh—I really hate this bit, but I think I'm in the minority on this), thus the MDRequireHttps setting. I tried using permanent on one of my test domains and I screwed myself over when I flubbed the mod_md configuration—I can't even reach the site from my primary browser, as it is now stuck for the next six months trying to reach the secure version, which isn't running. Yes, I could fix this by cleaning out my cache, but that's pretty much an “all-or-nothing” option, and for a domain I almost never use, I can live with that for now. I also flubbed the configuration for that domain so badly that I have to wait a month before I try obtaining a certificate again.

Sigh.

<VirtualHost 71.19.142.20:80>
	ServerName	flummux.org
	Redirect	permanent	/	http://www.flummux.org/
	Protocols	h2 h2c http/1.1 acme-tls/1
</VirtualHost>

<VirtualHost 71.19.142.20:80>
	ServerName	www.flummux.org
	Protocols	h2 h2c http/1.1 acme-tls/1
</VirtualHost>

Because I'm using the MDRequireHttps directive, I've found that this is all I need for the non-secure settings, which also means I don't need to duplicate the actual server settings, once for the non-secure version and again for the secure version. The first block is there to redirect http://domain requests to http://www.domain requests. I'm not redirecting directly to https: here, as the Apache documentation warns that the certificate renewal might not work. And because I want the certificate renewal to work, I added acme-tls/1 to the list of protocols supported.

<VirtualHost 71.19.142.20:443>
	SSLEngine	On
	ServerName	flummux.org
	Redirect	permanent	/	https://www.flummux.org/
	Protocols	h2 h2c http/1.1 acme-tls/1
</VirtualHost>

This is just to redirect https://domain requests to https://www.domain requests. I'm not sure if I really need the acme-tls/1 setting here, but I'm not taking a chance with the certificate renewal. It's not clear in the Apache documentation what would happen, and given how long I have to wait if it messes up, I'm not willing to test it.

<VirtualHost 71.19.142.20:443>
  SSLEngine		on
  ServerName		www.flummux.org
  ServerAdmin		sean@conman.org
  DocumentRoot		/home/spc/web/sites/www.flummux.org/htdocs
  AddHandler		server-parsed .shtml
  AddOutputFilter	INCLUDES .shtml
  AddOutputFilterByType	DEFLATE	text/html text/plain text/xml
  Protocols		h2 h2c http/1.1 acme-tls/1
  CustomLog		/home/spc/web/logs/www.flummux.org combined-deflate
  FileETag		MTime Size
  AddDefaultCharset	UTF-8
  DirectoryIndex	index.cgi

  SetEnv LUA_PATH	"/home/spc/web/sites/www.flummux.org/lua/?.lua"
  SetEnv LUA_CPATH	"/home/spc/web/sites/www.flummux.org/lib/?.so"
  Header set Content-Security-Policy "style-src 'unsafe-inline'; script-src 'unsafe-inline' 'unsafe-eval' 'self'; default-src 'self';"

  ExpiresActive	 On
  ExpiresDefault "access plus 1 month"
  ExpiresByType	 text/html "modification plus 1 week"

  <Directory /home/spc/web/sites/www.flummux.org/htdocs>
    Options		All
    AllowOverride	None
    Require		all granted
  </Directory>

  <Directory /home/spc/web/sites/www.flummux.org/htdocs/errors>
    Options	-Indexes
  </Directory>

  ErrorDocument	404	/errors/404.shtml
</VirtualHost>

And we finally get to the configuration for the site itself. Not much to say about this, except that the “Content-Security-Policy” header is annoying to get right, and I'm not sure how much benefit it brings, but hey, this is a test site so I'll have to see how it goes.

So that's pretty much how I'm setting up each site I host. It's pretty straightforward, except for the sheer terror that I've made a typo and will have to wait a month before trying to obtain a secure certificate again. You have been warned.

Thursday, Debtember 08, 2022

Some comments on delimiter-first code

I was reading “Delimiter-first code” (via Lobsters) and I was struck by his first example of comma-first formatting:

-- leading commas               
SELECT employee_name
     , company_name
     , salary
     , state_code
     , city
FROM `employees`

That doesn't look half bad, I thought. It could make for smaller diffs in some cases. For instance, I have this:

fprintf(
         stdout,
         "Status: %d\r\n"
         "X-Error: %s\r\n"
         "Content-type: text/html\r\n"
         "\r\n",
         level,
         errmsg
       );

Rework it to use comma-first formatting:

fprintf(
            stdout
         , "Status: %d\r\n"
           "X-Error: %s\r\n"
           "Content-type: text/html\r\n"
           "\r\n"
         , level
         , errmsg
       );

I still have to work within the confines of C, but here it's easier to see that the string literal is one long literal and not four additional parameters, so that's good. It's a bit strange looking, but I could get used to it (I got used to “char const” over “const char” because const applies to whatever is to its left, except when it starts the declaration, in which case it applies to what is to its right; it makes parsing “char const *const p” easier for me—this declares p to be a constant pointer to constant data). And if I need to add to it:

fprintf(
            stdout
         , "Status: %d\r\n"
           "X-Error: %s\r\n"
           "Content-type: text/html\r\n"
           "X-Foobar: %s\r\n"
           "\r\n"
         , level
         , errmsg
         , foobar
       );

the diff is easier to follow—this:

5a6
>            "X-Foobar: %s\r\n"
8a10
>          , foobar

instead of:

5a6
>          "X-Foobar: %s\r\n"
8c9,10
<          errmsg
---
>          errmsg,
>          foobar

But then I came across this bit of code I wrote:

XEvent se =
{
  .xselection =
  {
    .type       = SelectionNotify,
    .serial     = NextRequest(event->xselectionrequest.display),
    .send_event = True,
    .display    = event->xselectionrequest.display,
    .requestor  = event->xselectionrequest.requestor,
    .selection  = event->xselectionrequest.selection,
    .target     = event->xselectionrequest.target,
    .property   = event->xselectionrequest.property,
    .time       = event->xselectionrequest.time,
  }
};

And … um …

XEvent se =
{
  .xselection =
  {
      .type       = SelectionNotify
    , .serial     = NextRequest(event->xselectionrequest.display)
    , .send_event = True
    , .display    = event->xselectionrequest.display
    , .requestor  = event->xselectionrequest.requestor
    , .selection  = event->xselectionrequest.selection
    , .target     = event->xselectionrequest.target
    , .property   = event->xselectionrequest.property
    , .time       = event->xselectionrequest.time
  }
};

Yeah …

C99 has designated initializers and also allows trailing commas when initializing structures, so the need for comma-first formatting doesn't really apply here; comma-first formatting only really applies to function calls. Perhaps languages should allow trailing commas in all contexts? It's something to think about.

The rest of the article is really about marking items in a list with some delimiter, usually a comma that comes after an item (except for the last item). There's one example he brings up: “1 , 2 , 3” vs. “・1 ・2 ・3” and here, I would say maybe “1 2 3” is best? Using spaces instead of a comma could still work in a lot of contexts in C:

/* none of this is valid C code */
rc = cgi_error(blog req HTTP_BADREQ "bad request");
fprintf(
	stdout 
	"Status: %s\r\nContent-type: text/html\r\n\r\n" 
	status
);
generic_cb("main" stdout callback_init(&cbd blog req));

It only breaks down when we go back to my first example above:

/* still not valid C code */
fprintf(
         stdout
         "Status: %d\r\n"
         "X-Error: %s\r\n"
         "Content-type: text/html\r\n"
         "\r\n"
         level
         errmsg
       );

Consecutive string literals are collected together into a single string literal, so such a construct as above could lead to some confusion. But this is just me riffing on using space as a delimiter.

The rest of the article does lay out a decent argument for leading delimiters in a lot of contexts, but removing closing brackets is, I think, going too far. It works for Python because of syntactic white space, but it won't work for nearly any other language. It also fails for languages that support variadic functions, so it's probably best to keep both opening and closing brackets (or parentheses or whatever). It also seems the arguments are more for vertical than horizontal formatting.

The article ends with:

Don’t be too surprised if this proposal evokes “hey this looks wrong, just plain wrong” reaction. After all, ideas we enjoy these days: enumeration from zero, using registers in names, structural programming, mandatory formatting, and even python’s approach to defining code blocks with indentation — every single one of them were met with a storm of criticism.

I'll keep that in mind, but even so, not everyone buys into mandatory formatting or significant white space.

Friday, Debtember 09, 2022

Notes on an overheard conversation as the radio was playing “Blue Christmas”

“Oh! You know who that is, right?”

“A very bad Elvis impersonator.”

“No! It's Dean Martin.”

“Really? I didn't know the Mad Magazine artist had a singing career as a bad Elvis impersonator.”

“It was first recorded in 1948 by Doye O'Dell—”

“He too, was probably a bad Elvis impersonator.”

“And the following year by Ernest Tubb, Hugo Winterhalter, and Russ Morgan.”

“All bad Elvis impersonators! All of them!”

“Elvis didn't even record it until 1957!”

“They were just early with their Elvis impersonations.”

“Sigh.”


“When out from the bathroom there arose such a clatter, she sprang from the bed to see what was the matter.”

“Oh ffffffffffffuuuuuuuuddddddddddge!”

Only I didn't say “Fudge.” I said the word, the big one, the queen mother of dirty words, the “F-dash-dash-dash” word. Fortunately, the loud crashing sound masked what I said. It also brought Bunny to the bathroom door.

“Are you alright?” she asked from the other side.

“Yes,” I said, hobbling to the door, trying to keep my balance as I was sopping wet with a plastic garbage bag covering my right foot, “although I did do a number on the garbage pail.” I then opened the door to let Bunny see the resulting carnage.

[Picture of the bathroom at Chez Boca with a shattered plastic garbage pail littering the floor] There was nothing we could do.  It was an ex-garbage pail, pining for the fjords!

“What happened?”

“I was trying to get out of the tub and slipped,” I said, pulling the garbage bag off my foot.

“Oh! You're bleeding!”

“'Tis a flesh wound,” I said. “I've had worse.”

“Sean, you're lucky you didn't smash your head open. Those bathtubs have been known to kill people.”


I should have made a check list

Yup. I messed up again, just as I was afraid of. Using mod_md isn't that hard, it's just that any mistake you make means you just lost a few days, up to an entire month.

Sigh.

It's a bit late now, but I should have created this check list to help prevent mistakes:

  1. Figure out primary domain name (aka primary)
  2. Figure out alias domain name (aka alias)
  3. Configure MDomainSet
    1. <MDomainSet primary>
      1. Make sure primary is spelled correctly
    2. MDCertificateAgreement accepted
    3. MDContactEmail sean@conman.org
    4. MDMember alias
      1. Make sure alias is spelled correctly
    5. MDRequireHttps temporary
    6. </MDomainSet>
  4. Configure VirtualHost alias:80
    1. <VirtualHost ip:80>
    2. ServerName alias
      1. Make sure alias is spelled correctly
    3. Redirect permanent / http://primary
      1. Make sure primary is spelled correctly
    4. Protocols h2 h2c http/1.1 acme-tls/1
    5. </VirtualHost>
  5. Configure VirtualHost primary:80
    1. <VirtualHost ip:80>
    2. ServerName primary
      1. Make sure primary is spelled correctly
    3. Protocols h2 h2c http/1.1 acme-tls/1
    4. </VirtualHost>
  6. Configure VirtualHost alias:443
    1. <VirtualHost ip:443>
    2. SSLEngine on
    3. ServerName alias
      1. Make sure alias is spelled correctly
    4. Redirect permanent / https://primary
      1. Make sure primary is spelled correctly
    5. Protocols h2 h2c http/1.1 acme-tls/1
    6. </VirtualHost>
  7. Configure VirtualHost primary:443
    1. <VirtualHost ip:443>
    2. SSLEngine on
    3. ServerName primary
      1. Make sure primary is spelled correctly
    4. Protocols h2 h2c http/1.1 acme-tls/1
    5. </VirtualHost>
    6. Other configuration settings …

My last mistake? I forgot to add acme-tls/1 to the Protocols directive.

Aaaaaaah!

It's not that I haven't done check lists before, and they're great at making sure you don't miss a step—I just have to remind myself to do them. But better late than never, as I can use this the next time I have to add a new domain.

Monday, Debtember 12, 2022

How I feel about HTTPS

My recent postings on using HTTPS for my sites reminded one of my readers, White_Rabbit, to send in a link to Discourse on HTTPS. The language may be salty, but it does align with my feelings towards HTTPS—namely, I don't really need it. But as I stated, Google will any day now start with the Big Scary Error Messages on non-secure sites, possibly followed by (and I don't know this for a fact; it's just a gut feeling) no longer allowing non-secure requests at all. And with Google Chrome having a ridiculous market share, that's something to be concerned about.

Tuesday, Debtember 13, 2022

I think this toilet is going to be the death of us

It started yesterday when, after flushing the toilet, I noticed water seeping all around the toilet bowl. “This is not good,” said Bunny, as she inspected the growing puddle of water. “Let's cut off the water to this thing, and deal with it tomorrow. Looks like we're going to have to replace the wax ring.”

Cut to—today. Water disconnected, I pulled the toilet off the floor, revealing the horrible remains of a wax ring. Bunny then scraped the remains up, and we replaced the wax ring with a non-wax ring that should last longer. We got the toilet back in place, secured it down, hooked the water up and hey! Looks like no more water.

Until there was.

It appears that it may not have been the wax ring at all, but the seals around the … um … bit (I have no idea what it's called) that regulates the water into the tank. Water is pouring out of the tank at the location the water pipe is connected to the toilet.

I swear, this toilet is cursed!


Notes on an overheard conversation in the bathroom

“I think we're finally done! I think this toilet should last years.”

“Well, the last time we worked on it was in 2018.”

“How do you know?”

The blog.

Wednesday, Debtember 14, 2022

An annotated example of using LPeg to parse a string to generate LPeg to parse other strings

A message on the Lua email list was asking about the best way to parse MQTT topics, specifically, how to handle the multilevel wildcard character. I answered that LPeg would be good for this, and gave annotated source code to show how it works. I thought I might also post about it for better visibility.

So, here's the code:

local lpeg = require "lpeg"
local Cc   = lpeg.Cc
local Cf   = lpeg.Cf
local P    = lpeg.P
local R    = lpeg.R

local filter do
  local separator = P'/'
  local topic     = R("AZ","az","09")^1 * (#separator + P(-1))
  local single    = P'+'                * (#separator + P(-1))
  local multi     = (P"/#" + P'#')      * P(-1)
  

  local csep    = separator / function()  return separator end
  local ctopic  = topic     / function(c) return P(c) end
  local csingle = single    / function()  return topic^-1 end
  local cmulti  = multi
                / function() return (separator * topic)^0 * P(-1) end

  filter = (P"#" * P(-1))
         / function() return (separator^-1 * topic)^0 * P(-1) end
         + Cf(  
               (-P"/#" * (ctopic + csingle + csep))^0 * cmulti^-1 * Cc(P(-1)),
               function(a,r)
                 return a * r
               end
             ) * P(-1)
end

And now the annotations—code fragment first, then annotation.

local lpeg = require "lpeg"
local Cc   = lpeg.Cc
local Cf   = lpeg.Cf
local P    = lpeg.P
local R    = lpeg.R

This loads the LPeg module into Lua. I also grab the functions I'll be using from the module into locals. I do this not for speed purposes (although it will be slightly faster) but to reduce code clutter—there will be less lpeg. littered about the code, and I find that easier to read personally. It's not required that this be done.

local filter do
  -- ...
end

filter will contain the resulting LPeg expression. I create a new scope since the variables I'll be declaring won't be used outside of the definition for filter, and it seems cleaner to me to reduce variable visibility as much as possible. It will also mean that over time (if this is intended for code that runs for a long time) the local variables created in this scope will be reclaimed as garbage. It's just a stylistic choice I make for Lua.

local separator = P'/'

This defines an LPeg expression that matches a literal slash character. The P() function can do a bit more than match literal strings, but we'll be mostly using it for literal string matches, as well as matching the end of the input string.

local topic = R("AZ","az","09")^1 * (#separator + P(-1))

This expression will match a “topic.” I'm using R() to match a range of characters (in this case, letters and digits). The multiplication sign (okay, an asterisk, but it's used to designate multiplication in Lua) here is used as an “AND” clause—a topic is a range of characters “AND” something else. That “something else” is either a separator (and the “#” mark is used to look ahead in the input without consuming it) or (the plus sign is read as “OR”) end of the input string.

local single = P'+' * (#separator + P(-1))

This expression will match a plus sign, which is used to indicate a single topic wildcard character. And again, we're expecting this to be followed by a separator character or the end of the string.

local multi = (P"/#" + P'#') * P(-1)

The “#” character is a multiple topic wildcard character and it must appear at the end of the string. I check for both “/#” and “#” because of the way I process the input later on. There might be a better way to deal with this, but for a “proof-of-concept” this is good enough for now.

Now we get to the mind bending bit of this—I'm writing LPeg to parse a “topic filter” and generate an LPeg expression that will see if a “topic name” matches the “topic filter.”

local csep    = separator / function() return separator end
local ctopic  = topic     / function(c) return P(c) end
local csingle = single    / function()  return topic^-1 end
local cmulti  = multi
              / function() return (separator * topic)^0 * P(-1) end

These four expressions all do similar things—they match an existing pattern and pass the matching text to a function which returns an LPeg expression. csep returns an expression that matches the separator; ctopic returns an expression that matches the literal topic just parsed; csingle returns an expression that matches an alphanumeric string that represents a topic; and finally cmulti returns an expression that matches the remaining input.

And the final bit of code:

filter = (P"#" * P(-1))
       / function() return (separator^-1 * topic)^0 * P(-1) end
       + Cf(  
             (-P"/#" * (ctopic + csingle + csep))^0 * cmulti^-1 * Cc(P(-1)),
             function(a,r)
               return a * r
             end
           ) * P(-1)

The first two lines just match a filter consisting only of the multiple topic wildcard character and return a pattern that will match any topic name. If that doesn't match (remember, “+” is read as an “OR”), we do a folding capture (Cf()): the code parses through the “topic filter” and builds an LPeg expression that will parse “topic names” per the filter. Each piece that does match and return a capture will be “accumulated” into a single expression, the “folding” being done by the anonymous function passed in. The -P"/#" bit there looks ahead in the input to make sure we aren't at a multiple topic wildcard character at the end of the string, and as long as we aren't, it will compile a match for a literal topic, a non-specified topic (which fulfills the “single wildcard character match”), or a separator (as long as the separator isn't itself followed by a multiple topic wildcard character, which is why we peek forward into the input). If we get to a point in the input where we hit either the end of input or a multiple topic wildcard character, we handle that and cap the LPeg expression we're building with a check for the end of the input (all on lines 4–7).

The point of all this is to turn a string like “+/tennis/+/#” into the following LPeg expression:

topic * separator * P"tennis" * separator * topic * (separator * topic)^0 * P(-1)

which can then be used to match “topic names:”

local topics =
{
  "news/tennis",                    -- won't match
  "news/tennis/mcenroe",            -- will match
  "news/football/dolphins",         -- won't match
  "news/baseball/marlins",          -- won't match
  "sports/tennis/williams/ranking", -- will match
}

local the_topic = filter:match("+/tennis/+/#")
for _,topic in ipairs(topics) do
  if the_topic:match(topic) then
    report_on_it(topic)
  end
end

Yes, there is a learning curve (okay, maybe a cliff) to LPeg. But once you get used to it, it is quite powerful and allows you to transform data in ways that you can't with regular expressions. In fact, instead of returning the default value (the position just past the end of the match in the string, or nil if it failed to parse), I could have returned an array of topics (or nil if it failed to parse)—but I will leave such changes as an exercise for the reader.

There's also a bit about dollar signs further down in the MQTT document I linked to, but again, handling that is left as an exercise for the reader.



Thursday, Debtember 15, 2022

Notes on an overheard conversation while bringing the garbage can up from the street

“Oh! We got another Christmas card!”

“Cool! Who is it from?”

“It's from XXX.”

“Wait! He mailed it? He actually used a stamp?”

“Yes.”

“He lives across the street!”

“That reminds me, I have to mail him his card.”

“And you're going to use a stamp to mail it to him?”

“Yes!”

“Why not just walk it across the street and put it in his mailbox?”

“Because it's tradition. And isn't it illegal for civilians to put items into a mailbox they don't own?”

“Oy vey.”


Re: Conformance Should Mean Something - fputc, and Freestanding

Well, that’s okay, because I’m not one to just sit on my hands no matter how much silence I’m met with or how much crippling depression is running through my system: I reached out to a few folks who I knew worked on MISRA, met with them, and thankfully they brought it up in their group meeting on my behalf. Even if the Committee doesn’t want to / feel like commenting (and to be perfectly clear, they do not have to comment; it’s not like I wrote a paper and nobody owes me nothin’, Jack, including a response to my e-mail anyhow), at least MISRA could bring some clarity, right? They work with a ton of implementations, especially embedded/freestanding implementations, and so they should be able to give me good feedback. I contacted an implementer I have the utmost of faith in who attends MISRA functions, so they could bring the issue up at a meeting. They sort of hashed it out. People for/against the code snippet above, whether 2 could be returned validly, and whether what TI’s Run-Time Support Library was doing was standards-blessed behavior (ignoring any “Freestanding” weasel-ing)…

there was divergence on whether or not the snippet was illegal.

It is a little concerning that the body responsible for figuring out the dusty corners of the C standard and guaranteeing portable behavior are not sure if (a) they like what the code snippet implies or (b) which direction of implication they’d like it to go in. But, on the other hand, they are at least united in that some clarity around the subject would be helpful and that we should make it clear what we mean in these functions and in the specification. They’re sort of on top of moving the needle to make sure we are writing high-quality code that can stand the test of time, and “fwrite may not portably do what you want and you need to write a wrapper function before using it every time” needs to be something they should be keen on agreeing on before we can move forward with using basic file abstractions for C. Of course, this is the human-based, common, and shared understanding I was being told about before that would lead us to Nirvana, and what I’m unfortunately finding is that it’s not actually all that bound together in harmony.

Via Hacker News, Conformance Should Mean Something - fputc, and Freestanding | The Pasture

It is a mess. The code from the blog post works on most systems, but most systems these days use 8-bit characters; the article is about systems where a character is defined as 16 bits (allowed by the C Standard) and where an integer is also 16 bits (again, allowed by the C Standard, and the minimum size an integer can be per the C specification). It's rare to have non-8-bit characters on desktop computers these days (or even tablets and smartphones), but it seems it's not quite that rare in the embedded space, where you have DSPs with weird architectures and a character is most likely the same size as an integer. And that's where the trouble starts.

The main issue is with fputc(). The C Standard states:

The fputc function

Synopsis

#include <stdio.h>
int fputc(int c,FILE *stream);

Description

The fputc function writes the character specified by c (converted to an unsigned char) to the output stream pointed to by stream, at the position indicated by the associated file position indicator for the stream (if defined), and advances the indicator appropriately. If the file cannot support positioning requests, or if the stream was opened with append mode, the character is appended to the output stream.

Returns

The fputc function returns the character written. If a write error occurs, the error indicator for the stream is set and fputc returns EOF.

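If both char and int are the same size, then this function can't work as specified. The function assumes that int is larger than char, so that every unsigned char value converts to a distinct, non-negative int, leaving EOF (a negative value that no character converts to) free to signal an error. When the two are the same size, a character like 0xFFFF can't be represented as a positive int, and the converted return value may well be EOF.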

If char and int are the same size … yikes!
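One way a caller can cope with the ambiguity is to not trust EOF on its own and ask the stream whether an error actually happened. Here is a minimal sketch with a hypothetical helper, put_char_checked() (not from the article or any library); it assumes the stream's error indicator is clear on entry, since ferror() can't tell an old error from a new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: write one character and report whether it was written.
   fputc() returns EOF on error, but where sizeof(int) == 1 a character that was
   written successfully can also come back as EOF. The stream's error indicator
   tells the two cases apart. Assumes the error indicator is clear on entry. */
static bool put_char_checked(int c, FILE *stream)
{
  assert(stream != NULL);

  if (fputc(c, stream) != EOF)
    return true;            /* unambiguous success */
  return !ferror(stream);   /* EOF without an error flagged: it was written */
}
```

On the usual 8-bit char systems the ferror() branch is only reached on a real error; it only earns its keep on the 16-bit char/int DSPs the article is about.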

And from reading the blog post, it seems that most embedded systems will clamp down on the values written by fputc() to be between 0 and 255, regardless of what you pass in, even when characters can be 16 bits in size. This is probably to remain interoperable with the rest of the world where char is 8-bits in size (Unicode notwithstanding).

I'm also not sure about this bit from the blog post about fwrite(): “Okay, so it will loop and call through fputc. This is covered under the as-if wording, so it’s not like your standard library has to write exactly a loop of fputc.” I checked the standard, and it always mentions “as if” explicitly, like “this International Standard treats such an end-of-line indicator as if it were a single new-line character” (emphasis added) or “The implementation shall behave as if no library function calls the setlocale function.” (again, emphasis added). But nowhere is it mentioned in relation to fwrite().

Here's the C89 Standard on fwrite():

The fwrite function

Synopsis

#include <stdio.h>
size_t fwrite(const void * ptr,size_t size, size_t nmemb, FILE * stream);

Description

The fwrite function writes, from the array pointed to by ptr, up to nmemb elements whose size is specified by size, to the stream pointed to by stream. The file position indicator for the stream (if defined) is advanced by the number of characters successfully written. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate.

Returns

The fwrite function returns the number of elements successfully written, which will be less than nmemb only if a write error is encountered.

It's the C99 standard that added the sentence about calling fputc() (the second sentence of the description below):

The fwrite function

Synopsis

#include <stdio.h>
size_t fwrite(const void * restrict ptr,size_t size, size_t nmemb, FILE * restrict stream);

Description

The fwrite function writes, from the array pointed to by ptr, up to nmemb elements whose size is specified by size, to the stream pointed to by stream. For each object, size calls are made to the fputc function, taking the values (in order) from an array of unsigned char exactly overlaying the object. The file position indicator for the stream (if defined) is advanced by the number of characters successfully written. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate.

Returns

The fwrite function returns the number of elements successfully written, which will be less than nmemb only if a write error is encountered. If size or nmemb is zero, fwrite returns zero and the state of the stream remains unchanged.

And nary an “as-if” in sight.

I have to wonder why that sentence was added to C99, if not to force calls to fputc(). I suppose the C Standards Committee had a reason for it, and I don't think they would have simply forgotten the “as if.” If it was an oversight, they never fixed it in C11 or in the proposed C2x standard. So I'm not sure whether an implementation of fwrite() can avoid calling fputc().
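Read literally, the C99 description amounts to something like the loop below. It's a sketch under the hypothetical name literal_fwrite(), not any real library's fwrite(); it follows the wording byte by byte, including the rule that a zero size or nmemb returns zero and leaves the stream alone.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: fwrite() written to follow the C99 wording literally.
   For each of nmemb objects, make `size` calls to fputc(), taking the
   bytes in order from an unsigned char array overlaying the object. */
static size_t literal_fwrite(void const *ptr, size_t size, size_t nmemb, FILE *stream)
{
  unsigned char const *bytes = ptr;  /* the "array of unsigned char exactly overlaying the object" */

  assert(stream != NULL);

  if (size == 0 || nmemb == 0)
    return 0;                        /* C99: return zero, stream state unchanged */

  for (size_t obj = 0; obj < nmemb; obj++)
  {
    for (size_t i = 0; i < size; i++)
    {
      if (fputc(bytes[obj * size + i], stream) == EOF)
        return obj;                  /* only count fully written elements */
    }
  }
  return nmemb;
}
```

Note the error check inherits the fputc() problem above: where char and int are the same size, a successfully written character can look like EOF and cut the count short.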


And unrelated to this post, I did come across this lovely footnote in the C99 standard:

Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.

Seriously?

It's not even “implementation defined?” Because that sounds like an implementation detail (for example, on CP/M). But undefined? Come on!

Worse, it's not even listed in “Annex J.2 Undefined behavior.”
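If all you wanted from fseek(file, 0, SEEK_END) was the length of a binary file, one way to sidestep the undefined behavior is to read the stream through and count. A minimal sketch with a hypothetical helper, count_remaining_bytes(); it assumes the stream is open for reading, counts from the current position, and leaves the stream at end-of-file. Any trailing null characters an implementation pads onto the file get counted too, which is exactly what the footnote is worried about.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count the bytes from the current position to the
   end of a stream by reading them, instead of fseek() to SEEK_END plus
   ftell(). Sets *error to 1 if a read error occurred, else 0. A long is
   assumed to be big enough for the count. */
static long count_remaining_bytes(FILE *stream, int *error)
{
  unsigned char buffer[4096];
  long          total = 0;
  size_t        n;

  assert(stream != NULL);
  assert(error  != NULL);

  while ((n = fread(buffer, 1, sizeof buffer, stream)) > 0)
    total += (long)n;

  *error = ferror(stream) != 0;
  return total;
}
```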

Friday, Debtember 16, 2022

You know what would be refreshing? Vintage ads for Saturnalia

I like vintage Coca-Cola ads, and a lot of our modern view of Santa Claus comes from said vintage Coca-Cola ads, but I'm not sure what to make of this holiday display from the neighborhood:

[Huge, three-panel display of a vintage Coca-Cola Santa ad flanked by a life-sized Santa on the left, and maybe a yeti on the right, in the front yard of a house in the neighborhood] Either Sasquatch is planning an Arctic expedition, and thus shilling for Coke for funding, or else he's really old and trying to raise retirement funds by shilling for Coke. In either case, we know Santa has been a sellout for almost a century now.

I do think Coke has made bank on such advertising, because not only is someone in my neighborhood hawking Coke with a nearly century-old ad, but now I'm shilling for Coke by showing off my neighbor's display of a nearly century-old ad. Nice play, Coke.

Monday, Debtember 19, 2022

Santa Claus, Coca-Cola, Sprite, and vast amounts of cookies

I was wrong when I mentioned that Coca-Cola created our modern image of Santa Claus; it goes back to the mid-to-late 1800s. But Coca-Cola's Santa Claus advertising might have been the inspiration for Sprite.

And speaking of Santa and food products, here's a video that answers the question no one bothered to ask: how many cookies does Santa Claus consume on Christmas? It's amazing he even survives the trip.

Wednesday, Debtember 21, 2022

Unit test this

Or, “What is a ‘unit test,’ part II”

I saw a decent answer to my question which makes sense for C. Another decent (if a bit vague) answer was:

So to answer Sean's question, a unit test is that which requires the least amount of work to setup but is able to reduce the need for coverage of the larger system. Whatever you want to consider a "unit" is up to you and the language you're using.

Re: What is a "unittest"?

I left off my previous entry pointing to a function that I would love to have seen someone else “unit test,” but alas, no one did. But I always had plans to go all “The Martian” on the code and “unit test the XXXX out of it.”

So here's the code in question:

/***********************************************
*
* Copyright 2021 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
************************************************/

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <syslog.h>
#include <sysexits.h>

/************************************************************************/

bool run_hook(char const *tag,char const *argv[])
{
  assert(tag     != NULL);
  assert(argv    != NULL);
  assert(argv[0] != NULL);
  
  pid_t child = fork();
  
  if (child == -1)
  {
    syslog(LOG_ERR,"%s='%s' fork()='%s'",tag,argv[0],strerror(errno));
    return false;
  }
  else if (child == 0)
  {
    extern char **environ;
    int devnull = open("/dev/null",O_RDWR);
    if (devnull == -1)
      _Exit(EX_UNAVAILABLE);
    if (dup2(devnull,STDIN_FILENO) == -1)
      _Exit(EX_OSERR);
    if (dup2(devnull,STDOUT_FILENO) == -1)
      _Exit(EX_OSERR);
    if (dup2(devnull,STDERR_FILENO) == -1)
      _Exit(EX_OSERR);
    for (int fh = STDERR_FILENO + 1 ; fh <= devnull ; fh++)
      if (close(fh) == -1)
        _Exit(EX_OSERR);
    execve((char *)argv[0],(char **)argv,environ);
    _Exit(EX_UNAVAILABLE);
  }
  else
  {
    int status;
    
    if (waitpid(child,&status,0) != child)
    {
      syslog(LOG_ERR,"%s='%s' waitpid()='%s'",tag,argv[0],strerror(errno));
      return false;
    }
    
    if (WIFEXITED(status))
    {
      if (WEXITSTATUS(status) != 0)
      {
        syslog(LOG_ERR,"%s='%s' status=%d",tag,argv[0],WEXITSTATUS(status));
        return false;
      }
    }
    else
    {
      syslog(LOG_ERR,"%s='%s' terminated='%s'",tag,argv[0],strsignal(WTERMSIG(status)));
      return false;
    }
  }
  
  return true;
}

/************************************************************************/

As you can see, it's one function, in one file, with the only dependencies being the operating system. So this should be the “perfect unit” to write some “unit tests” for. The code does replicate a bit of the standard C function system(), so why not use system() in the first place? The answer comes from the manual page for Linux:

Do not use system() from a privileged program (a set-user-ID or set-group-ID program, or a program with capabilities) because strange values for some environment variables might be used to subvert system integrity. For example, PATH could be manipulated so that an arbitrary program is executed with privilege. Use the exec(3) family of functions instead, but not execlp(3) or execvp(3) (which also use the PATH environment variable to search for an executable).

system(3) - Linux manual page

This function runs as part of a set-user-ID program (mod_blog in particular, for reasons beyond the scope of this entry), so no system() for me. Calling execve() directly also means I don't have to construct a command string, so there's no chance of failing to properly escape a filename containing characters that are special to the shell. And it's not like the function was hard for me to write. I've written functions like this before, and it worked the first time without issue when I wrote it (the small changes since have simplified the parameters and changed the logging messages). It's also not a very long function (I'm sorry Ron Jeffries, but 14 lines of code isn't “a lot of code”).

The reason I wanted some unit test proponents to look at this code is that it involves quite a bit of interaction with the operating system in C, a not-very-popular programming language these days, and I was curious as to the level of “unit testing” that would be done. No bites, but my gut feeling is that a “unit test proponent” faced with this code would just throw two scripts at it, one to return successfully:

int main(void)
{
  return 0;
}

and one to return failure:

int main(void)
{
  return 1;
}

and call it “battle tested.” The two test cases themselves are pretty easy to write:

#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>

#include "tap14.h"

extern bool run_hook  (char const *,char const **);

int main(void)
{
  tap_plan(2,NULL);
  tap_assert( run_hook("script",(char const *[]){ "./success" , NULL }),"success script");
  tap_assert(!run_hook("script",(char const *[]){ "./failure" , NULL }),"failure script");
  return tap_done();
}

(I'm using my own testing framework based on TAP. I wrote my own to be as minimal as possible to get the job done—other TAP frameworks for C I looked at were too overblown for my tastes.)

An improvement would be to add a test script that terminates via a signal. It's again easy enough to write that script:

#include <signal.h>

int main(void)
{
  raise(SIGINT);
  return 1;
}

and the appropriate test:

tap_assert(!run_hook("script",(char const *[]){ "./terminate" , NULL }),"terminate script");

But that only tests half the code. How do I know I didn't mess up the codepath in the child process before I execute the script in question? At The Enterprise, our tests were expected to cover at least 70% of the code, and I'm short of that target here. And as I said, I'm aiming to “unit test the XXXX out of this” and get 100% code coverage, because shouldn't that be the goal of “unit testing?”

But to achieve that target, I'm going to have to deal with “failing” a bunch of existing functions, and according to my interpretation of “A Set of Unit Testing Rules,” if I'm not mocking, I don't have a “unit test.” So I have to mock some system calls.

And here is where I hit a problem: doing so invokes the dreaded “undefined behavior of C.” Seriously, if I provide my own function for, say, dup2(), I am technically invoking undefined behavior of the C machine (this incredibly long flame war on the Lua mailing list, of all places, goes into the gory details behind that statement). Now granted, certain toolchains on certain operating systems allow one to override functions, but you can't rely upon this behavior in general. Given that I'm doing all of this on Linux, and Linux in general allows this, I can proceed (carefully) with mocking system functions.

That should be straightforward enough. The mocked open() function:

static int err_open;
static int ret_open;

int open(char const *pathname,int flags)
{
  (void)pathname;
  (void)flags;
  if (err_open != 0)
    errno = err_open;
  return ret_open; // XXX had bug here
}

This should be fine for my purposes as I don't actually need to read from the file. If I really needed to call into the original function, this might work:

static int err_open;

int myopen(char const *pathname,int flags)
{
  if (err_open == 0)
    return open(pathname,flags,0);
  errno = err_open;
  return -1;
}

#define open	myopen

But as the “A Set of Unit Testing Rules” article states, “A test is not a unit test if: it touches the file system.” So the above isn't a “true mock,” and I shall continue with my “true mocked” function instead. I can continue with similar implementations for the functions dup2(), close() and waitpid(). Unfortunately, there are three functions that may present some interesting challenges: fork(), execve(), and _Exit(). The first returns twice (kind of—if you squint and look sideways), the second only returns if there's an error, and the third never returns.

Now looking over the implementation of the function I'm testing, and thinking about things, I could do a similar implementation for fork(). The returning-twice thing is where it returns once in the parent process and once in the child process, but I can treat that as just a normal return, at least for purposes of testing. For execve(), I can only test the error path here, as the script being “run” isn't actually run. That just leaves _Exit() as the final function to mock. For that one, I wrap the entire call to run_hook() (the function being “unit tested”) with setjmp(), and have the mocked _Exit() call longjmp() to simulate the not-returning aspect of _Exit(). So a test of the close() codepath would look like:

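static bool X(char const *tag,char const *argv[])
{
  /* C only allows setjmp() in a few contexts, such as a direct comparison */
  if (setjmp(buf_exit) != 0)
    return false; /* the mocked _Exit() longjmp()ed back here */
  return run_hook(tag,argv);
}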

int main(void)
{
  /* ... */

  ret_open  = 4;
  err_dup2  = 0;
  ret_dup2  = 0;
  bad_dup2  = -1;
  err_close = EIO;
  ret_close = -1;
  tap_assert(!X("script",(char const *[]){ "./success" , NULL }),"close() fail");

  /* ... */
  return tap_done();
}
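The mocked _Exit() itself isn't shown above, but the setjmp()/longjmp() trick can be sketched in isolation. Here fake_exit() and child_path() are hypothetical stand-ins: fake_exit() plays the part of the mocked _Exit(), and child_path() plays the part of the child branch of run_hook(). Using stand-in names keeps the sketch well-defined C, since it doesn't replace any library function.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <sysexits.h>

/* Hypothetical stand-ins for the mocked _Exit() and the code under test. */
static jmp_buf buf_exit;
static int     exit_status;

static _Noreturn void fake_exit(int status)
{
  exit_status = status;  /* record what the code tried to exit with */
  longjmp(buf_exit,1);   /* "never return": jump back to the test harness */
}

static void child_path(bool open_fails)
{
  if (open_fails)
    fake_exit(EX_UNAVAILABLE);  /* like _Exit(EX_UNAVAILABLE) in run_hook() */
  fake_exit(0);
}

/* Run the code under test and return the status it "exited" with. */
static int run_and_catch(bool open_fails)
{
  if (setjmp(buf_exit) != 0)  /* setjmp() used directly in a comparison */
    return exit_status;
  child_path(open_fails);
  return -1;                  /* not reached: child_path() always "exits" */
}
```

Because exit_status is a static object rather than a local variable, its value is reliable after the longjmp() without needing volatile.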

I got all the test cases written up and all 11 tests pass:

TAP version 14
1..11
ok 1 - success script
ok 2 - failure script
ok 3 - terminate script
ok 4 - fork() fail
ok 5 - open() fail
ok 6 - dup2(stdin) fail
ok 7 - dup2(stdout) fail
ok 8 - dup2(stderr) fail
ok 9 - close() fail
ok 10 - execve() fail
ok 11 - waitpid() fail

A successful “unit test” with 100% code coverage. But I'm not happy with this. First off, I don't get the actual logging information for each test case. All I get is:

Dec 21 19:34:10	user	err	/dev/log	test_run_hook	script='./failure' status=1
Dec 21 19:34:10	user	err	/dev/log	test_run_hook	script='./terminate' terminated='Interrupt'
Dec 21 19:34:10	user	err	/dev/log	test_run_hook	script='./success' fork()='Cannot allocate memory'
Dec 21 19:34:10	user	err	/dev/log	test_run_hook	script='./success' waitpid()='No child processes'

and not

Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./failure' status=1
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./terminate' terminated='Interrupt'
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' fork()='Cannot allocate memory'
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' status=69
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' status=71
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' status=71
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' status=71
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' status=71
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' status=69
Dec 21 19:04:10	user	err	/dev/log	test_run_hook	script='./success' waitpid()='No child processes'

(And no! I am not checking that syslog() got the right message in the test cases—been there, done that and all I got was a stupid tee-shirt and emotional scars. It's easy enough to just manually check after the test runs, at least for this entry.)

It just doesn't feel right to me that I'm testing in a faked environment. No, to get a better “unit test” I'm afraid I'm going to have to keep invoking undefined C behavior that Linux allows, and interpose my own functions using LD_PRELOAD. And I can set things up so that I can still call the original function when I want it to succeed. So all that needs to be done is to write a shared object file with my versions of the functions, and include this function:

#define _GNU_SOURCE  /* must precede all #includes; RTLD_NEXT is a GNU extension */
#include <dlfcn.h>
#include <sys/types.h>

static pid_t (*___fork)   (void);
static int   (*___open)   (char const *,int,mode_t);
static int   (*___dup2)   (int,int);
static int   (*___close)  (int);
static int   (*___execve) (char const *,char *const [],char *const []);
static pid_t (*___waitpid)(pid_t,int *,int);

/* Runs when the shared object is loaded. It's static so it can't
   interpose on some other init() in the program under test. */
__attribute__((constructor))
static void init(void)
{
  ___fork    = dlsym(RTLD_NEXT,"fork");
  ___open    = dlsym(RTLD_NEXT,"open");
  ___dup2    = dlsym(RTLD_NEXT,"dup2");
  ___close   = dlsym(RTLD_NEXT,"close");
  ___execve  = dlsym(RTLD_NEXT,"execve");
  ___waitpid = dlsym(RTLD_NEXT,"waitpid");
}

(I include all three parameters to open() even though the last one is optional; I don't want to deal with C's variable argument machinery, and this should work “just fine on my machine” since I'm already in territory that C formally forbids. I'm using triple leading underscores because single and double leading underscores are reserved to the C compiler and implementation, although strictly speaking any identifier starting with two underscores is reserved, so the third one doesn't actually get me out of that.)

Now, how do I tell my replacement functions when to fail? I thought about it, and while there is a way to do it with global variables, it gets complicated and I'd rather do this as simply as possible. I figured I could sneak that information through to my replacement functions via putenv(), getenv() and unsetenv(). This will make the close() failed test case look like:

  putenv((char *)"SPC_CLOSE_FAIL=5"); /* EIO */
  tap_assert(!run_hook("script",(char const *[]){ "./success" , NULL }),"close() fail");
  unsetenv("SPC_CLOSE_FAIL");

And the corresponding close() function is:

int close(int fd)
{
  char *fail = getenv("SPC_CLOSE_FAIL");
  if (fail == NULL)
    return (*___close)(fd);
  errno = (int)strtoul(fail,NULL,10);
  return -1;
}
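The same fail-on-demand idea can be tried without building a shared object. In this sketch, the hypothetical checked_close() reads SPC_CLOSE_FAIL just like the close() replacement above, but calls the real close() directly instead of through a pointer obtained with dlsym(RTLD_NEXT,…). One small difference: here a value of 0 (or an unparsable value) means “don't fail,” whereas the version above fails whenever the variable is set.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv()/unsetenv() and close() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: return the errno value held in environment
   variable `name` (decimal), or 0 if it isn't set. */
static int injected_errno(char const *name)
{
  char const *fail = getenv(name);
  if (fail == NULL)
    return 0;
  return (int)strtol(fail,NULL,10);
}

/* Hypothetical wrapper: fails on demand, otherwise calls the real close(). */
static int checked_close(int fd)
{
  int err = injected_errno("SPC_CLOSE_FAIL");
  if (err != 0)
  {
    errno = err;
    return -1;
  }
  return close(fd);
}
```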

The other functions work similarly, and when run:

TAP version 14
1..11
ok 1 - success script
ok 2 - failure script
ok 3 - terminate script
ok 4 - fork() fail
ok 5 - open() fail
ok 6 - dup2(stdin) fail
ok 7 - dup2(stdout) fail
ok 8 - dup2(stderr) fail
ok 9 - close() fail
ok 10 - execve() fail
ok 11 - waitpid() fail

More importantly, since the replacement functions call through to the real ones when I don't want them to fail, I get the full output I expect in the system logs. But per the “A Set of Unit Testing Rules” article, this is no longer a “proper unit test.”

I don't know. The more I try to understand “unit testing,” the less sense it makes to me. There is no real consensus as to what a “unit” is, and it seems strange (or in my more honest opinion, stupid) that we as programmers are not trusted to write code without tests, yet we're trusted to write a ton of untested code as long as that code is testing code. As I kept trying to impart to my former manager at The Enterprise before I left, the test case results aren't to be trusted as gospel (and he always treated them that way) because I didn't fully understand what the code was supposed to do (the business logic in “Project Lumbergh” had become a literal mess of contradictions, and communication among the team had seriously broken down).

So maybe we're not supposed to “unit test” functions that involve input, output, or system interactions. Maybe we're supposed to “unit test” more “pure functions” and leave messy real world details to, oh, I don't know, some other form of testing. Okay, I have one final function that should be perfect for “unit testing.”

We shall see …



Thursday, Debtember 22, 2022

It's kind of sad to think that the cheapest gift is the milk maids

It wasn't until I read this article from the Transylvania Times that I thought about the price of all the gifts from “The Twelve Days of Christmas.” It was also the first time I learned about The Christmas Price Index, where all this is tracked every year. This year's index, if you were to buy all the gifts mentioned in the song, comes to a staggering $197,071.09. And for all that, the 40 maids a-milking will only cost you $290. Not mentioned are the cleanup costs of all the gifts.
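The “40 maids” figure comes from counting the gifts cumulatively: the gift for day k shows up on every day from k through 12, k at a time, for k × (13 - k) in all. A quick sketch of that arithmetic; the $290 comes out if each maid is paid for one hour at the $7.25 US federal minimum wage, which is my assumption about how the index prices them, not something stated in the article.

```c
#include <assert.h>

/* Gift k (1 = partridge ... 12 = drummers) is given on days k through 12,
   k at a time, so over the whole song it totals k * (13 - k). */
static int total_of_gift(int k)
{
  assert(k >= 1 && k <= 12);
  return k * (13 - k);
}

/* Every gift in the song, counted cumulatively over all twelve days. */
static int total_gifts(void)
{
  int sum = 0;
  for (int k = 1 ; k <= 12 ; k++)
    sum += total_of_gift(k);
  return sum;
}

/* Assumption: each maid a-milking (gift 8) is paid one hour at wage_cents. */
static int maids_cost_cents(int wage_cents)
{
  return total_of_gift(8) * wage_cents;
}
```

With total_of_gift(8) being 40, maids_cost_cents(725) gives 29000 cents, or $290.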

Friday, Debtember 23, 2022

Notes on an overheard conversation as the radio was playing “Winter Wonderland”

“Oh, that's Wayne Newton!”

“Wayne Newton?”

“Yes.”

“Are you sure that's not Eartha Kitt?”

“Yes dear, Eartha Kitt has a much lower singing register.”


Extreme tiny house, Asheville edition

“That is not a tiny house,” said Bunny.

“But it is, it's only 480 square feet.” [45 square meters —Editor]

“It feels big.”

“It does, and the design is wonderful.”

We were talking about this $80,000 home in the Asheville, North Carolina area. While it's technically a tiny house, it manages to feel big (living room, kitchen, bathroom, bedroom and recording studio), while being one of the more beautiful examples of a home I've seen (although we were not fans of the alternating-tread staircases, we do understand why they were used). You would never guess it was made mostly from recycled and unused materials. It's just gorgeous.


Some alternative do-it-yourself keyboards

I'm always fascinated by alternative keyboards, especially when they're handmade. Matthew Dockrey has made two of them. The first is based on old print technology, the two-thirds keyboard, which involved creating his own keycaps. And then there is his pocket typewriter, which is exactly what it sounds like: a manual typewriter that fits in your pocket. It's mad stuff, but it's fantastic at the same time.


“Outdoors is currently not heated”

From my friend Tom, who posted this on Me­Linked­My­Insta­Face­Space­Book­We­Gram­In: a TV sportscaster forced to report on the weather. He makes his opinion of the weather (in a live report) loud and clear. I can only hope he keeps his job.

Monday, Debtember 26, 2022

It's not a “security hole,” it's a “privacy hole” and I don't think it's anything to worry about

I found a reference to the following in my notes from May this year—I suppose better late than never. Anyway …

The Potential Security Hole

Imagine a scenario where Big Tech does a massive marketing campaign in an attempt to mainstream the protocol. As part of their marketing, they could try to sell the idea of a Big Proprietary browser, or even add Gemini support directly into their existing web browser. Then they start a disinformation campaign to demonize the wide range of existing clients. Normies, naturally, would buy that without question, as they do. At that point, Big Tech could simply have their browser automatically generate a client certificate for every user and attach it to every request.

Couple this with some server side analytics aggregators, and we have the same privacy problems on Gemini that the web has.

Security Hole in Gemini Protocol?

I feel this is more of a “privacy hole” than a “security hole,” but that could be me being pedantic. Honestly, I don't feel like this is anything that needs to be worried about. Gemini is much too small to worry about. I suppose a Gemini server could generate client certificates and a compliant Gemini client could accept them for later use to reference a Gemini site, but that's not how client certificates are specified as working; it's the client that generates the certificate, and the server can accept or reject it (odd, I know, and not how I would envision them working).

But it's not like there aren't other ways of tracking a user in Gemini. A Gemini server could conceivably generate unique links for a given client from a given IP address. It's not perfect, and it really only kind of tracks a single user. And let's not forget just logging every request and <gasp!> not anonymizing IP addresses! Oh the horror! But such “tracking” is limited to one server. It seems silly to think such tracking could be done Internet-wide, especially given that automatically displaying images is considered scandalous in the Gemini community.

Notice that all of these codes are described in way that implies that the server is already expecting a client certificate for that request. What if there is a certificate attached when not expected? Unless I have missed or misinterpreted something, the spec does not account for this.

61 comes close, but that implies that a cert was indeed expected, it's just the wrong one.

Proposed Solution

Add 4th certificate status code, let's call it 63, to be returned in this scenario. It would not stop malicious or corporate servers from refusing ever to return this code, but it would at least allow users to see which sites are not trying to stalk them, because someone using Flashy Surveillance Browser would be shown this error anytime they visit an indie capsule.

Could this itself be exploited, though? I think so. Proprietary browsers could show a 'security warning' that the capsule they are attempting to access is … insert scary corporate buzzword … and that proceeding would be 'dangerous'. This, of course, would be total horseshit but the normies wouldn't know any better.

Security Hole in Gemini Protocol?

I have two responses to this: One, just do it! Add the check to your Gemini server and return the undocumented response code 63. Yes, it's not part of the standard. Yes, it's extending the protocol (“The horror! The horror!”). But on the gripping hand, it just might help. My own Gemini server serves up a custom error code when it receives an empty request which is expressly not allowed by the specification! I used to serve up a response code of “59 Bad Request” but it never seemed to do anything. I then changed it to return “58 Not a gopher server!” and while it hasn't stopped such requests, they have been slowly going down over the past year or so. So go ahead, just do it! Add the “63 Why are you forcing an unwanted certificate on me?” response.
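Adding such a check doesn't take much code. Below is a sketch, assuming the Gemini response header format of a two-digit status, a space, the meta text (at most 1024 bytes), then CR LF. The names gemini_header() and pick_status() are hypothetical, and both 58 and 63 are the non-standard codes discussed here, not anything in the specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format a Gemini response header: "<STATUS> <META>\r\n".
   Returns false if the header didn't fit in buf. */
static bool gemini_header(char *buf,size_t size,int status,char const *meta)
{
  assert(buf    != NULL);
  assert(meta   != NULL);
  assert(status >= 10 && status <= 69);
  assert(strlen(meta) <= 1024);  /* the specification's limit on META */

  int n = snprintf(buf,size,"%02d %s\r\n",status,meta);
  return n >= 0 && (size_t)n < size;
}

/* Hypothetical policy along the lines suggested above. */
static int pick_status(bool request_empty,bool cert_sent,bool cert_wanted)
{
  if (request_empty)
    return 58;  /* non-standard: "Not a gopher server!" */
  if (cert_sent && !cert_wanted)
    return 63;  /* proposed, undocumented: unwanted client certificate */
  return 20;    /* success */
}
```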

My second response is—client certificates are dead on the web, what makes you think “proprietary Gemini browsers” will go to this trouble? If anything, I would think a “proprietary Gemini browser” would insist on using a real secure certificate, and not a self-signed one or one using a custom certificate authority, long before it would attempt to force known client certificates on users.



Tuesday, Debtember 27, 2022

And in another timeline, Google sold out to Yahoo for $10,000,000 …

I'm not quite sure what to make of “eπc 2014” (or “Epic 2014”). It's a “what-if” story that diverges from our own timeline in 2004 and goes to some really weird places (Googlezon anyone?). It's a history that never happened, and yet it still feels like we're just a few years short of it actually happening.

Obligatory Picture

Dad was resigned to the fact that I was, indeed, a landlubber, and turned the boat around yet again …

Obligatory AI Disclaimer

No AI was used in the making of this site, unless otherwise noted.

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2025 by Sean Conner. All Rights Reserved.