The Boston Diaries

Tuesday, August 06, 2019

There are even bots crawling gopherspace

My webserver isn't the only program beset by bots; my gopher server is also being crawled. I identified one bot repeatedly trying to request the selector (the gopher equivalent of a web page) Phlog when it should be trying to request Phlog: (note the ending “:”). On the web server, I could inform the client of the proper link with a “permanent redirect” and hope it gets the hint, but gopher lacks such a facility. All this bot was getting back was the rather lackluster gopher error, which, for an automated process, is pretty darned hard to distinguish from actual content, due to the simplicity of the protocol.

On a lark, I decided to see if there was a gopher server on the IP address of the bot, and lo, there was. I was able to send an email to the organization responsible, and they fixed the error.

That still left a few bots that thought I was running a web server on port 70. Yes, I was getting requests for “GET / HTTP/1.1” over and over again, and these particular bots weren't getting the clue they weren't talking to a web server by the lack of proper web server response. I decided to handle these by replying as a teapot because why not? And to further support the joke, my gopher server will not only respond to the web method GET but also BREW (and to think I wanted to write a gopher server, not a web server … sigh). Hopefully that will placate them and they'll go away (although on second thought, I think I should have done a permanent redirect to gopher://gopher.conman.org/ to see how well the web bots would handle that!).
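
For the curious, here is a minimal sketch of what such a check might look like. It is not the actual server code: the function names and the response body are made up for illustration. The only borrowed facts are the 418 “I'm a teapot” status and the BREW method, both from RFC 2324.

```lua
-- Hypothetical helper: decide whether a request line on port 70 is really
-- an HTTP request, and if so build a teapot response for it.
local HTTP_METHODS = { GET = true, BREW = true }

local function looks_like_http(line)
  -- e.g. "GET / HTTP/1.1": METHOD, target, version
  local method = line:match("^(%u+) %S+ HTTP/%d%.%d$")
  return method ~= nil and HTTP_METHODS[method] == true
end

local function respond(line)
  line = line:gsub("\r?\n$", "")   -- strip the trailing line ending, if any
  if not looks_like_http(line) then
    return nil                     -- not HTTP; treat it as a gopher selector
  end
  local body = "I'm a teapot (and this is a gopher server).\r\n"
  return table.concat({
    "HTTP/1.1 418 I'm a teapot",
    "Content-Type: text/plain",
    "Content-Length: " .. #body,
    "Connection: close",
    "",
    body,
  }, "\r\n")
end

print(respond("GET / HTTP/1.1\r\n"))
print(respond("Phlog:\r\n"))       -- a real selector falls through as nil
```

A selector that happened to look exactly like an HTTP request line would be misclassified, but that seems an acceptable risk for a joke.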

An MJ12Bot update

When last we left the MJ12Bot saga, it was pretty apparent it wasn't a well written bot, but true to their word, they haven't crawled my server since.

“The Knowledge AI” bot however … it is trying to repeatedly fetch /%22https:/mj12bot.com/%22 from my web server.

What is it with these horribly written web bots?

Thursday, August 08, 2019

Unfortunately, my blog on Gopher is a second class citizen

I will be the first to admit that my blog on gopher is a second-class citizen. When I wrote the gopher server I took the quickest and easiest way to adapt my blog to a (close enough) text-only medium by feeding requests through Lynx. Note I didn't say “well done” (of course not! I said it was a “medium!” Ba-dum-bump! I'll be here all week! Don't forget to tip the wait staff!) or even pretty.

For instance, this entry looks like this via gopher:

Extreme contradiction, Brevard edition

So Bunny and I came across this lovely bit of signage in downtown
[1]Brevard:

[“The white zone is for immediate loading and unloading of passengers only.
There is no stopping in the red zone.” / “The red zone is for immediate
loading and unloading of passengers only. There is no stopping in the white
zone.” / “No, the white zone is for loading of passengers and there is no
stopping in a RED zone.” / “The red zone has always been for loading and
unloading of passengers. There's never stopping in a white zone.” / “Don't
you tell me which zone is for loading, and which zone is for stopping!”]

So which is it? Loading, or parking? Or loading of wheelchairs for parking?
Or parking for wheelchairs to be loaded? I'm so confused!

References

1. https://www.cityofbrevard.com/

First off, there's no indication that there's a photo on that page, unless you realize I'm using a very old web convention of describing the image contents by placing said description inside of square brackets.

Secondly, there is no actual link to the picture on the converted entry.

Third, on most (all?) graphical browsers, just holding the mouse over the images will pop up the text above (I don't think many people know about this).

And fourth, the text is a reference to the movie “Airplane!” which does fit the subject of the picture on that page, which is of two traffic signs giving conflicting parking directions (this really doesn't have anything to do with the second-class status of the post on gopher—just more of an FYI type of thing).

I used Lynx because I didn't want to bother writing code to convert HTML to plain text—especially when I had access to a tool that can do it for me. It's just that it doesn't really do a great job because I expect the HTML to do the formatting for me. And I really do need to write a description of the photo in addition to the caption I include for each picture. Ideally, it would look something like:

Extreme contradiction, Brevard edition

So Bunny and I came across this lovely bit of signage in downtown
Brevard [1]:

[Image of two traffic signs one above the other. The upper one says
“NO PARKING, LOADING ZONE” and the lower one saying “RESERVED PARKING
for the HANDICAPPED”—“The white zone is for immediate loading and
unloading of passengers only. There is no stopping in the red zone.” /
“The red zone is for immediate loading and unloading of passengers only.
There is no stopping in the white zone.” / “No, the white zone is for
loading of passengers and there is no stopping in a RED zone.” / “The
red zone has always been for loading and unloading of passengers. There's
never stopping in a white zone.” / “Don't you tell me which zone is for
loading, and which zone is for stopping!”] [2]

So which is it? Loading, or parking? Or loading of wheelchairs for parking?
Or parking for wheelchairs to be loaded? I'm so confused!

References

[1] https://www.cityofbrevard.com/
[2] gopher://gopher.conman.org/IPhlog:2019/06/13/Confusion.jpg
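
The link-numbering part of that layout is the easy bit. A rough sketch follows, assuming simple, well-formed anchors with a double-quoted href as the first attribute. The function name is hypothetical, and images, nested markup and line wrapping are all left out, which is where the real work lies.

```lua
-- Hypothetical sketch: turn <a href="...">text</a> into "text [n]" and
-- collect the URLs into a numbered reference list at the end.
local function number_links(html)
  local refs = {}
  local text = html:gsub('<a%s+href="([^"]*)"[^>]*>(.-)</a>',
    function(url, label)
      refs[#refs + 1] = url
      return string.format("%s [%d]", label, #refs)
    end)

  local out = { text, "", "References", "" }
  for i, url in ipairs(refs) do
    out[#out + 1] = string.format("[%d] %s", i, url)
  end
  return table.concat(out, "\n")
end

print(number_links(
  'signage in downtown <a href="https://www.cityofbrevard.com/">Brevard</a>:'))
```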

And then reality sets in and I'm just not up to writing an HTML-to-text translator right now.

Sigh.

Sorry, gopherspace.

The “Tonya Harding Solution” to computer benchmarks

… we knew we had to do more to truly earn those extra credit points. Luckily, I had one final optimization idea:

The Tonya Harding Solution: The benchmark program works by calling the optimized function, calling the naive function, and comparing the two times. And this gave me a truly devilish idea. I added some code to calc_depth_optimized that created a child process. That child process would wait for calc_depth_naive to start running, then send a SIGUSR1 signal to the benchmark process. That signal would interrupt calc_depth_naive and jump to a special signal handler function I'd installed:
void our_handler(int signal) {
    // if you can't win the race, shoot your opponent in the legs
    sleep(image_size * 4 / 10000);
}
So while we did implement a number of features that made our program faster, we achieved our final high score by making the naive version a whole lot slower. If only that 4 had been a 5 …

CS 61C Performance Competition

I'll have to hand it to Carter Sande for literally beating the competition in benchmarking.

(Although it wasn't Tonya Harding who did the attack, but Shane Stant, hired by Harding's ex-husband Jeff Gillooly, who attacked Nancy Kerrigan with a police baton and not a gun. Harding herself claims she had nothing to do with the attack.)

Who knew ice cream could be so hard?

We have an ice cream maker. I like chocolate ice cream, the darker, the better. And the instruction manual for the ice cream maker lists a recipe for a “decadent chocolate ice cream” which not only calls for Dutch processed cocoa, but 8 ounces (230g) of bittersweet chocolate. I opted for a really dark chocolate, like on the order of 90% cocoa dark chocolate.

Yeah, I like my chocolate dark.

I'm also trying to cut sugar out of my diet as much as possible, so I decided to use a bit less sugar than what the recipe calls for, so this stuff isn't going to be overly sweet.

I get the ice cream base churned, into a plastic bowl and in the freezer, and I wait for several hours, eagerly awaiting some deep, dark, decadent chocolate ice cream.

I end up with a deep, dark, decadent ice chocolate rock.

This isn't hard ice cream. This isn't even ice cream. It's an ice rock is what it is. I try dropping the bowl a few inches onto the kitchen counter to show Bunny how rock-like it is, and the bowl hits the counter, bounces off and shatters onto the floor.

I mentioned it was in a plastic bowl, right?

There are shards of plastic across the kitchen.

The deep, dark, chocolate ice rock is in one piece.

I think the ice cream base was too dense for much, if any, air to get whipped in while churning. Bunny thinks the low sugar content contributed to the rock-like consistency. Both are probably to blame for this. I do recall that the last time I made the “decadent chocolate ice cream, but with all the sugar,” it did tend towards the harder side of ice cream. So I think the next time I should try the basic vanilla recipe with less sugar and see what happens. If that turns out fine, then try the basic chocolate recipe.

Saturday, August 10, 2019

From chocolate ice rock to vanilla ice cream

My friend Squeaky replied to my chocolate ice rock post backing up Bunny's assertion that sugar content is key when making ice cream: too little and ice crystallization takes over, making for a solid ice rock rather than ice cream. So on Friday, I went back to basics and made the basic vanilla ice cream recipe:

2c (500ml) heavy cream
1c (250ml) whole milk
¾c (180ml) sugar
2tsp (10ml) vanilla extract

Mix, chill, churn.

I wasn't quite satisfied with making a vanilla ice cream, so I decided to add cherries. I chopped up a bunch of cherries (the hardest part was getting the seeds out; man, they were stubborn), added a bit of sugar, chilled them, and added them in the last few minutes of churning. I was initially concerned because the instant I added the cherries, the mixture started loosening up. I think the cherries were a bit too warm. Next time I think I'll freeze the cherries before adding them.

But after sitting in the freezer overnight, the results were much better—I actually had ice cream and not a large rock. Lesson learned—sugar is key to ice cream.

Sunday, August 11, 2019

See, this is one reason why I'm leery about updates

A few days ago I noticed my iPhone notifying me of a “critical security update.” And it was only for the iPhone, not the iPad. Sigh. Fine. Download, install, and get on with my life.

A few days went by before I finally clued in that I wasn't receiving any actual phone calls! Not that that really bugs me, as most of the calls I get these days are spam calls from around the country, but it was odd that my dad had left two voice mails and yet his calls did not show up on the recent calls list.

So I check, and yes, I have no service. I try rebooting the phone, and that didn't work. I tried resetting the network, and that didn't work (and had the side effect of wiping out all known passwords for existing Wi-Fi networks).

Bunny suggested I go through the troubleshooting pages on the Monopolistic Phone Company website as she waited on the phone for a Monopolistic Phone Company Representative and the race was on to see who finished first.

Turns out, I won. I think it was step five where the Monopolistic Phone Company had me turn off the phone (and by “turn off the phone” I mean a hard power down and not just shutting off the screen), pull out the SIM card, push the SIM card back in, and turn the phone on. That turns out to have worked.

And now I can receive all those spam calls warning me that this is the final, no, we really mean it, final warning that my car warranty has expired and if I don't act now I'm doomed to financial ruin. I honestly don't know how I lived without those calls.

Wednesday, August 14, 2019

“Here's a nickel kid. Get yourself a real computer”

“Here you go, kid,” said Dad, as he handed me a book with a bright yellow jacket. “I heard this ‘Linux’ is the next up-and-coming thing in computers.”

I look at the title: Linux for Dummies: Quick Reference. “Um … thank you. You do realize I run Linux both at home, and at work, right?”

“Hey, maybe you can learn something from it.”

So I'm skimming through the book and … elm? pine? rsh? FVWM? When was this book written? … Oh … 2000. That explains it. Everybody these days is running mutt, ssh and GNOME.

It'll fit nicely on the shelf next to Sams Teach Yourself TCP/IP in 24 Hours and Inside OS/2.

Monday, August 19, 2019

Notes on an overheard phone conversation at The Ft. Lauderdale Office of The Corporation

“Hello?”

“Yes, who is this?”

“Who is this?”

“This is … Sean.”

“And this is … XXXX [of the IT department].”

“Okay.”

“This is in reference to … ”

“Your email … ”

“About … ”

“Um … the laptop?”

“Oh yes! Sean! Of course! See, I get a lot of spam calls these days.”

“Yeah, I get a lot of phishing emails from you guys these days.”

“Ha!”

Tuesday, August 20, 2019

Profiles of Lua code

I started “Project: Sippy-Cup” six years ago as a “proof-of-concept” and it ended up in production (without my knowledge and to my dismay at the time) about a year or two afterwards. The code was written to be correct, not fast. And for the past four or five years it's been running, performance has never been a real issue. But that's changing, as the projected traffic levels shoot past the “oh my” and into the “oh my God” territory.

“Project: Sippy-Cup” processes a lot of SIP messages, which are text based, so there's a lot of text processing. I use LPEG for ease of writing parsers, but it's not necessarily as fast as it could be.

There are two issues with LPEG—it has infinite look-ahead, and ordered choices. So the code that checks for the SIP method:

method = lpeg.P"ACK"
       + lpeg.P"BYE"
       + lpeg.P"CANCEL"
       + lpeg.P"INFO"
       + lpeg.P"INVITE"
       + lpeg.P"MESSAGE"
       + lpeg.P"NOTIFY"
       + lpeg.P"OPTIONS"
       + lpeg.P"PRACK"
       + lpeg.P"PUBLISH"
       + lpeg.P"REFER"
       + lpeg.P"REGISTER"
       + lpeg.P"SUBSCRIBE"
       + lpeg.P"UPDATE"
       + (lpeg.R("AZ","az","09") + lpeg.S"-.!%*_+`'~")^1

will first compare the input to “ACK”; if it doesn't match, it then backtracks and tries comparing the input to “BYE”, and so on down the list until it gets to the last rule, which is a “catch-all” rule. It would be easy to reorder the list so that the checks are “most-likely” to “least-likely,” but really the entire list could be removed leaving just the catch-all:

method = (lpeg.R("AZ","az","09") + lpeg.S"-.!%*_+`'~")^1
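
A small sketch of how the two versions behave. The method list is shortened and the names `listed`, `token` and `SP` are made up here, and it assumes the method is followed by a single space as in a SIP request line. Both accept the ordinary methods. The one difference is that a PEG ordered choice commits to the first alternative that matches, so an extension method that merely starts with a listed name gets cut short by the long version.

```lua
local lpeg = require "lpeg"

-- The catch-all rule from the text.
local token  = (lpeg.R("AZ","az","09") + lpeg.S"-.!%*_+`'~")^1
-- A shortened version of the explicit list, ending in the catch-all.
local listed = lpeg.P"ACK" + lpeg.P"BYE" + lpeg.P"INVITE" + token
local SP     = lpeg.P" "

-- Capture the method, then require the space that follows it.
local with_list = lpeg.C(listed) * SP
local catch_all = lpeg.C(token)  * SP

print(with_list:match("INVITE sip:bob@example.com SIP/2.0"))      --> INVITE
print(catch_all:match("INVITE sip:bob@example.com SIP/2.0"))      --> INVITE

-- "ACK" matches first, the choice commits, and the space then fails on "N".
print(with_list:match("ACKNOWLEDGE sip:bob@example.com SIP/2.0")) --> nil
print(catch_all:match("ACKNOWLEDGE sip:bob@example.com SIP/2.0")) --> ACKNOWLEDGE
```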

I have the same issue with SIP headers—there are 100 headers that are “parsed” (for various values of “parsed”) but I only really look at a dozen headers—the rest just slow things down and can be passed by a generic parsing rule. The full headers were added during the “proof-of-concept” stage since I wasn't sure at the time which headers would be critical and which ones wouldn't, and I've never gone back and cleaned up the code.

Another aspect is the sheer number of validity checks the code makes on the incoming SIP message. Many of the checks don't really have any effect on the processing due to managerial mandate at the time, so they could go (I wanted strict checking that bailed on any error; my manager at the time did not want such strictness—no need to guess who won, but I still track parsing irregularities).

So while I feel these are two areas where the code could be made faster, I don't know if that's where the time is spent, and so it's time to profile the code.

The issue now is that the system profiler will profile the code as C, not as Lua. I don't need to profile the code to know the Lua VM gets called all the time. What I need to know is what Lua code is called all the time. But it can't hurt to try the system profiler, right? And given that the regression test has over 12,000 test cases, we should get some good profiling information, right?

Original Profile—Each sample counts as 0.01 seconds.
% time	cumulative seconds	self seconds	name
13.32	3.47	3.47	`match`
12.74	6.79	3.32	`luaV_execute`
9.31	9.22	2.43	`luaS_newlstr`
6.83	11.00	1.78	`luaD_precall`
5.31	12.38	1.39	`luaH_getstr`
3.38	13.26	0.88	`luaD_poscall`
2.57	13.93	0.67	`index2adr`
2.19	14.50	0.57	`luaV_gettable`

Not bad at all. The function match() is the LPEG execution engine, which matches my initial thoughts on the code. It wasn't much work to remove extraneous SIP headers I don't bother with, and to simplify the method parsing (see above). Re-profile the code and:

Modified LPEG—Each sample counts as 0.01 seconds.
% time	cumulative seconds	self seconds	name
14.25	3.67	3.67	`luaV_execute`
11.22	6.56	2.89	`luaS_newlstr`
10.49	9.26	2.70	`match`
6.33	10.89	1.63	`luaD_precall`
5.20	12.23	1.34	`luaH_getstr`
2.76	12.94	0.71	`index2adr`
2.58	13.61	0.67	`luaD_poscall`
2.41	14.23	0.62	`luaV_gettable`

match() drops from first to third place, so that's good. And a load test done by the QA engineer showed an easy 25% increase in message processing.

But that's really as far as I can go with profiling. I did a run where I removed most of the validation checks (but only after I saw none of them were triggered over the past 30 days) and didn't see much of a speed improvement. So I really need to profile the Lua code as Lua code and not as C.

That's going to take some work.

Wednesday, August 21, 2019

“Nobody Expects the Surprising Profile Results!”

It still surprises me that the results of profiling can be so surprising.

Today I profiled Lua code as Lua. It was less work than expected and all it took was about 30 lines of Lua code. For now, I'm just recording the file name, function name (if available—not all Lua functions have names) and the line number as that's all that's really needed.
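
A profiler along these lines can be sketched with the standard debug library. This is an illustration rather than the code actually used: it assumes a count hook (fired every `interval` VM instructions), and the names `profile` and `busy` are made up.

```lua
-- Sampling profiler sketch: every `interval` VM instructions, record which
-- file/function/line the Lua VM is currently executing.
local function profile(fn, interval)
  local counts = {}

  local function hook()
    -- Level 2 is the function that was running when the hook fired
    -- (level 0 is getinfo itself, level 1 is this hook).
    local info = debug.getinfo(2, "Sln")
    if info and info.what ~= "C" then
      local key = string.format("%s:%s:%d",
                                info.source, info.name or "", info.currentline)
      counts[key] = (counts[key] or 0) + 1
    end
  end

  debug.sethook(hook, "", interval or 1000)
  local ok, err = pcall(fn)
  debug.sethook()                  -- always remove the hook, even on error
  if not ok then error(err, 0) end

  -- Turn the tallies into a list sorted by descending sample count.
  local rows = {}
  for key, count in pairs(counts) do
    rows[#rows + 1] = { key = key, count = count }
  end
  table.sort(rows, function(a, b) return a.count > b.count end)
  return rows
end

-- Stand-in workload, purely for demonstration.
local function busy()
  local x = 0
  for i = 1, 200000 do x = x + i % 7 end
  return x
end

for _, row in ipairs(profile(busy, 100)) do
  print(row.count, row.key)
end
```

A smaller `interval` gives finer-grained samples at the cost of more overhead.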

But as I was writing the code to profile the code, I wasn't expecting any real results from profiling “Project: Sippy-Cup.” The code is really just:

get packet
parse packet
validate SIP message
acknowledge SIP message
get relevant data from SIP message
query “Project: Lumbergh” (business logic)
wait for results
send results in SIP message
wait for SIP acknowledgement
done

I was expecting a fairly uniform profile result, and if pressed, maybe a blip for awaiting results from “Project: Lumbergh” as that could take a bit. What I did not expect was this:

Surprising Profile Results
count	file/function/line
21755	@third_party/LPeg-Parsers/ip-text.lua::44
6000	@XXXXXXXXXXXXXXXXXXXXXXXXXX:send_query:339
2409	@XXXXXXXXXXXXXXXXXXXX:XXXXXXXXX:128

After that, the results tend to flatten out. And yes, the send_query() makes sense, but ip-text.lua? Three times more than the #2 spot?

This line of code?

local n = tonumber(capture,16)

That's the hot spot? Wait? I'm using IPv6 for the regression test? When did that happen? Wait? I'm surprised by that as well? What is going on here?

Okay, breathe.

Okay.

I decide to do another run, this time at a finer grain, about 1/10 the previous profiling interval and see what happens.

Finer Grained Surprising Profile Results
count	file/function/line
133186	@third_party/LPeg-Parsers/ip-text.lua::44
29683	@third_party/LPeg-Parsers/ip-text.lua::46
21910	@third_party/LPeg-Parsers/ip-text.lua::45
19749	@XXXXXXXXXXXXXXXXXXXXXXXXXXX:XXXXXXXXXXXXX:279

And the results flatten out after that. So the hot spot of “Project: Sippy-Cup” appears to be this bit of code:

local h16 = Cmt(HEXDIG^1,function(_,position,capture)
  local n = tonumber(capture,16)
  if n < 65536 then
    return position
  end
end)

send_query() doesn't show up until the 26th spot, but since it's finer grained, it does show up multiple times, just at different lines.

So … yeah.

I have to think on this.

Done with the profiling for now

After some more profiling work I've come to the conclusion that yes, the hot spot is in ip-text.lua, and that after that function, it's quite flat otherwise. The difference between ip-text.lua and the number two spot isn't quite as bad as I initially thought, although it took some post-processing to lump all the function calls together to determine that (required because Lua can't always know the “name” of a function, but with the line numbers they can be reconciled). It's only called about twice as much as the next most used function instead of the nearly 4½ times it appeared earlier.
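
One way such lumping could be done is sketched below. It assumes each sample also records `linedefined`, the line where the enclosing function starts, which `debug.getinfo` reports under the "S" option. The field names and the sample numbers are invented for illustration and are not the actual post-processing or results.

```lua
-- Merge per-line samples into per-function totals. A function is identified
-- by its source file plus the line it was defined on, so samples with and
-- without a known name end up in the same bucket.
local function lump(samples)
  local by_func = {}
  for _, s in ipairs(samples) do
    local key = s.source .. ":" .. s.linedefined
    local f = by_func[key]
    if not f then
      f = { source = s.source, linedefined = s.linedefined,
            name = "", count = 0 }
      by_func[key] = f
    end
    f.count = f.count + s.count
    -- Keep the first non-empty name seen for this function.
    if f.name == "" and s.name and s.name ~= "" then
      f.name = s.name
    end
  end

  local rows = {}
  for _, f in pairs(by_func) do rows[#rows + 1] = f end
  table.sort(rows, function(a, b) return a.count > b.count end)
  return rows
end

-- Invented sample data: two lines of one anonymous-looking function,
-- plus one line of another function.
local samples = {
  { source = "@a.lua", linedefined = 10, name = "",      line = 11, count = 5 },
  { source = "@a.lua", linedefined = 10, name = "parse", line = 12, count = 7 },
  { source = "@b.lua", linedefined = 3,  name = "send",  line = 4,  count = 9 },
}

for _, f in ipairs(lump(samples)) do
  print(f.count, f.source, f.linedefined, f.name)
end
```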

As far as profiling “Project: Sippy-Cup” is concerned, I think I'm about as far as I can go at this time. I did improve the performance with some minor code changes and any more improvement will take significant resources. So I'm calling it good enough for now.

Thursday, August 22, 2019

Through Windows Darkly

I arrived at The Ft. Lauderdale Office of The Corporation to find a package sitting on my desk. It had finally arrived: the Corporate Overlords' mandated managed laptop.

It's only been a year and a half that we've been ~~threatened by~~ promised new managed laptops to replace the self-managed ones we currently use, but in the end, it was decided to let us keep our current machines and use the new “managed laptops” to access the Corporate Overlords' network. I think this was decided due to the cultural differences between The Corporation and the Corporate Overlords—we're Mac and they're Windows.

Yes, of course the new managed laptop is a Windows box.

It's a Lenovo ThinkPad T480, and compared to the current laptops I have at work, a Linux system with 4 2.8GHz CPUs with 4G RAM and a Mac with 8 2.8GHz CPUs and 16G RAM, it's a bit underpowered with 4 1.8GHz CPUs and 8G RAM. I will admit that the keyboard is nicer than the keyboards on my existing laptops, but that's like saying bronchitis is better than pneumonia—technically that's true, but they're still bad. It looks like I'll have to break out another real keyboard from the stash.

The laptop was thinner than I expected, and the build feels solid. Lots of ports, so that's nice. The screen is nice, and the built-in camera has a sliding cover so I don't have to spoil the sleek aesthetic with a tab of electrical tape.

The real downside for me is the software—Windows. I can hear the gales of laughter from my friend Gregory when he hears I have to suffer Windows. The last time I used Windows was … um … 1999? It's been a while, and not only do I have to get used to a nearly alien interface, but one that I have little control over.

Well, I have a bit of leeway—I was able to install Firefox so it isn't quite that bad, but there's a lot I can't do; external block storage devices are blocked outright, there are some websites I can't visit and editing the Windows Registry is right out! Not to mention the crap ton of anti-viral, anti-spam, anti-phishing, ~~anti-development~~, corporate-friendly “productivity” software installed and running on the machine.

This is something I have never experienced. Until now, no computer I've used at a company has ever been this locked down, not even at IBM. It's going to take some adjustment to get used to it.

Meanwhile, I've been poking around on the system and—“End of Day Restart” …

“End of Day Restart?”

Seriously, Microsoft? You automated the daily reboot?

Wow … this is definitely going to take some time to get used to …

Thursday, August 29, 2019

Okay, so I wasn't really done with the profiling

Last week I was surprised to find this bit of Lua code as the hot spot in “Project: Sippy-Cup:”

local h16 = Cmt(HEXDIG^1,function(_,position,capture)
  local n = tonumber(capture,16)
  if n < 65536 then
    return position
  end
end)

Last night I realized that this code was stupid in this context. The code was originally based upon the original IP address parsing module which converted the text into a binary representation of the code. So in that context, converting the text into a number made sense. When I made the text-only version, I did the minimum amount of work required and thus, left in some sub-optimal code.

But here? I'm just looking at text, expecting up to four hex digits. A string of four hex digits will, by definition, always be less than 65,536. And LPeg has a way of specifying “up to four, but no more” of a pattern. It's just:

local h16 = HEXDIG^-4
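
A quick side-by-side of the two patterns. The `HEXDIG` definition is an assumption (the module's own definition isn't shown), and `h16_strict` is a variation added here for comparison. For one to four hex digits the old and new patterns agree. They differ at the edges: `patt^-4` means “at most four,” so it also matches zero digits, and on a longer run it simply stops after four.

```lua
local lpeg = require "lpeg"

local HEXDIG = lpeg.R("09", "AF", "af")   -- assumed definition

-- Original: match all the hex digits, convert, and reject values >= 65536.
local h16_old = lpeg.Cmt(HEXDIG^1, function(_, position, capture)
  local n = tonumber(capture, 16)
  if n < 65536 then
    return position
  end
end)

-- Replacement from the text: at most four hex digits, no conversion.
local h16_new = HEXDIG^-4

-- Variation: at least one and at most four hex digits.
local h16_strict = HEXDIG * HEXDIG^-3

print(tonumber("FFFF", 16))      --> 65535, the largest four-digit value
print(h16_old:match("ffff"))     --> 5
print(h16_new:match("ffff"))     --> 5
print(h16_old:match("12345"))    --> nil (0x12345 is too large)
print(h16_new:match("12345"))    --> 5   (stops after four digits)
print(h16_new:match(""))         --> 1   (zero digits also match)
print(h16_strict:match(""))      --> nil
```

Whether the empty match matters depends on what the surrounding grammar allows next to an h16.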

I made that change to the code, and reprofiled “Project: Sippy-Cup.” It didn't change the results at the C level all that much (the LPEG C function match() is still third, as I do quite a bit of parsing so that's expected), but the results in Lua are vastly different: it's now the code that verifies the incoming phone numbers that's the hot spot. It doesn't surprise me very much as we do that twice per request (once for the caller, and once for the recipient), and it's not nearly as bad a hot spot as the above code was.