The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Sunday, January 01, 2012

Yet another 365.2421897 days around the sun

I wasn't sure how many New Year entries I've done, and I was rather surprised to find that I've managed to post on the first of January every year except for 2001 and 2002. And far be it from me to stop that tradition this year.

Last night our crazy neighbor was at it again. I wasn't about to view the show up close, but the trajectories of the fireworks meant we had a decent view from the screened-in back porch (and the roof at Chez Boca is tile, so live embers weren't an issue).

And much like in 2008, I present (via Bunny) this video to celebrate our lack of snow.

Oh, and …

HAPPY NEW YEAR!

Monday, January 02, 2012

The next pie is going to be expensive …

To help ring in the New Year, Bunny and I are enjoying some Buddy's Pizza, which I think is the last of the Detroit pizza we bought on the way home (yes, that was two months ago—the pizza has been in deep freeze since, and it's still just as good).

Update on Tuesday, January 3rd, 2012

Turns out, it wasn't the last pie


Home Video Commentaries

Via Michael Duff are a few home videos with commentary. It's quite funny if you watch DVD commentaries on a regular basis, as the two brothers in question critique the home video as a “film” and not as some boring home videos.

Tuesday, January 03, 2012

The next pie isn't going to be that expensive …

I was just informed that we do, in fact, have another Buddy's pizza in the deep freeze.

In other news, I'm now writing blog entries that could very well fit on Twitter. I need to rectify that.

Tomorrow.

Wednesday, January 04, 2012

An annoying attack

It looks like today is “Attack Day.” I run a program to show the output from syslog in real time (it's part of my syslogintr project) and (like right as I type this) I'm seeing a slew of bogus DNS queries:

security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied
security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied
security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied
security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied
security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied
security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied
security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied
security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied
security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied
security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied
security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied
security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied
security: info: client 46.234.117.251#25345: query (cache) 'isc.org/ANY/IN' denied

And not just from that IP address either—so far 87 different IPs have been sending bogus requests to my DNS server. I would also like to find the program that does this, as every single request has come from the same port. Different IP address, sure, but the source port is always the same.

I'm also seriously tempted to write a program to send back a nice, custom response to these, in the hopes that the program actually cares about the response. The obvious thing to do is send back a response that contains an infinitely long domain name—it's not hard to do, just the right two bytes in the right location and you have an infinitely long name to parse (this is exploiting the DNS message compression scheme—spcdns has code to protect against this, by the way). Or maybe not an infinitely long domain name, but an insanely long one (again, easy to do by exploiting the message compression scheme, and again, spcdns has protection against this attack as well).
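For the curious, here is a minimal sketch of how a parser can survive both tricks. A length byte with its top two bits set is a compression pointer (a 14-bit offset from the start of the message), so the two bytes 0xC0 0x0C placed at offset 12 point at themselves. The function name decode_name and the MAX_HOPS cap are assumptions made for this illustration; this is not the code from spcdns.

```c
#include <stddef.h>

#define MAX_NAME 255  /* RFC 1035 caps an encoded name at 255 bytes */
#define MAX_HOPS 128  /* arbitrary cap on pointer jumps (an assumption) */

/*
 * Decode the name starting at pkt[off] into out as dotted text.
 * Returns the length of the text, or -1 if the name is malformed,
 * too long, or follows too many compression pointers.
 */
int decode_name(const unsigned char *pkt,size_t len,size_t off,
                char *out,size_t outsize)
{
  size_t used = 0;  /* characters written to out so far */
  int    hops = 0;  /* compression pointers followed so far */

  if (outsize == 0) return -1;

  for (;;)
  {
    unsigned char c;
    size_t        i;

    if (off >= len) return -1;
    c = pkt[off];

    if (c == 0) break;          /* root label: end of name */

    if ((c & 0xC0) == 0xC0)     /* compression pointer */
    {
      if (off + 1 >= len) return -1;
      if (++hops > MAX_HOPS) return -1;   /* stops a pointer-only loop */
      off = ((size_t)(c & 0x3F) << 8) | pkt[off + 1];
      continue;
    }

    if ((c & 0xC0) != 0) return -1;           /* reserved label types */
    if (off + 1 + c > len) return -1;         /* label runs off the packet */
    if (used + c + 1 > MAX_NAME) return -1;   /* stops the "insanely long" name */
    if (used + c + 2 > outsize) return -1;    /* room for '.', label and NUL */

    if (used > 0) out[used++] = '.';
    for (i = 0 ; i < c ; i++)
      out[used++] = (char)pkt[off + 1 + i];
    off += 1 + (size_t)c;
  }

  out[used] = '\0';
  return (int)used;
}
```

The hop counter catches a pointer that leads only to other pointers, and the length cap catches a loop that picks up a label on every pass. Either check alone leaves one of the two attacks open.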

Perhaps better would be to return an answer to a question that was never asked to begin with. “Oh, you want any record for isc.org? Here, have the LOC record for nsa.gov. Have a nice day.” Or perhaps just echo back the original packet and really confuse the sending program.

But in doing some searching, this appears to be an old denial of service attack against Internet Systems Consortium (the makers of bind, quite possibly the most widely used DNS server) and as such, any bogus responses would probably not do anything to the attacking software, which probably ignores any replies anyway.

Update on Thursday, January 5th, 2012

Good thing I didn't send back any custom responses


A denial of service attack

As we continue with “Attack Day,” there was a brief denial of service attack at The Ft. Lauderdale Office today.

[You had cake and didn't invite me?]

Poor Edvard, being denied cake like that.

Thursday, January 05, 2012

An annoying attack, Part II

I'm also seriously tempted to write a program to send back a nice, custom response to these, in the hopes that the program actually cares about the response.

An annoying attack - The Boston Diaries - Captain Napalm

Yeah, about that …

I've done a bit more research and apparently my server is part of a DNS amplification attack. Some machine (or machines) somewhere on the Internet is sending my server (along with possibly other DNS servers) DNS requests with a forged source address, in the hopes that my DNS server will do the requested lookup (in this case, any DNS record for isc.org, which is known for returning rather large DNS responses) and send the result to the forged IP address, thus denying it service.

And even though my server won't do the actual DNS request, it still returns a packet saying as much. So while my server is not sending a large packet, it is returning a packet, and thus participating in the DDoS attack, however little.

So even if I did send back a bogus response, it wouldn't be directed at the guilty party.

Sigh.

So I guess the thing to do is just filter those requests at the firewall.
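What such a filter has to recognize is fairly small. As a rough sketch (in C rather than in any particular firewall's rule syntax, and with the hypothetical name is_any_query), the check comes down to "is this a query, with one question, whose QTYPE is 255 (ANY)?" It assumes the question name is not compressed, which is the usual case for a query.

```c
#include <stddef.h>

#define DNS_HEADER_SIZE 12
#define QTYPE_ANY       255   /* RFC 1035 "*", a request for all records */

/*
 * Return 1 if pkt looks like a DNS query with exactly one question
 * whose QTYPE is ANY, 0 otherwise (including anything malformed).
 */
int is_any_query(const unsigned char *pkt,size_t len)
{
  size_t   off = DNS_HEADER_SIZE;
  unsigned qdcount;

  if (len < DNS_HEADER_SIZE) return 0;
  if (pkt[2] & 0x80) return 0;        /* QR bit set: a response */

  qdcount = ((unsigned)pkt[4] << 8) | pkt[5];
  if (qdcount != 1) return 0;

  /* skip the (uncompressed) labels of the question name */
  while ((off < len) && (pkt[off] != 0))
  {
    if (pkt[off] > 63) return 0;      /* pointer or reserved label type */
    off += 1 + (size_t)pkt[off];
  }

  /* need the terminating zero, then QTYPE and QCLASS (two bytes each) */
  if (off + 5 > len) return 0;
  off++;

  return (((unsigned)pkt[off] << 8) | pkt[off + 1]) == QTYPE_ANY;
}
```

Dropping everything this matches would also drop legitimate ANY queries, so a real rule would probably be narrower (say, only when recursion is going to be denied anyway).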

Friday, January 06, 2012

Jews, in Tehran? Really?

“Don't Tell My Mother I'm in Iran” is an interesting look into Iran and shows me (in my opinion) that it's not necessarily Iran that's bad, but the Iranian government (I suppose one could say the same of us—we're not bad, but our government is certainly questionable).

It's also hard to fathom there being twenty-five (25!) synagogues in Tehran. Who'da thunk?

Saturday, January 07, 2012

Reason to hate PHP #3-Running PHP-core dumped to preserve sanity

I'm going through the backlog of links I wanted to talk about when I come across this lovely PHPism:

elseif, as its name suggests, is a combination of if and else. Like else, it extends an if statement to execute a different statement in case the original if expression evaluates to FALSE. However, unlike else, it will execute that alternative expression only if the elseif conditional expression evaluates to TRUE. …

There may be several elseifs within the same if statement. The first elseif expression (if any) that evaluates to TRUE would be executed. In PHP, you can also write 'else if' (in two words) and the behavior would be identical to the one of 'elseif' (in a single word). The syntactic meaning is slightly different (if you're familiar with C, this is the same behavior) but the bottom line is that both would result in exactly the same behavior.

The elseif statement is only executed if the preceding if expression and any preceding elseif expressions evaluated to FALSE, and the current elseif expression evaluated to TRUE.

Note: Note that elseif and else if will only be considered exactly the same when using curly brackets as in the above example. When using a colon to define your if/elseif conditions, you must not separate else if into two words, or PHP will fail with a parse error.

PHP: elseif/else if—Manual

So what this insane bit of verbiage is saying, is that “elseif” and “else if” are the same, except when they're not, which has to do with using either braces to separate code, or colons (which I'm not familiar with syntax wise in PHP). In effect, PHP supports both “elseif” and “else if” but with slightly different subtle semantics that could trip you up if you aren't careful.

Please, pick one, or the other, but not both! Sheesh!


An interesting take on applications

The great horizontal killer applications are actually just fancy data structures.

Spreadsheets are not just tools for doing “what-if” analysis. They provide a specific data structure: a table. Most Excel users never enter a formula. They use Excel when they need a table. The gridlines are the most important feature of Excel, not recalc.

Word processors are not just tools for writing books, reports, and letters. They provide a specific data structure: lines of text which automatically wrap and split into pages.

PowerPoint is not just a tool for making boring meetings. It provides a specific data structure: an array of full-screen images. 

Via Spin the Cat, How Trello is different - Joel on Software

In the past, I've given Smirk grief over his use of Excel to make what I called “glorified text files,” but I see he's not alone in using Excel for tracking lists. In fact, I suspect that if the entire calculating engine of Excel were excised, not many people outside the financial realm would even notice (and the financial system would probably be better off too).

So it looks like Joel has a point—spreadsheets provide a type of data structure, and people use them as such. Looks like I'll have to cut Smirk some slack now. Sigh.

Sunday, January 08, 2012

Le Roi est mort, vive le Roi!

Even though today is the birthday of such luminaries as Elvis Presley and David Bowie, we must not neglect that today is also the day of the birth of the Supreme Commander of the cleanest race on the planet, the Sagacious Leader of the Democratic People's Republic of Korea, Kim Jong Un! Why he even bothers to grace this planet with his presence is just one more mysterious thing about him (personally, I hope he's not as batXXXX crazy as his father, but I don't have high hopes).


“One does not simply walk into a Linux kernel upgrade!”

I normally don't upgrade software unless there's a compelling reason for me to do so, and there are a few compelling reasons for me to upgrade the Linux kernel. It's not features that I can't live without (for I'm doing so right now) but there are some features, like signal and timer delivery via file descriptors, that have intrigued me enough to contemplate it.

Okay, in the late 90s I used to fairly regularly build custom Linux kernels for my various computers. But that was in the 2.0/2.1 days, when 2.0.x was the “stable” version, and 2.1.x was the “development” version. These days, it's all development versions with the random version, like 2.6.9 or 2.6.20, given the moniker of “stable,” just because.

But really, how hard could it be?

Okay, I downloaded 3.1.8 (3.1? Already? I thought 3.0 was just released!), but it requires a later version of GCC than I have. Okay, so I need a new version of GCC. Which probably requires the latest binutils. And because of new system calls since Linux 2.6.9 (which I'm running), I need to upgrade glibc, and while I'm at it, a few utilities like ps and lsof and …

Really? Is it this complicated? [Sean goes off, reads the Linux From Scratch Book and runs away screaming. Yup, it's that complicated. —Editor].

Monday, January 09, 2012

99 ways to program a hex, Part 1: The Standard

For Christmas, Hoade gave me 99 Ways To Tell A Story: Exercises in Style, an interesting book whereby the same story (an eight panel cartoon about a guy walking to the refrigerator and forgetting what he was going to look for) is told ninety-nine different ways; a different style, a different genre, a different number of panels, whatever. Ninety-nine different ways.

It got me to thinking. While the book was about different ways to present a story, what about programming? Okay, other than sounding completely insane, could a program be written ninety-nine different ways?

An easy way is a different computer language for each version. Sure, there's CPL, BCPL, B, C (four variations there—K&R, C89, C99, C11), C++ (C++, C++9x, C++2x), Objective-C, D, Fortran (many versions over the years), BASIC (just about every computer made between 1975 and 1985 came with its own dialect of BASIC, along with the original Dartmouth version), Algol (Algol 60, Algol 68), Pascal (Pascal, Turbo Pascal, Delphi), Assembly (basically each CPU architecture has its own form, for instance the 6502, 6800, 6809, 68000 (which has variants), 8080, Z80, 8086 (all the way up to the latest Pentium 4), MIPS (which has variants), SPARC (and variants), ARM (and variants), PDP-1, PDP-7, PDP-8, PDP-10, PDP-11, VAX), Forth (just as many dialects as BASIC), Modula (Modula and Modula-II), SNOBOL, ICON, Hope, bash, sh, csh, ksh, VIth, Alice, Pilot, COBOL, Intercal, Perl (several major variations), Piet, Python (Python 1, Python 2, Python 3), PHP (practically every version ever released), awk, Ruby (nearly every version ever released), Lua (several versions), Malbolge, Java (several major revisions), Lisp (Lisp, Lisp 1.5, MACLISP, Common Lisp, Scheme (I know! I know! It's not Lisp, even though it has the same syntax and pretty much the same command set, it's a LISP1 and Common Lisp is a LISP2 (and if you have to ask, you'll have to take a few graduate programming courses to understand))), Erlang, Prolog, Haskell, ML, Oberon, LOLCODE, Befunge, Chef, BrainXXXX and that alone will probably get us to 99 versions right there.

But I don't have access to a lot of these languages. Heck, most of them are dead, obscure or esoteric and trying to even find examples would be difficult. Especially since what I want to do is more than just a simple “Hello World” program. I want to write a program that is actually useful, but not so long as to make this insane project … um … insaner.

So I'm going to try just a few languages (which still leaves me with plenty to choose from; my home system alone comes with C, Ruby 1.8, Perl 5.8, Python 2.3, Python 2.6, PHP 5.1, Lua 5.1, C++, sh, bash, awk, 68000 assembler, x86 assembler and probably a few I'm forgetting about). I might not hit all of these, or maybe I will. We'll see.

And the program I selected for this insanity … um … silly treatment is a small utility I wrote back in the early 90s when I first learned C—it's a program that dumps data in hexadecimal:

/*************************************************************************
*
* Copyright 1991 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: Original Version, C89 */

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

#define LINESIZE	  16

static void 	do_dump		(FILE *,FILE *);

/****************************************************************/

int main(int argc,char *argv[])
{
  if (argc == 1)
    do_dump(stdin,stdout);
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      FILE *fp;
      
      fp = fopen(argv[i],"rb");
      if (fp == NULL)
      {
        perror(argv[i]);
        continue;
      }

      printf("-----%s-----\n",argv[i]);
      do_dump(fp,stdout);
      fclose(fp);
    }
  }

  return EXIT_SUCCESS;
}

/******************************************************************/

static void do_dump(FILE *fpin,FILE *fpout)
{
  unsigned char  buffer[BUFSIZ];
  unsigned char *pbyte;
  size_t         offset;
  size_t         bread;
  size_t         j;
  char           ascii[LINESIZE + 1];
  
  offset = 0;

  while((bread = fread(buffer,1,BUFSIZ,fpin)) > 0)
  {
    pbyte = buffer;
    while (bread > 0)
    {
      fprintf(fpout,"%08lX: ",(unsigned long)offset);
      j = 0;
      do
      {
        fprintf(fpout,"%02X ",*pbyte);
        if (isprint(*pbyte))
          ascii [j] = *pbyte;
        else
          ascii [j] = '.';
        pbyte  ++;
        offset ++;
        j      ++;
        bread  --;
      } while ((j < LINESIZE) && (bread > 0));
      ascii [j] = '\0';
      if (j < LINESIZE)
      {
	size_t i;

	for (i = j ; i < LINESIZE ; i++) fprintf(fpout,"   ");
      }
      fprintf(fpout,"%s\n",ascii);      
    }
    
    if (fflush(fpout) == EOF)
    {
      perror("output");
      exit(EXIT_FAILURE);
    }
  }
}

/***************************************************************/

This is the current version of the program, written in C89. There's not much to say about this—it's straightforward, does one thing, does it well, and we'll see just how far I can take this version.
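To make the output format concrete, here is a rough sketch that builds a single line of the dump in a buffer instead of writing it to a file. The helper format_line and the LINEBUF size are inventions for this illustration and are not part of the program above. Each line is an eight-digit hexadecimal offset and a colon, up to sixteen two-digit hex bytes (short lines are padded with spaces), and then the same bytes as text, with anything unprintable shown as a period.

```c
#include <stdio.h>
#include <ctype.h>
#include <stddef.h>

#define LINESIZE 16
#define LINEBUF  96   /* comfortably larger than one formatted line */

/*
 * Format up to LINESIZE bytes of data as one line of dump output.
 * out must be at least LINEBUF bytes.  Returns the number of
 * characters written, not counting the terminating NUL.
 */
int format_line(char *out,unsigned long offset,
                const unsigned char *data,size_t n)
{
  char   ascii[LINESIZE + 1];
  size_t i;
  int    len;

  if (n > LINESIZE) n = LINESIZE;

  len = sprintf(out,"%08lX: ",offset);

  for (i = 0 ; i < n ; i++)
  {
    len += sprintf(out + len,"%02X ",data[i]);
    ascii[i] = isprint(data[i]) ? (char)data[i] : '.';
  }
  ascii[n] = '\0';

  /* pad a short line so the text column still lines up */
  for ( ; i < LINESIZE ; i++)
    len += sprintf(out + len,"   ");

  len += sprintf(out + len,"%s\n",ascii);
  return len;
}
```

Feeding it the five bytes of Hello at offset zero gives 00000000: followed by 48 65 6C 6C 6F, enough padding to fill out the sixteen hex columns, and then Hello.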

Tuesday, January 10, 2012

99 ways to program a hex, Part 2: K&R C

One rule I've set for myself: the output of each program shall be the same (if at all possible). And the baseline for the output is yesterday's version. It's also a useful test—if the output doesn't match, there's a bug somewhere. Other than that, anything goes.

Today's code is written using a style known as “K&R C.”

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: K&R C */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define LINESIZE	  16

/****************************************************************/

main(argc,argv)
char **argv;
{
	int i;
	FILE *fp;
  
	if (argc == 1)
		do_dump(stdin,stdout);
	else {
		for (i = 1 ; i < argc ; i++) {
			fp = fopen(argv[i],"rb");
			if (fp == NULL) {
				perror(argv[i]);
				continue;
			}

			printf("-----%s-----\n",argv[i]);
			do_dump(fp,stdout);
			fclose(fp);
		}
	}

	return 0;
}

/******************************************************************/

do_dump(fpin,fpout)
FILE *fpin,*fpout;
{
	char buffer[BUFSIZ],ascii[LINESIZE + 1],*pbyte;
	int offset = 0,bread,j,i;
  
	while((bread = fread(buffer,1,BUFSIZ,fpin)) > 0) {
		pbyte = buffer;
		while (bread > 0) {
			fprintf(fpout,"%08lX: ",(unsigned long)offset);
			j = 0;
			do {
				fprintf(fpout,"%02X ",(unsigned char)*pbyte);
				if (*pbyte >= ' ' && *pbyte <= '~')
					ascii [j] = *pbyte;
				else
					ascii [j] = '.';
				pbyte++;
				offset++;
				j++;
				bread--;
			} while ((j < LINESIZE) && (bread > 0));
			ascii [j] = '\0';
			if (j < LINESIZE) {
				for (i = j ; i < LINESIZE ; i++) 
					fprintf(fpout,"   ");
			}
			fprintf(fpout,"%s\n",ascii);      
		}
    
		if (fflush(fpout) == EOF) {
			perror("output");
			exit(1);
		}
	}
}

/***************************************************************/

The term “K&R” is still used to refer to a particular style of writing C code (which I personally can't stand, but that's me)—the placement of opening braces, the severe indentation and oftentimes a vowel impairment in names (which I didn't go for here).

But the term can also refer to code written before C was first standardized in 1989 (that is known as “ANSI C” or “C89”). While you had to always declare all your variables, function parameters, on the other hand, only had to be mentioned and unless otherwise noted, were assumed to be of type int. The same goes for the function return value—unless otherwise noted, all functions return a type of int.

Wednesday, January 11, 2012

99 ways to program a hex, Part 3: C89 in K&R style

To separate the style from the version, here's the program, written in C89, using the K&R style.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89 in K&R style */

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

#define LINESIZE	  16

static void 	do_dump		(FILE *,FILE *);

/****************************************************************/

int main(int argc,char *argv[]) {
	if (argc == 1) {
		do_dump(stdin,stdout);
	} else {
		int i;
    
		for (i = 1 ; i < argc ; i++) {
			FILE *fp;
      
			fp = fopen(argv[i],"rb");
			if (fp == NULL) {
				perror(argv[i]);
				continue;
			}

			printf("-----%s-----\n",argv[i]);
			do_dump(fp,stdout);
			fclose(fp);
		}
	}

	return EXIT_SUCCESS;
}

/******************************************************************/

static void do_dump(FILE *fpin,FILE *fpout) {
	unsigned char buffer[BUFSIZ],*pbyte;
	size_t offset=0,bread,j;
	char ascii[LINESIZE + 1];

	while((bread = fread(buffer,1,BUFSIZ,fpin)) > 0) {
		pbyte = buffer;
		while (bread > 0) {
			fprintf(fpout,"%08lX: ",(unsigned long)offset);
			j = 0;
			do {
				fprintf(fpout,"%02X ",*pbyte);
				if (isprint(*pbyte)) {
					ascii [j] = *pbyte;
				} else {
					ascii [j] = '.';
				}
				pbyte  ++;
				offset ++;
				j      ++;
				bread  --;
			} while ((j < LINESIZE) && (bread > 0));
			
			ascii [j] = '\0';
			if (j < LINESIZE) {
				size_t i;

				for (i = j ; i < LINESIZE ; i++) {
					fprintf(fpout,"   ");
				}
			}
			fprintf(fpout,"%s\n",ascii);      
		}
    
		if (fflush(fpout) == EOF) {
			perror("output");
			exit(EXIT_FAILURE);
		}
	}
}

/***************************************************************/

We have function prototypes, and more appropriate typedefs for some of the variables, but in the K&R style (ick). Lots of software is still written using this style, like Linux, on the grounds that if it was Good Enough™ for Kernighan and Ritchie, then it's Good Enough™ for the rest of us, never mind that Kernighan and Ritchie wrote their software on teletypes, which is near enough to a manual typewriter hooked up to a computer that if I used one, I would try to type as little as possible myself. But personally, I don't use a teletype; I use a real keyboard and a huge monitor with a small font, so I find little use for the K&R style.

Thursday, January 12, 2012

Just to make sure, I did my “once-a-decade” check of an IDE. Yup, I still hate 'em.

My first exposure to an IDE was in the mid-80s with Turbo Pascal 3, and I hated it. Not the language per se but the editor. By then, I was used to IBM's PE (version 1.0—never found a bug but there were a few limitations, mostly due to it being able to run under MS-DOS 1.0) with its true block copy, the ability to move anywhere on the screen and type (and have it insert spaces, if required) and fairly mnemonic keybindings, so I had some issues with how Borland thought an editor should work.

I found it a nightmare.

And then when Turbo Pascal 4 came out, with an entirely new interface where they tried (and in my opinion, failed) to do “windows” in a text mode and well … it took a bit over a decade for me to look at another IDE.

By now it's the late 90s, and I'm working on Brainstorm. One of the first Java IDEs came out (and I have no idea what program it was or even what became of it). I thought I'd give it a try as I was curious if it would handle an existing project.

It didn't.

My code killed it. I suspect the programmers of that IDE never thought that anyone would bother with writing their own layout manager, and I recall the dialog went something like:

IDE
What … is your language?
Sean
Java.
IDE
What … is your quest?
Sean
To compile this Java code I wrote.
IDE
What layout manager are you using?
Sean
Really? I wrote my own.
IDE
Huh? I don't know that [falls over the Bridge of Death into the Gorge of Eternal Peril] Auuuuuuuuuuuuuuuuuuuuuuuugh!

Scratch another IDE off my list. And a bit over a decade passes.

We're doing a lot of Java programming at The Ft. Lauderdale Office of The Corporation and most of the programmers are using this IDE called Eclipse (we're doing both backend stuff and Android development). I've heard of it. Nearly all Java programmers swear by it. I figure I'd give it a go, if only as a source code/object viewer. I suck down the 300+ megabyte package that Ubuntu offers overnight and give it a go.

And … yeah. I have no idea what I'm doing. Why does it want a “workspace?” How do I load an existing project into the darned thing? Why is the Android Eclipse extension failing? Oh, the “stable” version that Ubuntu coughed up is more than twenty minutes old, and therefore, an ancient and decrepit piece of XXXX. I should know better by now.

So off I go to the Eclipse site, and I'm faced with a dozen different options for Eclipse. Wait? There are three different versions for Java, one for C/C++? One for Javascript? Wait? I thought Eclipse could work with a bunch of different languages. Shouldn't these all be modules or extensions or something? You mean I have to download a separate version for each language I want? And what's with the three Java versions?

Auuuuuuuuuuuuuuugh!

And off I go over the Bridge of Death into the Gorge of Eternal Peril.

Okay, so I pick one, download it, figure out I can just run the darned thing and don't have to install it. Okay, the Android extension (another umptillion bytes) installs fine, and I figure out that I can use the existing project, but only if I build it from the command line first (um … isn't that kind of defeating the purpose of an IDE?) and neither I nor J (office mate) can figure out why I'm getting these two errors about overriding an interface (which is the point of an interface—you override it). If I do the so called “quick fix” that Eclipse suggests, it fails on the same line with a different error.

Sigh.

The Android Emulator runs the code just fine … I guess … since I'm supposed to test this code. But I can launch the compiled application (compiled via command line) on the emulator, so the code works (and no errors from the command line compiler there). It's just that Eclipse doesn't like the code.

Par for the course. Of course.

I can still use it to browse the code, and follow the relationships of all the objects. And indeed, one of the warnings that Eclipse barfed up did indeed turn out to be a real bug (an unused variable that turned out should have been used). So that's good. But all the other warnings are bogus, as “fixing” them causes other errors. So I have to pretty much ignore all that, and just use Eclipse as a glorified version of more, only one that automagically cross references everything.

Oh, and it gets hopelessly confused when I checkout new versions from the source repository and have to manually tell Eclipse to reload the changed files, instead of having it just figure it out on its own.

It's comical, I tell you.

If that wasn't fun enough, I figured I'd try out the “C/C++ version” of Eclipse, if only as a code browser (since we do have some C++ code, and the call depth does make it rather difficult to follow using a more traditional, but less flaky, text editor). So I download that version. I'm still not quite sure what the “workspace” is, since when I point the “workspace” to the top level directory of our existing C/C++ codebase, it does nothing. No, I have to select a “new project” which is an “existing project,” none of which exactly matches what we have, but I select the one that most closely, but not exactly, matches what we have only to have Eclipse immediately wet its pants and dump core, all over the place.

Now, I thought Eclipse was written in Java, a managed language that produces not real machine code, but virtual code that is then emulated by a runtime engine—the whole “write once, debug … er … run everywhere” schtick. How does that dump core? What's wrong Eclipse? You can't deal with 2,100 source code files?

Okay, what about something smaller? How about SPCDNS? It's C. There are only eight source files, only two of which, one code and one header file are absolutely required for the project. How about that?

Oh, I see you're still horribly confused from the previous 2,100 file codebase. Okay, I delete everything you touched, re-extract from the downloaded tarball and try again. Feel better? Should I lay out some newspaper in case you barf again? No? Okay.

Hmm. I still don't fully understand this business with “workspaces” but whatever. Here's the top level directory for SPCDNS. Oh, you can't find anything. Start over. Here's the source directory for SPCDNS. Ah, you like that. But you can't build, because the Makefile is missing.

Seriously. Eclipse. You can't deal with a Makefile one level up? Oh for crying out loud …

Start over. New project. Entirely new project. Oh look, one of the options is for autoconf. I've never bothered with that, but maybe Eclipse can show me a thing or two about … oh never mind, that's right. My Ubuntu install is now forty minutes old and the installed autoconf might as well be in Sumerian for all you care, Eclipse.

Start over. New project. Makefile. GCC. New file. dns.h. Load it up in another text editor, select all, copy. Paste into Eclipse. Seriously, Eclipse? 600 errors? It's a XXXXXXX header file! You don't have to compile that! Okay, let me continue with the C code. Load codec.c into a text editor. Select all, copy, paste into a new file in Eclipse. Oh, now it's 1,234 errors? Oh, you don't like the restrict keyword … what? You don't understand C99? Don't worry, Mark doesn't care for C99 either, so you're in good company there, but … really?

Start over. New project. Pure C. Makefile. GCC. Check the options, ah, find where I can specify C99 on the command line. Select, copy, paste dns.h into Eclipse. 600 errors. Okay, okay, I'll include the XXXXXXX headers you want. Happy? Okay, on to codec.c. Two warnings this time, about two unused functions.

Really? Those are unused? Okay, I'll remove one of them, and the prototype and—

Eclipse?

Eclipse?

Where did you go?

You puke and dump core again?

You're written in XXXXXXX Java! You shouldn't be able to crash!

Bah!

I still hate IDEs.


The proper care of cast iron cookware

Bunny and I got into a heated discussion over the proper way to clean a cast iron skillet and in the end, I had to do some searching. So, for the record, the proper way to clean your cast iron cookware, and while we're at it, reseason (or recondition) your cast iron cookware (even if you think it's given up the ghost).


99 ways to program a hex, Part 4: C99

Today's variation: C99.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C99 */

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

#define LINESIZE	  16

static void 	do_dump		(FILE *,FILE *);

/****************************************************************/

int main(int argc,char *argv[])
{
  if (argc == 1)
    do_dump(stdin,stdout);
  else
  {
    for (int i = 1 ; i < argc ; i++)
    {
      FILE *fp = fopen(argv[i],"rb");
      if (fp == NULL)
      {
        perror(argv[i]);
        continue;
      }

      printf("-----%s-----\n",argv[i]);
      do_dump(fp,stdout);
      fclose(fp);
    }
  }

  return EXIT_SUCCESS;
}

/******************************************************************/

static void do_dump(FILE *fpin,FILE *fpout)
{
  unsigned char  buffer[BUFSIZ];
  size_t         offset;
  size_t         bread;
  
  offset = 0;

  while((bread = fread(buffer,1,BUFSIZ,fpin)) > 0)
  {
    unsigned char *pbyte = buffer;

    while (bread > 0)
    {
      char ascii[LINESIZE + 1];
      
      fprintf(fpout,"%08lX: ",(unsigned long)offset);
      size_t j = 0;

      do
      {
        fprintf(fpout,"%02X ",*pbyte);
        if (isprint(*pbyte))
          ascii [j] = *pbyte;
        else
          ascii [j] = '.';
        pbyte  ++;
        offset ++;
        j      ++;
        bread  --;
      } while ((j < LINESIZE) && (bread > 0));

      ascii [j] = '\0';

      if (j < LINESIZE)
	for (size_t i = j ; i < LINESIZE ; i++) fprintf(fpout,"   ");

      fprintf(fpout,"%s\n",ascii);      
    }
    
    if (fflush(fpout) == EOF)
    {
      perror("output");
      exit(EXIT_FAILURE);
    }
  }
}

/***************************************************************/

It's not much different than the C89 version. The main difference is the ability to declare variables when needed instead of at the beginning of a block of code. I don't particularly care for that feature, but I do like the ability to declare variables inside the for() statement, like I've done here.

Friday, January 13, 2012

99 ways to program a hex, Part 5: C99 in K&R style

Like part 3, today's version is C99 in the K&R style:

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C99 in K&R style */

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

#define LINESIZE	  16

static void 	do_dump		(FILE *,FILE *);

/****************************************************************/

int main(int argc,char *argv[]) {
	if (argc == 1) {
		do_dump(stdin,stdout);
	} else {
		for (int i = 1 ; i < argc ; i++) {
			FILE *fp = fopen(argv[i],"rb");
			if (fp == NULL) {
				perror(argv[i]);
				continue;
			}

			printf("-----%s-----\n",argv[i]);
			do_dump(fp,stdout);
			fclose(fp);
		}
	}

	return EXIT_SUCCESS;
}

/******************************************************************/

static void do_dump(FILE *fpin,FILE *fpout) {
	unsigned char buffer[BUFSIZ];
	size_t offset=0,bread;
  
	while((bread = fread(buffer,1,BUFSIZ,fpin)) > 0) {
		unsigned char *pbyte = buffer;

		while (bread > 0) {
			char ascii[LINESIZE + 1];
      
			fprintf(fpout,"%08lX: ",(unsigned long)offset);
			size_t j = 0;
			do {
				fprintf(fpout,"%02X ",*pbyte);
				if (isprint(*pbyte)) {
					ascii [j] = *pbyte;
				} else {
					ascii [j] = '.';
				}
				pbyte  ++;
				offset ++;
				j      ++;
				bread  --;
			} while ((j < LINESIZE) && (bread > 0));

			ascii [j] = '\0';

			if (j < LINESIZE) {
				for (size_t i = j ; i < LINESIZE ; i++) {
					fprintf(fpout,"   ");
				}
			}

			fprintf(fpout,"%s\n",ascii);      

			if (fflush(fpout) == EOF) {
				perror("output");
				exit(EXIT_FAILURE);
			}
		}
	}
}

/***************************************************************/

To the untrained eye, it probably looks like every other version I've presented here, yet there is a difference, subtle as it may be. But even in the book that inspired this series there were plenty of examples that weren't all that much different.

At least, that's what I keep telling myself.

Saturday, January 14, 2012

99 ways to program a hex, Part 6: C89, “splint -strict” compliant

Back in the K&R days, C code tended to play rather loose with the rules. As a result, some pretty subtle bugs would go undetected, such as passing the wrong number of parameters to a function, the wrong type of parameters to a function, and ignoring the results of a function. Because of these types of errors, a program called lint was developed that could detect them, as well as other commonly made mistakes. In fact, lint was very fussy about the code it was given.

But it was a popular tool (I remember the ads for PC Lint that would show a snippet of C code that had a subtle bug that PC Lint could detect. I got good enough to spot the errors shown in the ads) and one could always tell code that's been through lint because of code like:

(void)printf("hello world\n");
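For what it's worth, here's a minimal sketch of the two ways to keep a lint-like tool quiet about an ignored result. The function names greet_unchecked() and greet_checked() are made up for illustration; only fputs() and EOF come from the standard library.

```c
#include <stdio.h>

/* lint-style: the result of fputs() is explicitly thrown away, */
/* which documents that ignoring it was a deliberate choice     */

void greet_unchecked(FILE *fp)
{
  (void)fputs("hello world\n",fp);
}

/* the alternative: actually look at the result and report it   */
/* returns 0 on success, -1 if the write failed                 */

int greet_checked(FILE *fp)
{
  if (fputs("hello world\n",fp) == EOF)
    return -1;
  return 0;
}
```

The cast doesn't make the code any safer; it just tells the tool (and the next programmer) that the result was dropped on purpose.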

The standard these days seems to be a program called splint and man, is it picky; just getting code to pass through splint is hard enough, but then there's the -strict option:

-strict
Absurdly strict checking. All checking done by checks, plus modifications and global variables used in unspecified functions, strict standard library, and strict typing of C operators. A special reward will be presented to the first person to produce a real program that produces no errors with strict checking.

Which brings us to today's code:

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89, "splint -strict" compliant */

#ifndef S_SPLINT_S
#  include <stdio.h>
#  include <ctype.h>
#  include <string.h>
#  include <stdlib.h>
#endif

#define LINESIZE	((size_t)16)

/*@-protoparamname@*/
static void do_dump(FILE *fpin,FILE *fpout)
	/*@globals fileSystem @*/
	/*@modifies *fpin, *fpout, fileSystem @*/
	;
/*@+protoparamname@*/

/****************************************************************/

int main(int argc,char *argv[])
/*@globals  fileSystem, stdin, stdout@*/
/*@modifies fileSystem, stdin, stdout@*/
{
  if (argc == 1)
  {
    do_dump(stdin,stdout);
  }
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      FILE *fp;
      
      /*@-boundsread@*/
      fp = fopen(argv[i],"rb");
      /*@+boundsread@*/
      
      if (fp == NULL)
      {
        perror(argv[i]);
        continue;
      }

      printf("-----%s-----\n",argv[i]);
      do_dump(fp,stdout);
      if (fclose(fp) == EOF)
      {
        perror(argv[i]);
      }
    }
  }

  return EXIT_SUCCESS;
}

/******************************************************************/

static void do_dump(FILE *fpin,FILE *fpout)
/*@globals fileSystem @*/
/*@modifies *fpin, *fpout, fileSystem@*/
{
  unsigned char  buffer[BUFSIZ];
  unsigned char *pbyte;
  size_t         offset;
  size_t         bread;
  size_t         j;
  char           ascii[LINESIZE + 1];
  
  offset = 0;

  while((bread = fread(buffer,(size_t)1,BUFSIZ,fpin)) > 0)
  {
    pbyte = buffer;
    while (bread > 0)
    {
      fprintf(fpout,"%08lX: ",(unsigned long)offset);
      j = 0;
      do
      {
        fprintf(fpout,"%02X ",(unsigned int)*pbyte);
        if (isprint(*pbyte))
        {
          ascii [j] = (char)*pbyte;
        }
        else
        {
          ascii [j] = '.';
        }
        pbyte  ++;
        offset ++;
        j      ++;
        bread  --;
      } while ((j < LINESIZE) && (bread > 0));
      ascii [j] = '\0';
      if (j < LINESIZE)
      {
	size_t i;

	for (i = j ; i < LINESIZE ; i++) 
	{
	  fprintf(fpout,"   ");
	}
      }
      fprintf(fpout,"%s\n",ascii);      
    }
    
    if (fflush(fpout) == EOF)
    {
      perror("output");
      exit(EXIT_FAILURE);
    }
  }
}

/***************************************************************/

I'm actually surprised at just how few splint directives I needed (they're those funny looking comments like /*@-frobnitz@*/) to get this code through splint -strict. The only hard part was the function prototype—it didn't matter if I included the parameter names:

Splint 3.1.2 --- 07 Dec 2009

06.c:34:27: Declaration parameter has name: fpin
  A parameter in a function prototype has a name.  This is dangerous, since a
  macro definition could be visible here. (Use either -protoparamname or
  -namechecks to inhibit warning)
06.c:34:38: Declaration parameter has name: fpout
  A parameter in a function prototype has a name.  This is dangerous, since a
  macro definition could be visible here. (Use either -protoparamname or
  -namechecks to inhibit warning)

Finished checking --- 2 code warnings

or not:

Splint 3.1.2 --- 07 Dec 2009

06.c:36:15: Unrecognized identifier in modifies comment: fpin
  Identifier used in code has not been declared. (Use -unrecog to inhibit
  warning)
06.c:36:22: Unrecognized identifier in modifies comment: fpout
sRef.c:1369: at source point
06.c:47:26: *** Internal Bug at sRef.c:1369: llassert failed:
               sRef_isReasonable (s) [errno: 25]
     *** Please report bug to splint-bug@splint.org ***
       (attempting to continue, results may be incorrect)
*** Segmentation Violation
*** Location (not trusted): 06.c:47:26
*** Last code point: exprNode.c:3046
*** Previous code point: exprNode.c:10317
*** Please report bug to splint-bug@splint.org
*** A useful bug report should include everything we need to reproduce the bug.

(and it crashes! Woot!)

splint bitched about the prototype. I could have rearranged the code so the prototype was unnecessary, but I decided to shut that particular error up with the /*@-protoparamname@*/ ... /*@+protoparamname@*/ directives. But really, other than that and one other minor bitch, the code passed splint -strict rather easily.

I wonder if I can claim that prize, or is the program too simple?

Sunday, January 15, 2012

99 ways to program a hex, Part 7: C89, const correctness

Standardization of C brought with it a way to annotate a variable beyond its type: how it is to be accessed. volatile informs the compiler that the value cannot be cached and must always be read from memory when referenced, because some outside agent (hardware, another process or thread) could have changed the contents since the last read, and const, which marks a variable as “read-only,” which means the value can be heavily cached as it won't change whatsoever.
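As a small illustration of the “outside agent” case, here's a sketch using a signal handler, which is about the only outside agent portable C gives you. The names got_signal, on_signal() and signal_seen() are mine, made up for this example; signal(), raise() and sig_atomic_t are standard C.

```c
#include <signal.h>

/* volatile: the handler can change this behind the compiler's back, */
/* so every reference has to be a real read from memory              */

static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig)
{
  (void)sig;
  got_signal = 1;
}

/* installs the handler, sends ourselves SIGINT, and reports what */
/* the flag says; returns -1 if the setup failed                  */

int signal_seen(void)
{
  if (signal(SIGINT,on_signal) == SIG_ERR)
    return -1;
  if (raise(SIGINT) != 0)
    return -1;
  return got_signal;
}
```

Here raise() runs the handler before it returns, so the flag is already set by the time it's read. In a real program the signal would arrive whenever it felt like it, and without volatile a loop polling the flag could legally be optimized into reading it just once.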

So today's code is the base version (which is C89), but with “const correctness.”

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89, const correctness */

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

#define LINESIZE	  16

static void 	do_dump		(FILE *const,FILE *const);

/****************************************************************/

int main(const int argc,char *const argv[])
{
  if (argc == 1)
    do_dump(stdin,stdout);
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      FILE *fp;
      
      fp = fopen(argv[i],"rb");
      if (fp == NULL)
      {
        perror(argv[i]);
        continue;
      }

      printf("-----%s-----\n",argv[i]);
      do_dump(fp,stdout);
      fclose(fp);
    }
  }

  return EXIT_SUCCESS;
}

/******************************************************************/

static void do_dump(FILE *const fpin,FILE *const fpout)
{
  unsigned char  buffer[BUFSIZ];
  unsigned char *pbyte;
  size_t         offset;
  size_t         bread;
  size_t         j;
  char           ascii[LINESIZE + 1];
  
  offset = 0;

  while((bread = fread(buffer,1,BUFSIZ,fpin)) > 0)
  {
    pbyte = buffer;
    while (bread > 0)
    {
      fprintf(fpout,"%08lX: ",(unsigned long)offset);
      j = 0;
      do
      {
        fprintf(fpout,"%02X ",*pbyte);
        if (isprint(*pbyte))
          ascii [j] = *pbyte;
        else
          ascii [j] = '.';
        pbyte  ++;
        offset ++;
        j      ++;
        bread  --;
      } while ((j < LINESIZE) && (bread > 0));
      ascii [j] = '\0';
      if (j < LINESIZE)
      {
	size_t i;

	for (i = j ; i < LINESIZE ; i++) fprintf(fpout,"   ");
      }
      fprintf(fpout,"%s\n",ascii);      
    }
    
    if (fflush(fpout) == EOF)
    {
      perror("output");
      exit(EXIT_FAILURE);
    }
  }
}

/***************************************************************/

There're no real volatile variables, so there's no use of volatile, but the use of const ensures that I don't change variables inadvertently. One thing to note: The following:

const int *pi;

creates a pointer that can change, which points to memory (interpreted as an integer) that can't change, while:

int *const pi;

creates a pointer that can't change, which points to memory (interpreted as an integer) that can change, while:

const int *const pi;

creates a pointer that can't change, which points to memory (interpreted as an integer) that can't change.

Yes, there are some subtle differences there, and it took me a while to get it down, but you can pin down what can and can't change.
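To make that concrete, here's a small sketch of all three forms as function parameters. The functions sum(), fill() and first() are invented for this example; the lines the compiler would reject are left in as comments.

```c
#include <stddef.h>

/* const int *: the pointer can move, the ints can't be changed */

int sum(const int *pi,size_t n)
{
  int total = 0;

  while(n-- > 0)
    total += *pi++;	/* fine: only the pointer changes      */
  /* *pi = 0; */	/* error: the ints are read-only       */
  return total;
}

/* int *const: the pointer is fixed, the ints can be changed */

void fill(int *const pi,size_t n,int value)
{
  size_t i;

  for (i = 0 ; i < n ; i++)
    pi[i] = value;	/* fine: only the ints change          */
  /* pi++; */		/* error: the pointer is read-only     */
}

/* const int *const: neither can be changed */

int first(const int *const pi)
{
  /* pi++;    */	/* error: the pointer is read-only     */
  /* *pi = 0; */	/* error: the ints are read-only       */
  return *pi;
}
```

Reading the declaration right to left helps: “pi is a const pointer to an int” versus “pi is a pointer to a const int.”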

Monday, January 16, 2012

99 ways to program a hex, Part 8: C99, const and restrict correctness

Much like the difference between part 1 and part 4, there is very little difference between today's code and yesterday's code:

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C99, const and restrict correctness */

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

#define LINESIZE	  16

static void 	do_dump		(FILE *const restrict,FILE *const restrict);

/****************************************************************/

int main(const int argc,char *const restrict argv[])
{
  if (argc == 1)
    do_dump(stdin,stdout);
  else
  {
    for (int i = 1 ; i < argc ; i++)
    {
      FILE *fp = fopen(argv[i],"rb");
      if (fp == NULL)
      {
        perror(argv[i]);
        continue;
      }

      printf("-----%s-----\n",argv[i]);
      do_dump(fp,stdout);
      fclose(fp);
    }
  }

  return EXIT_SUCCESS;
}

/******************************************************************/

static void do_dump(FILE *const restrict fpin,FILE *const restrict fpout)
{
  unsigned char  buffer[BUFSIZ];
  size_t         offset;
  size_t         bread;
  
  offset = 0;

  while((bread = fread(buffer,1,BUFSIZ,fpin)) > 0)
  {
    unsigned char *pbyte = buffer;

    while (bread > 0)
    {
      char ascii[LINESIZE + 1];
      
      fprintf(fpout,"%08lX: ",(unsigned long)offset);
      size_t j = 0;

      do
      {
        fprintf(fpout,"%02X ",*pbyte);
        if (isprint(*pbyte))
          ascii [j] = *pbyte;
        else
          ascii [j] = '.';
        pbyte  ++;
        offset ++;
        j      ++;
        bread  --;
      } while ((j < LINESIZE) && (bread > 0));

      ascii [j] = '\0';

      if (j < LINESIZE)
	for (size_t i = j ; i < LINESIZE ; i++) fprintf(fpout,"   ");

      fprintf(fpout,"%s\n",ascii);      
    }
    
    if (fflush(fpout) == EOF)
    {
      perror("output");
      exit(EXIT_FAILURE);
    }
  }
}

/***************************************************************/

C99 adds restrict to the ways one can modify the access to a variable. The rationale behind this is a bit esoteric: it tells the compiler that a pointer is the only pointer to a block of memory.

Yes, it does seem odd to have to add a keyword for that, but it does help with code optimization. For instance, the following (silly) function:

int foo(int *p1,int *p2)
{
  *p2 =  *p1 * 17;
  return *p1 * 17;
}

The problem here is that the compiler has to do the multiplication twice, as p2 could be pointing to the same location as p1, and thus, the contents pointed to by p1 could be modified. So the compiler is forced to write machine code like:

foo:		mov	ebx,[esp + 4]	; get p1
		mov	eax,[ebx]	; read *p1
		imul	eax,17		; multiply
		mov	edx,[esp + 8]	; get p2
		mov	[edx],eax	; save results in *p2
		mov	eax,[ebx]	; read *p1
		imul	eax,17		; multiply
		ret

Not exactly optimum, but the C compiler is constrained because of the semantics of pointers in C. Change the C code a bit:

int foo(int *restrict p1,int *restrict p2)
{
  *p2 =  *p1 * 17;
  return *p1 * 17;
}

And the compiler can now produce:

foo:		mov	ebx,[esp + 4]	; get p1
		mov	eax,[ebx]	; read *p1
		imul	eax,17		; multiply
		mov	edx,[esp + 8]	; get p2
		mov	[edx],eax	; save results in *p2
		ret

Okay, it's only a savings of two instructions (plus an additional read) but when you're trying to multiply huge matrices, it can add up.
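Here's a sketch of the sort of loop where it starts to add up. The function scale() is something I made up for illustration, not code from any real matrix library.

```c
#include <assert.h>
#include <stddef.h>

/* multiply every element of src by *factor, storing into dst.    */
/* The restrict qualifiers promise that dst, src and factor don't */
/* overlap, so the compiler is free to read *factor once and keep */
/* it in a register for the whole loop. Without them, it has to   */
/* assume a store to dst[i] might have changed *factor or a later */
/* src[i], and reload each time around.                           */

void scale(
	double       *restrict dst,
	const double *restrict src,
	const double *restrict factor,
	size_t                 n
)
{
  assert(dst    != NULL);
  assert(src    != NULL);
  assert(factor != NULL);

  for (size_t i = 0 ; i < n ; i++)
    dst[i] = src[i] * *factor;
}
```

The catch is that restrict is a promise made by the programmer, not something the compiler checks. Call this with overlapping arrays and the behavior is undefined.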


Notes on a conversation while driving to dinner

“It's been a long time since I've seen Richard Chamberlain on television,” she said. “I didn't recognize him.”

“You didn't recognize a seven foot black man?” he asked.

“Darling,” she said, “you're thinking of Wilt Chamberlain. I'm talking about the actor, Richard Chamberlain.”

“Oh,” he said. “So you didn't recognize a five-five white guy then? Ouch!”

Tuesday, January 17, 2012

99 ways to program a hex, Part 9: C89, const correctness, assertive

This is a minor variation on part 7—the use of assert():

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89, const correctness, assertive */

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>

#define LINESIZE	  16

static void 	do_dump		(FILE *const,FILE *const);

/****************************************************************/

int main(const int argc,char *const argv[])
{
  assert(argc    >= 1);
  assert(argv    != NULL);
  assert(argv[0] != NULL);
  
  if (argc == 1)
    do_dump(stdin,stdout);
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      FILE *fp;
      
      fp = fopen(argv[i],"rb");
      if (fp == NULL)
      {
        perror(argv[i]);
        continue;
      }

      printf("-----%s-----\n",argv[i]);
      do_dump(fp,stdout);
      fclose(fp);
    }
  }

  return EXIT_SUCCESS;
}

/******************************************************************/

static void do_dump(FILE *const fpin,FILE *const fpout)
{
  unsigned char  buffer[BUFSIZ];
  unsigned char *pbyte;
  size_t         offset;
  size_t         bread;
  size_t         j;
  char           ascii[LINESIZE + 1];
  
  assert(fpin  != NULL);
  assert(fpout != NULL);
  
  offset = 0;

  while((bread = fread(buffer,1,BUFSIZ,fpin)) > 0)
  {
    pbyte = buffer;
    while (bread > 0)
    {
      fprintf(fpout,"%08lX: ",(unsigned long)offset);
      j = 0;
      do
      {
        fprintf(fpout,"%02X ",*pbyte);
        if (isprint(*pbyte))
          ascii [j] = *pbyte;
        else
          ascii [j] = '.';
        pbyte  ++;
        offset ++;
        j      ++;
        bread  --;
      } while ((j < LINESIZE) && (bread > 0));
      ascii [j] = '\0';
      if (j < LINESIZE)
      {
	size_t i;

	for (i = j ; i < LINESIZE ; i++) fprintf(fpout,"   ");
      }
      fprintf(fpout,"%s\n",ascii);      
    }
    
    if (fflush(fpout) == EOF)
    {
      perror("output");
      exit(EXIT_FAILURE);
    }
  }
}

/***************************************************************/

Writing Solid Code is one of only two programming books that really changed how I write code (the other being Thinking Forth, but that's for another post), beginning with the liberal use of assert() to, well, not validate input parameters, but to enforce that they're valid.

Prior to reading this book, I wrote defensive code, so I would have coded do_dump() as:

static void do_dump(FILE *const fpin,FILE *const fpout)
{
  /* vars vars vars */

  if ((fpin == NULL) || (fpout == NULL))
    return;

  /* rest of code */
}

Not very much code (and in this code, useless as well), but in a larger codebase, it does add up. And it hides problems with the code. In the first project where I used assert() liberally, I really went crazy with it. The codebase implemented “window regions” on a text screen, and every routine used assert() to not only check that I didn't slip in a NULL pointer, but that every field of all the structures I defined had reasonable values.

And doing so saved me a lot of debugging time in the corner cases, like, what exactly does it mean to have a “window” that's only one character wide? Or even a window that's one character wide by one line high? The assert()s would trip up on all sorts of corner cases like this, and given that I was programming the code under MS-DOS, an errant pointer could not only crash the program, but the entire machine (at best—at worst, it could corrupt memory that wouldn't be detected until some other program ran).

I still use assert()s to this day.
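I don't have that old code handy, so the following is only a sketch of the idea, not the original. The structure, the field names, the 80×25 screen size and window_area() are all assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define SCREEN_WIDTH	80	/* assumed text screen size */
#define SCREEN_HEIGHT	25

struct window			/* hypothetical window region */
{
  int x;
  int y;
  int width;
  int height;
};

int window_area(const struct window *const win)
{
  /* one assert() per condition, so the failure message */
  /* names the exact field that went bad                */

  assert(win         != NULL);
  assert(win->x      >= 0);
  assert(win->y      >= 0);
  assert(win->width  >= 1);
  assert(win->height >= 1);
  assert(win->x + win->width  <= SCREEN_WIDTH);
  assert(win->y + win->height <= SCREEN_HEIGHT);

  return win->width * win->height;
}
```

A one-by-one window passes these checks, which is exactly the kind of decision the assert()s force you to make up front. Compile with NDEBUG defined and all of the checks vanish from the release build.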

Now, I'll grant you the following bit of code:

int main(const int argc,char *const argv[])
{
  assert(argc    >= 1);
  assert(argv    != NULL);
  assert(argv[0] != NULL);

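is going a bit too far, only because this is all but guaranteed to be true in practice (strictly speaking, the C standard only promises that argc is nonnegative and that argv[argc] is NULL), and if it's not, I have more pressing issues to worry about.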


The network? A firewall? A bug? A misconfiguration? Gremlins? Who knows?

“In a system of a million parts, if each part malfunctions only one time out of a million, a breakdown is certain.”

—Stanislaw Lem

So it's Regression Test Time™ again (for “Project: Wolowizard”) at The Ft. Lauderdale Office of the Corporation, only this time, with new, additional regression tests!

Joy.

Okay, it's not too bad. It's a rather simple matter to add the cases to a master list of test cases and expand the program that uses this list to generate the data used for the regression test. That was probably about an hour or so of work. Then a minor change to the actual test program to make sure it fires off the messages under the right conditions (two different messages, ten cases, a 100×100 matrix, but easy enough to code).

Then, generate all the data, copy it all out to the four servers required to run the test, get the latest build of all the programs, move them out to the test servers, make sure the configuration files are up to date on all the servers, make sure The Protocol Stack From Hell™ won't puke, and fire up the regression test.

Only to have one component fail each test because it can't communicate with another component.

Aaaaaarg!

SM and I spent the next few hours troubleshooting the issue. The two components are on different servers, but they can see each other. Doing a manual query at the command line shows the query going through. But something deep within the bowels (maybe below the cockles, maybe in the sub-cockle area, maybe in the liver, maybe in the kidneys, maybe even in the colon. We don't know …) of “Project: Wolowizard” is munged.

Sigh.

Wednesday, January 18, 2012

99 ways to program a hex, Part 10: C99, const and restrict correctness, assertive

It's pretty much the same as yesterday's version, only in C99:

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C99, const and restrict correctness, assertive */

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>

#define LINESIZE	  16

static void 	do_dump		(FILE *const restrict,FILE *const restrict);

/****************************************************************/

int main(const int argc,char *const restrict argv[])
{
  assert(argc    >= 1);
  assert(argv    != NULL);
  assert(argv[0] != NULL);
  
  if (argc == 1)
    do_dump(stdin,stdout);
  else
  {
    for (int i = 1 ; i < argc ; i++)
    {
      FILE *fp = fopen(argv[i],"rb");
      if (fp == NULL)
      {
        perror(argv[i]);
        continue;
      }

      printf("-----%s-----\n",argv[i]);
      do_dump(fp,stdout);
      fclose(fp);
    }
  }

  return EXIT_SUCCESS;
}

/******************************************************************/

static void do_dump(FILE *const restrict fpin,FILE *const restrict fpout)
{
  unsigned char  buffer[BUFSIZ];
  size_t         offset;
  size_t         bread;
  
  assert(fpin  != NULL);
  assert(fpout != NULL);
  
  offset = 0;

  while((bread = fread(buffer,1,BUFSIZ,fpin)) > 0)
  {
    unsigned char *pbyte = buffer;

    while (bread > 0)
    {
      char ascii[LINESIZE + 1];
      
      fprintf(fpout,"%08lX: ",(unsigned long)offset);
      size_t j = 0;

      do
      {
        fprintf(fpout,"%02X ",*pbyte);
        if (isprint(*pbyte))
          ascii [j] = *pbyte;
        else
          ascii [j] = '.';
        pbyte  ++;
        offset ++;
        j      ++;
        bread  --;
      } while ((j < LINESIZE) && (bread > 0));

      ascii [j] = '\0';

      if (j < LINESIZE)
	for (size_t i = j ; i < LINESIZE ; i++) fprintf(fpout,"   ");

      fprintf(fpout,"%s\n",ascii);      
    }
    
    if (fflush(fpout) == EOF)
    {
      perror("output");
      exit(EXIT_FAILURE);
    }
  }
}

/***************************************************************/

Not much to say about today's version.


It was a misconfiguration

Yesterday's problem? It turned out to be a misconfiguration. Or rather, the configuration file format changed enough to break the configuration files checked in for regression testing.

Sometime since the last regression test, parameters that deal with time can now take a suffix to denote the time unit being used (for example, “9s” for 9 seconds, or “3d” for 3 days) and the base unit for non-suffixed values changed (from “seconds” to “milliseconds” I'm guessing) so what was once configured to time out in 15 seconds would now time out in 15 milliseconds, and thus, the one component would think the other side timed out.
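I haven't looked at the actual parsing code, so what follows is only a sketch of how such a format might be read, assuming bare numbers really are milliseconds. The name parse_duration_ms() is made up, and the “m” and “h” suffixes are my guess; only “s” and “d” are ones I've actually seen.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* hypothetical: convert "15", "9s" or "3d" to milliseconds; */
/* returns -1 if the text is malformed or the result is too  */
/* large for a long                                          */

long parse_duration_ms(const char *const text)
{
  char *end;
  long  value;
  long  scale;

  assert(text != NULL);

  errno = 0;
  value = strtol(text,&end,10);
  if ((end == text) || (errno == ERANGE) || (value < 0))
    return -1;

  switch(*end)
  {
    case '\0': return value;		/* no suffix: milliseconds (assumed) */
    case 's':  scale =     1000L; break;
    case 'm':  scale =    60000L; break;	/* assumed suffix */
    case 'h':  scale =  3600000L; break;	/* assumed suffix */
    case 'd':  scale = 86400000L; break;
    default:   return -1;
  }

  if ((end[1] != '\0') || (value > LONG_MAX / scale))
    return -1;

  return value * scale;
}
```

With this, “15” comes back as 15 and “15s” as 15000. An old configuration file with a bare 15 still parses without complaint, which is why the failure looked like a network problem instead of a configuration one.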

I saw the initial changes, but I neglected to update a few key parameters properly. It's an easy thing to miss (as it took me two tries to change all the affected parameters).
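
To make the failure mode concrete, here is a minimal sketch in C of how such a time parameter might be parsed. It is not the actual code from the project: the name parse_duration_ms() is made up for illustration, only the “s” and “d” suffixes mentioned above are handled, and the base unit is assumed to be milliseconds (which, as noted, is a guess).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert a setting such as "15", "9s" or "3d" into
 * milliseconds.  A value with no suffix is assumed to already be in
 * milliseconds.  Returns -1 if the text is malformed or out of range.   */
long long parse_duration_ms(const char *text)
{
  char      *end;
  long long  value;
  long long  scale;

  assert(text != NULL);

  errno = 0;
  value = strtoll(text,&end,10);
  if ((errno != 0) || (end == text) || (value < 0))
    return -1;

  switch(*end)
  {
    case '\0': return value;              /* no suffix: milliseconds */
    case 's':  scale = 1000LL;     break; /* seconds                 */
    case 'd':  scale = 86400000LL; break; /* days                    */
    default:   return -1;                 /* unknown suffix          */
  }

  if (end[1] != '\0')                     /* junk after the suffix   */
    return -1;
  if (value > LLONG_MAX / scale)          /* would overflow          */
    return -1;

  return value * scale;
}
```

With a parser along these lines, an old setting of 15 comes back as 15 milliseconds, while 15s comes back as 15,000, which is the kind of silent change that broke the checked-in configuration files.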

Sigh.

But that aside, the regression test finally ran (well, it's still running—it takes hours for the thing to run).


A bit about buggy whips

I haven't talked much about SOPA and PIPA, but I've been aware of them for some time. In fact, on the sites I normally travel, it was hard not to come across them. And then today … everybody by now has heard of SOPA.

And yes, it's bad. But why? It could be that there are a lot of people with money that want bits to have color, when bits have no color:

Bits do not naturally have Colour. Colour, in this sense, is not part of the natural universe. Most importantly, you cannot look at bits and observe what Colour they are. I encountered an amusing example of bit Colour recently: one of my friends was talking about how he'd performed John Cage's famous silent musical composition 4′33″ for MP3. Okay, we said, (paraphrasing the conversation here) so you took an appropriate-sized file of zeroes out of /dev/zero and compressed that with an MP3 compressor? No, no, he said. If I did that, it wouldn't really be 4′33″ because to perform the composition, you have to make the silence in a certain way, according to the rules laid down by the composer. It's not just four minutes and thirty-three seconds of any old silence.

My friend had gone through an elaborate process that basically amounted to performing some other piece of music four minutes and thirty-three seconds long, with a software synthesizer and the volume set to zero. The result was an appropriate-sized file of zeroes—which he compressed with an MP3 compressor. The MP3 file was bit-for-bit identical to one that would have been produced by compressing /dev/zero … but this file was (he claimed) legitimately a recording of 4′33″ and the other one wouldn't have been. The difference was the Colour of the bits. He was asserting that the bits in his copy of 433.mp3 had a different Colour from those in a copy of 433.mp3 I might make by means of the /dev/zero procedure, even though the two files would contain exactly the same bits.

Now, the preceding paragraph is basically nonsense to computer scientists or anyone with a mathematical background. (My friend is one; he'd done this as a sort of elaborate joke.) Numbers are numbers, right? If I add 39 plus 3 and get 42, and you do the same thing, there is no way that “my” 42 can be said to be different from “your” 42. Given two bit-for-bit identical MP3 files, there is no meaningful (to a computer scientist) way to say that one is a recording of the Cage composition and the other one isn't. There would be no way to test one of the files and see which one it was, because they are actually the same file. Having identical bits means by definition that there can be no difference. Bits don't have Colour; computer scientists, like computers, are Colour-blind. That is not a mistake or deficiency on our part: rather, we have worked hard to become so. Colour-blindness on the part of computer scientists helps us understand the fact that computers are also Colour-blind, and we need to be intimately familiar with that fact in order to do our jobs.

The trouble is, human beings are not in general Colour-blind. The law is not Colour-blind. It makes a difference not only what bits you have, but where they came from. There's a very interesting Web page illustrating the Coloured nature of bits in law on the US Naval Observatory Web site. They provide information on that site about when the Sun rises and sets and so on … but they also provide it under a disclaimer saying that this information is not suitable for use in court. If you need to know when the Sun rose or set for use in a court case, then you need an expert witness—because you don't actually just need the bits that say when the Sun rose. You need those bits to be Coloured with the Colour that allows them to be admissible in court, and the USNO doesn't provide that. It's not just a question of accuracy - we all know perfectly well that the USNO's numbers are good. It's a question of where the numbers came from. It makes perfect sense to a lawyer that where the information came from is important, in fact maybe more important than the information itself. The law sees Colour.

What Colour are your bits? - Ansuz - mskala's home page

Or maybe it goes deeper than that: the Internet is such a disruptive technology that it threatens all sorts of industries, not only because bits have no color, but because it democratizes the means of global mass production, and that may scare some people more than the colorless bits:

What's different now is that distribution costs have disappeared. Suddenly, hobbyists have the same reach as businesses and are seen as real competition. Unfortunately, hobbyists don't distribute for the same reasons and don't play by the same rules. That's a fundamental problem.

A business is run for money, even if it does creative things. It has expenses and investments. It has a physical location and distribution channels. A business has to play by the rules in order to keep earning money, and because they are vulnerable—to lawsuits, regulations, taxes and police.

A hobbyist is doing it for love, not money. He has almost no expenses— just put your music up on YouTube and promote it online, all for free. Since there is no monetary investment, no payroll, no building, no sales channel, the hobbyist does not have a lot to lose.

If a business breaks the law, it can be sued or a government can close it down. There aren't that many businesses in a given field, so it's relatively easy to police them. There are millions of hobbyists and they require no money to do their thing. Even if you sue them, you can't recover your costs because they have no money. And there are too many to shut them down individually.

On top of that, the internet is global, so many of the people a business wants to sue or arrest aren't even within its jurisdiction. The internet didn't just drop distribution costs, it made it possible to evade restrictive laws passed to protect publishers.

Viewing this as hobbyists vs. businesses makes a difference. The current story from publishers is that everything was fine until the internet came along and pirates started to steal all their products. The reality is that it's not just about piracy.

Hobbyists have always been there, creating art, music, books, comics, open source software, etc. The internet has just forced these two worlds into collision. Even if all the piracy disappeared, publishers would still be in trouble.

Part 47: Intellectual Property

Whatever the case, SOPA/PIPA is bad, and should be rejected by the United States government. It's bad enough that I have to keep buying all these damned buggy whips when it's clear that the future is going to be these horseless carriages I keep hearing about.

Thursday, January 19, 2012

They're hard problems for a reason …

I quoted this once before, but it bears repeating:

There are only two hard problems in Computer Science: cache invalidation, naming things, and off-by-one errors.

I got hit with one of those issues today, and it wasn't naming a thing, or an off-by-one error. It's amazing how much time can be wasted by omitting a single command …


99 ways to program a hex, Part 11: C89, const correctness, assertive, GCC extensions

Today's code is identical to part 9 save for one line: it uses a GCC extension to notify the compiler that the do_dump() function is never passed NULL pointers, and that it won't throw any exceptions.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89, const correctness, assertive, GCC extensions */

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>

#define LINESIZE	  16

static void 	do_dump		(FILE *const,FILE *const) __attribute__((nonnull,nothrow));

/****************************************************************/

int main(const int argc,char *const argv[])
{
  assert(argc    >= 1);
  assert(argv    != NULL);
  assert(argv[0] != NULL);
  
  if (argc == 1)
    do_dump(stdin,stdout);
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      FILE *fp;
      
      fp = fopen(argv[i],"rb");
      if (fp == NULL)
      {
        perror(argv[i]);
        continue;
      }

      printf("-----%s-----\n",argv[i]);
      do_dump(fp,stdout);
      fclose(fp);
    }
  }

  return EXIT_SUCCESS;
}

/******************************************************************/

static void do_dump(FILE *const fpin,FILE *const fpout)
{
  unsigned char  buffer[BUFSIZ];
  unsigned char *pbyte;
  size_t         offset;
  size_t         bread;
  size_t         j;
  char           ascii[LINESIZE + 1];
  
  assert(fpin  != NULL);
  assert(fpout != NULL);
  
  offset = 0;

  while((bread = fread(buffer,1,BUFSIZ,fpin)) > 0)
  {
    pbyte = buffer;
    while (bread > 0)
    {
      fprintf(fpout,"%08lX: ",(unsigned long)offset);
      j = 0;
      do
      {
        fprintf(fpout,"%02X ",*pbyte);
        if (isprint(*pbyte))
          ascii [j] = *pbyte;
        else
          ascii [j] = '.';
        pbyte  ++;
        offset ++;
        j      ++;
        bread  --;
      } while ((j < LINESIZE) && (bread > 0));
      ascii [j] = '\0';
      if (j < LINESIZE)
      {
	size_t i;

	for (i = j ; i < LINESIZE ; i++) fprintf(fpout,"   ");
      }
      fprintf(fpout,"%s\n",ascii);      
    }
    
    if (fflush(fpout) == EOF)
    {
      perror("output");
      exit(EXIT_FAILURE);
    }
  }
}

/***************************************************************/
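
For anyone who wants to try the attribute outside of this program, here is a small sketch of the same idea. The macro ATTR_NONNULL_NOTHROW and the function count_printable() are made up for illustration; the only thing taken from the listing above is the __attribute__((nonnull,nothrow)) spelling. The macro expands to nothing on compilers that don't define __GNUC__, so the code should still build elsewhere, just without the extra checking.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical portability macro: only GCC-compatible compilers see the
 * attribute; everyone else sees an empty macro.                          */
#if defined(__GNUC__)
#  define ATTR_NONNULL_NOTHROW __attribute__((nonnull,nothrow))
#else
#  define ATTR_NONNULL_NOTHROW
#endif

/* The attribute goes on the prototype, as in the listing above. */
size_t count_printable(const unsigned char *,size_t) ATTR_NONNULL_NOTHROW;

/* Count the bytes in buf[0..len-1] that isprint() considers printable. */
size_t count_printable(const unsigned char *buf,size_t len)
{
  size_t count = 0;
  size_t i;

  for (i = 0 ; i < len ; i++)
    if (isprint(buf[i]))
      count++;

  assert(count <= len);
  return count;
}
```

One design note, offered with some caution: because the attribute promises the compiler that the pointer is never NULL, GCC is allowed to warn about, or optimize away, a check such as assert(buf != NULL) inside the function. That is why this sketch asserts on the result instead of on the pointer.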

Okay, what failed this time?

I'm running the regression tests for “Project: Wolowizard” and about half way through (around the two hour mark or so) the tests start failing. Sometimes expected results just aren't showing up. I'm freaking out a bit because of all the issues we've had in running these tests, only for it to start failing in yet a different way.

Now, a bit about how this all works—there are four computers involved; one runs the tests, injecting messages towards a mini-cluster of two machines, either of which (depending on which one gets the message) sends a message to the fourth machine, which does a bunch of processing (which may involve interaction with a simulated cell phone on the testing machine), then responds back to the mini-cluster, which then responds back to the testing machine.

Now, I can check the immediate results from the mini-cluster, but the actual data I'm interested in is logged via syslog, so I have that data forwarded to the testing machine and my code grovels through a log file for the actual data I want. And it's that data (or part thereof) that apparently isn't being logged, and thus, the tests are failing.

Now, it just so happens that the part of the test that's failing is the part dealing with the mini-cluster, and it looks like about half the tests are failing (hmm …. ).

I log into each of the two computers comprising the mini-cluster, and check /etc/syslog.conf, on the off chance that changed. Nope. I then explain the problem to Bunny, standing (or rather, sitting) in as my cardboard programmer when it hits me: I should check to see if the program is running.

Rats. It is.

The tests are still failing, and my shoes began to squeak.

Okay, just because syslogd is running doesn't necessarily mean it's running correctly. So I run logger -p local1.info FOO on each machine and yes, one of the machines is failing to forward the logs to the testing machine.

Ahah!

I restart syslogd on that system, and lo! The log entries are getting through now.

You know, I expect there to be issues with the stuff I'm testing; what I don't expect is for the stuff that we didn't write to have issues (the Protocol Stack From Hell™ notwithstanding).

Okay, reset everything and start the regression test over again …

Update in the wee-hours of the morning, Friday, January 20th, 2012

A bit over half-way through the regression tests, and the log files rotate. Aaaaaaaaaah! Okay, reset all the data, and start from the last failed test. That's easy, since I can specify which cases to run. That's hard, because I have to specify nearly 100 cases. That's easy, since I can use the Unix command seq to list them. That's hard, because the test cases aren't just numbers, but things like “1.b.77” and “1.c.18”, and while the shell supports command line expansion from a running program via the backtick (ala for i in `seq 34 77`; do echo 1.b.$i; done) I need to nest two such operations (echo `for i in `seq 34 77`;do echo 1.b.$i; done`) to specify the test cases from the command line, and backticks don't nest like that. Okay, I can create a temporary file that lists the test cases …

Friday, January 20, 2012

Notes about an overheard conversation between a blogger and his lovely and talented copy editor

“I think you miscounted.”

“Here?”

“Yes. There are only two problems … cache invalidation … ”

“Uh huh.”

“Naming things … ”

“Yeah … ”

“and off-by-one errors … oh. Oh! D'oh!”


99 ways to program a hex, Part 12: C99, const and restrict correctness, assertive, GCC extensions

And today's code is identical to part 10 save again for one line—it too uses a GCC extension per yesterday's code.

This also marks the end of the C versions for a couple of days.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C99, const and restrict correctness, assertive, GCC extensions */

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>

#define LINESIZE	  16

static void 	do_dump		(FILE *const restrict,FILE *const restrict) __attribute__((nonnull,nothrow));

/****************************************************************/

int main(const int argc,char *const restrict argv[])
{
  assert(argc    >= 1);
  assert(argv    != NULL);
  assert(argv[0] != NULL);
  
  if (argc == 1)
    do_dump(stdin,stdout);
  else
  {
    for (int i = 1 ; i < argc ; i++)
    {
      FILE *fp = fopen(argv[i],"rb");
      if (fp == NULL)
      {
        perror(argv[i]);
        continue;
      }

      printf("-----%s-----\n",argv[i]);
      do_dump(fp,stdout);
      fclose(fp);
    }
  }

  return EXIT_SUCCESS;
}

/******************************************************************/

static void do_dump(FILE *const restrict fpin,FILE *const restrict fpout)
{
  unsigned char  buffer[BUFSIZ];
  size_t         offset;
  size_t         bread;
  
  assert(fpin  != NULL);
  assert(fpout != NULL);
  
  offset = 0;

  while((bread = fread(buffer,1,BUFSIZ,fpin)) > 0)
  {
    unsigned char *pbyte = buffer;

    while (bread > 0)
    {
      char ascii[LINESIZE + 1];
      
      fprintf(fpout,"%08lX: ",(unsigned long)offset);
      size_t j = 0;

      do
      {
        fprintf(fpout,"%02X ",*pbyte);
        if (isprint(*pbyte))
          ascii [j] = *pbyte;
        else
          ascii [j] = '.';
        pbyte  ++;
        offset ++;
        j      ++;
        bread  --;
      } while ((j < LINESIZE) && (bread > 0));

      ascii [j] = '\0';

      if (j < LINESIZE)
	for (size_t i = j ; i < LINESIZE ; i++) fprintf(fpout,"   ");

      fprintf(fpout,"%s\n",ascii);      
    }
    
    if (fflush(fpout) == EOF)
    {
      perror("output");
      exit(EXIT_FAILURE);
    }
  }
}

/***************************************************************/

Saturday, January 21, 2012

99 ways to program a hex, Part 13: COLOR COMPUTER BASIC, EASY

I decided to take a break with the C versions for a few days, given that a) I'm hopelessly behind on posting, and b) the next few versions are interesting and I want to make sure I have plenty of time for the write-ups. So I went back to my programming roots, to the first computer I ever owned, a Tandy Color Computer 2, and figured, why not do a hex dump program for it? In its version of BASIC.

One of the rules I set for myself is that the output of each version should match if at all possible, and I'm afraid that this version (and the next one) fall under those weasel words: the output won't be exactly the same. It's not hard to understand why though, when you realize that the text screen of the Color Computer is 32×16. Yes, it's only sixteen lines of 32 characters each. Yeah, this:

00000170: 44 4F 53 20 42 41 53 49 43 20 54 4F 20 52 55 4E DOS BASIC TO RUN
00000180: 2E 0A 31 36 27 0A 32 30 20 50 52 49 4E 54 20 22 ..16'.20 PRINT "
00000190: 49 4E 50 55 54 20 46 49 4C 45 20 4E 41 4D 45 3A INPUT FILE NAME:
000001A0: 22 3B 0A 32 31 20 49 4E 50 55 54 20 46 4E 24 0A ";.21 INPUT FN$.
000001B0: 32 32 20 4F 50 45 4E 22 49 22 2C 23 31 2C 46 4E 22 OPEN"I",#1,FN

ain't gonna fit!

This, however:

0170 444F532042415349 DOS BASI
0178 4320544F2052554E C TO RUN
0180 2E0A3136270A3230 ..16'.20
0188 205052494E542022  PRINT "
0190 494E505554204649 INPUT FI
0198 4C45204E414D453A LE NAME:
01A0 223B0A323120494E ";.21 IN
01A8 50555420464E240A PUT FN$.
01B0 3232204F50454E22 22 OPEN"
01B8 49222C23312C464E I",#1,FN

would. And we'll be going with that.
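
As a cross-check on the narrow layout, here is a rough C sketch that builds one such line: a four-digit offset, a space, up to eight bytes as packed hex, a space, and the same bytes as text, for 30 columns in all. The function name format_narrow_line() and the NARROW_ constants are invented for this example and are not part of any of the versions in this series.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define NARROW_BYTES   8   /* bytes shown per line                       */
#define NARROW_WIDTH  31   /* 4 + 1 + 16 + 1 + 8 characters, plus the NUL */

/* Hypothetical helper: format 1 to 8 bytes into out[], which must have
 * room for NARROW_WIDTH characters.  Only the low 16 bits of the offset
 * are shown.                                                            */
void format_narrow_line(
        char                *out,
        unsigned int         offset,
        const unsigned char *data,
        size_t               len
)
{
  size_t pos;
  size_t i;

  assert(out  != NULL);
  assert(data != NULL);
  assert((len >= 1) && (len <= NARROW_BYTES));

  pos = (size_t)sprintf(out,"%04X ",offset & 0xFFFFu);

  for (i = 0 ; i < len ; i++)
    pos += (size_t)sprintf(out + pos,"%02X",(unsigned int)data[i]);

  /* pad a short final line so the text column stays aligned */
  memset(out + pos,' ',2 * (NARROW_BYTES - len));
  pos += 2 * (NARROW_BYTES - len);

  out[pos++] = ' ';

  for (i = 0 ; i < len ; i++)
    out[pos++] = isprint(data[i]) ? (char)data[i] : '.';

  out[pos] = '\0';
}
```

Feeding it offset 0x170 and the eight bytes of “DOS BASI” should reproduce the first line of the narrow dump shown above.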

And I'm presenting the code as it would (more or less) appear on the Color Computer, wrapped at 32 columns (more accurate would be a mid-range green background with a really crappy 8×12 pixel font).

1 '*****************************
2 '* COPYRIGHT 2012 SEAN CONNER
3 '* 
4 '* THIS PROGRAM RELEASED UNDER
5 '* THE GNU LICENSE, VERSION 2
6 '* OF THE LICENSE, OR (AT YOUR
7 '* OPTION) ANY LATER VERSION.
8 '*
9 '* SEE THE GNU GENERAL PUBLIC
10 '* LICENCE FOR MORE DETAILS.
11 '***************************
12 '
13 'COLOR COMPUTER BASIC, EASY
14 'REQUIRES EXTENDED AND
15 'DOS BASIC TO RUN.
16'
20 PRINT "INPUT FILE NAME:";
21 INPUT FN$
22 OPEN"I",#1,FN$
23 I=0
24 Y=0
30 FOR S=1TOLOC(2)
31 GET#1,S
32 D$=INPUT#1
100 FOR B = 1 TO 256 STEP 8
101 DD$ = MID$(D$,B,16)
102 I$=HEX$(I)
103 IF LEN(I$)<>4 THEN I$ = MID$
("000",1,4-LEN(I$))+I$
104 H$ = ""
105 A$ = ""
106 FOR C = 1 TO 8
107 B$ = MID$(H$,C,1)
108 IF CHR$(B$)<32 OR CHR$(B$)>1
26 THEN B$="."
109 A$=A$+B$
110 T$=HEX$(B$)
111 IF LEN(T$)=1 THEN T$="0"+T$
112 H$=H$+T$
113 NEXT C
114 PRINTI$;": ";H$;" ";A$
115 I=I+8
116 Y=Y+1
117 IF Y=15 THEN INPUT T$:Y=1
118 NEXT B
119 NEXT S
120 CLOSE#1
130 END

Now, I don't know if the code actually works (unlike the previous twelve versions). I still have my first computer, but I haven't turned it on in years, and in fact, I would have to dig it out of storage, find a TV that worked to hook it up to (for a video display), then type in the program, debug it, then … um … type it in again, since I don't have any easy way of transferring data off the Color Computer (that would require more digging through piles of cables to find the right set of cables and adaptors, going from a 4-pin DIN to USB with some form of null-modem cable thrown in). So in theory the code works, but in practice …

The main problem is that the DOS BASIC commands are geared towards record based files (both sequential and random access) and not the more modern “stream of bytes” paradigm in use today. I'm reading what I hope are 256 byte binary records. That's where my biggest concern lies really.

Now, I could read in binary data directly; there is a command to do that, but it reads in a raw sector from the disk; I would have to write code to decode the actual file structure. It's not a complicated structure, but it's a bit more effort than I want to go into right now, and would distract from a “simple” program.

So I can only hope that the program presented above works.

Sunday, January 22, 2012

99 ways to program a hex, Part 14: COLOR COMPUTER BASIC

Yesterday's code was labeled “COLOR COMPUTER BASIC, EASY” not because it was easy to write (it was somewhat easy, seeing how I didn't have to run it, and going off reference material for a language I haven't used in over twenty years) but because it was relatively “easy” to read.

I'm being serious.

I never saw any published BASIC code look that nice. No, it would usually be presented as this (but probably without the blank lines):

1 '*****************************
2 '* COPYRIGHT 2012 SEAN CONNER
3 '* 
4 '* THIS PROGRAM RELEASED UNDER
5 '* THE GNU LICENSE, VERSION 2
6 '* OF THE LICENSE, OR (AT YOUR
7 '* OPTION) ANY LATER VERSION.
8 '*
9 '* SEE THE GNU GENERAL PUBLIC
10 '* LICENCE FOR MORE DETAILS.
11 '***************************
12 '
13 'COLOR COMPUTER BASIC
14 'REQUIRES EXTENDED AND
15 'DOS BASIC TO RUN.
16'
20 PRINT "INPUT FILE NAME:";:INP
UT FN$:OPEN"I",#1,FN$:I=0:Y=0:FO
R S=1TOLOC(2):GET#1,S:D$=INPUT#1
:FOR B=1TO256STEP8:DD$=MID$(D$,B
,16):I$=HEX$(I):IF LEN(I$)<>4 TH
EN I$ = MID$("000",1,4-LEN(I$))+
I$
21 H$="":A$="":FOR C=1TO8:B$=MID
$(H$,C,1):IF CHR$(B$)<32 OR CHR$
(B$)>126THEN B$="."
22 A$=A$+B$:T$=HEX$(B$):IF LEN(T
$)=1THENT$="0"+T$
23 H$=H$+T$:NEXT C:PRINTI$;": ";
H$;" ";A$:I=I+8:Y=Y+1:IF Y=15 TH
EN INPUT T$:Y=1
24 NEXT B:NEXT S:CLOSE#1:END

Line 20 here covers lines 20 through 103 of yesterday's code, and I only broke it there because of the IF statement, which extends to the end of its numbered line. Otherwise, it could have been longer, up to 255 characters in length. All due to memory constraints: 4,096 bytes, 16,384 bytes or 32,768 bytes of RAM to fit both the program and data (and if you want high resolution graphics, you give up 6,144 bytes; 12,288 bytes if you want double-buffered high resolution graphics, and by “high resolution graphics” I mean 256×192 pixels, two colors).

Yes kids, this is how we used to write programs. And part of the reason why BASIC has the terrible reputation that it does.

Monday, January 23, 2012

A disconnect

R stops by my desk and drops off a new toy, er, unit to play with, er, test. It's a network device you can plug a POTS line into and make calls over the Internet. I guess we're testing the toy to play with, er, unit to see if our phone network features work with it.

It's a nice looking device and as R hands it to me, I see that it's still on (yes, it comes with both an internal battery and a wall-wart). The tests aren't complicated, but I do need to read the manual to figure out how to run a few of them (involving conference calling, and forwarding phone calls elsewhere). R also hands me the box the toy to play with, er, unit came in.

As I search through the box for the manual, I come across a USB cable, still in its plastic wrap. Okay, the unit comes with a USB port; what doesn't these days? I then find the manual and start flipping through it. There's the diagram of the toy to play with, er, unit with a description of each port and button on it. I notice the USB port has a note:

NOTE: Never place a USB-based device into the USB port of the XXXX XXXXX XXXXXXX under any circumstances. Doing so may damage the device and negate its warranty. The port was designed for diagnostic purposes only; it is not intended for customer use.

So, not only is there no sticker over the USB port saying “removal of this sticker voids warranty” but they give you a USB cable not to plug into it!

Methinks there is a disconnect between manufacturing and packaging at the factory that makes these toys to play with, er, units.


99 ways to program a hex, Part 15: Lua

I'm still taking a break from C, and today's version is in Lua.

#!/usr/bin/env lua
-- ***************************************************************
--
-- Copyright 2012 by Sean Conner.  All Rights Reserved.
-- 
-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation, either version 3 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program.  If not, see <http://www.gnu.org/licenses/>.
--
-- Comments, questions and criticisms can be sent to: sean@conman.org
--
-- ********************************************************************

-- Style: Lua 5.1

function do_dump(fpin,fpout)
  local offset = 0

  while true do
    local line = fpin:read(16)
    if line == nil then return end
    fpout:write(
      string.format("%08X: ",offset),
      line:gsub(".",function(c) return string.format("%02X ",c:byte()) end),
      string.rep(" ",3 * (16 - line:len())),
      line:gsub("%c","."),
      "\n"
    )
    offset = offset + 16
  end
end

-- **********************************************************************

if #arg == 0 then
  print("-----stdin-----")
  do_dump(io.stdin,io.stdout)
else
  for i = 1 , #arg do
    local f = io.open(arg[i],"r")
    io.stdout:write("-----",arg[i],"-----","\n")
    do_dump(f,io.stdout)
    f:close()
  end
end

os.exit(0)

What I'm noticing (besides my text editor's horrible attempts at syntax highlighting in this entry) is that the non-C versions are quite a bit shorter than the C versions. I'm sure part of that reason is the high level of abstraction obtained by not using C. For instance, in this version, the code to dump the data is easily half the length of the shortest C version, thanks to the clever string.gsub() routine in Lua.

Tuesday, January 24, 2012

99 ways to program a hex, Part 16: Lua, recursion

I'm continuing with Lua, with today's version using recursion.

#!/usr/bin/env lua
-- ***************************************************************
--
-- Copyright 2012 by Sean Conner.  All Rights Reserved.
--
-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation, either version 3 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program.  If not, see <http://www.gnu.org/licenses/>.
--
-- Comments, questions and criticisms can be sent to: sean@conman.org
--
-- ********************************************************************

-- Style: Lua 5.1, recursion

function do_dump(fpin,fpout,offset)
  local line = fpin:read(16)
  if line == nil then return end
  fpout:write(
  	string.format("%08X: ",offset),
  	line:gsub(".",function(c) return string.format("%02X ",c:byte()) end),
  	string.rep(" ",3 * (16 - line:len())),
  	line:gsub("%c","."),
  	"\n"
  )
  return do_dump(fpin,fpout,offset + 16)
end

-- **********************************************************************

if #arg == 0 then
  print("-----stdin-----")
  do_dump(io.stdin,io.stdout,0)
else
  for i = 1 , #arg do
    local f = io.open(arg[i],"r")
    io.stdout:write("-----",arg[i],"-----","\n")
    do_dump(f,io.stdout,0)
    f:close()
  end
end

os.exit(0)

Here, we have the do_dump() function calling itself for each line's worth of data. If you don't have experience with recursion, this is a common technique of solving certain programming problems by having a function call itself with either a simpler case to solve, or, like in this example, by calling itself with more data. And it just works.

If you are familiar with recursion, you might be horrified at such a solution: with recursion, the program (behind the scenes) keeps track of everything it's already done, so a very large file could cause it to run out of memory and crash.

But in this case, we don't have to worry. Lua takes advantage of what's called “tail call optimization.” In this case, you can think of the tail call as a form of goto, but this type of goto can also goto other functions, which is useful in implementing state machines. For example, a pseudocode version of the TFTP protocol, in Lua:

function server(conn)
  remote,request = conn:read()

  if request.opcode == 'read' then
    info = open_read(request.file)
    READ_DATA(remote,info)
  elseif request.opcode == 'write' then
    info = open_write(request.file)
    SEND_ACK(remote,info)
  end

  return server(conn)
end

-- *******************************************************

function READ_DATA(remote,info)
  remote:send(DATA,readblock(info.file,info.blocknum))
  return RECEIVE_ACK(remote,info)
end

-- *******************************************************

function RECEIVE_ACK(remote,info)
  ack = remote:read_ack()

  if info.blocknum > 0 and ack.blocknum < info.blocknum then
    return RECEIVE_ACK(remote,info)
  end

  if #info.data < 512 then
    return --we're done
  end

  info.blocknum = info.blocknum + 1
  return READ_DATA(remote,info)
end

-- *******************************************************

function SEND_ACK(remote,info)
  remote:send(ACK,info.blocknum)
  if info.blocknum > 0 and #info.data < 512 then
    return  -- we're done
  else
    return RECEIVE_DATA(remote,info)
  end
end

-- *******************************************************

function RECEIVE_DATA(remote,info)
  data = remote:read_data()

  if data.blocknum < info.blocknum then
    return SEND_ACK(remote,info)
  end

  info.blocknum = data.blocknum
  writeblock(info.file,info.blocknum,data.data)
  return SEND_ACK(remote,info)
end
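
For comparison, here is a rough sketch of the same shape in C, which makes no promise about tail calls. Instead of each state calling the next one, each state function returns the next state and a small loop does the jumping. Everything here is hypothetical and simulated: there is no network, struct transfer just counts blocks, and the names send_data(), receive_ack() and run_transfer() are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated transfer: how many blocks have gone out, how many remain. */
struct transfer
{
  int blocknum;
  int blocks_left;
};

/* A state is a function that does some work and returns the next state.
 * The struct wrapper is needed because a C function type cannot directly
 * refer to itself.                                                      */
struct state
{
  struct state (*next)(struct transfer *);
};

static struct state send_data  (struct transfer *);
static struct state receive_ack(struct transfer *);

static struct state send_data(struct transfer *xfer)
{
  assert(xfer != NULL);
  xfer->blocknum++;                     /* pretend a block was sent     */
  xfer->blocks_left--;
  return (struct state){ receive_ack };
}

static struct state receive_ack(struct transfer *xfer)
{
  assert(xfer != NULL);
  if (xfer->blocks_left <= 0)
    return (struct state){ NULL };      /* we're done                   */
  return (struct state){ send_data };
}

/* Drive the machine; returns the number of blocks "sent". */
int run_transfer(int blocks)
{
  struct transfer xfer = { 0 , blocks };
  struct state    st   = { send_data };

  while (st.next != NULL)
    st = st.next(&xfer);                /* the "goto" between states    */

  return xfer.blocknum;
}
```

The loop in run_transfer() plays the role that the tail call plays in the Lua pseudocode: control moves from state to state without the stack growing.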

I earlier said I wasn't a fan of tail call optimization but then, I didn't see a use for it. Here I do, at least for state machines. But for a hex dump program, not so much, but it doesn't hurt all that much either—it's still a loop in this case.
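
The “it's still a loop” remark can be illustrated with a small C sketch. The two functions below are made up for this purpose (they only count how many dump lines a given number of bytes would need), but they show a tail-recursive function next to the loop it is equivalent to. Note the assumption: unlike Lua, C does not guarantee that the recursive call is turned into a jump, so the first version may really use stack space, depending on the compiler and its settings.

```c
#include <assert.h>
#include <stddef.h>

/* Tail-recursive: the recursive call is the very last thing done, and
 * its result is returned unchanged.  Call with lines set to 0.        */
size_t count_lines_rec(size_t remaining,size_t linesize,size_t lines)
{
  size_t chunk;

  assert(linesize > 0);

  if (remaining == 0)
    return lines;

  chunk = (remaining < linesize) ? remaining : linesize;
  return count_lines_rec(remaining - chunk,linesize,lines + 1);
}

/* The same computation with the tail call rewritten as a loop. */
size_t count_lines_loop(size_t remaining,size_t linesize)
{
  size_t lines = 0;

  assert(linesize > 0);

  while (remaining > 0)
  {
    size_t chunk = (remaining < linesize) ? remaining : linesize;

    remaining -= chunk;
    lines++;
  }

  return lines;
}
```

Each trip around the loop does exactly what one recursive call does: update the arguments and start over from the top.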

Wednesday, January 25, 2012

99 ways to program a hex, Part 17: Lua, recursion, runtime type checking

Since Lua is a dynamically typed language (“values have types, not variables”) we can check the type of a variable at runtime and behave accordingly. Before, we were restricted to just dumping a file, but we could also dump strings (which in Lua can be pure binary data). So today's version checks what type the input is; if it's a file, we read data from there, otherwise if the input is a string, we pull the next blob of data out of it.

Granted, we don't actually use that feature here, but we can more easily reuse do_dump() elsewhere.

#!/usr/bin/env lua
-- ***************************************************************
--
-- Copyright 2010 by Sean Conner.  All Rights Reserved.
--
-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation, either version 3 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program.  If not, see <http://www.gnu.org/licenses/>.
--
-- Comments, questions and criticisms can be sent to: sean@conman.org
--
-- ********************************************************************

-- Style: Lua 5.1, recursion, runtime type checking

function do_dump(fpin,fpout,offset)
  local line
  
  if type(fpin) == 'string' then
    if offset >= string.len(fpin) then return end
    line = fpin:sub(offset + 1,offset + 16)
  else
    line = fpin:read(16)
    if line == nil then return end
  end

  fpout:write(
  	string.format("%08X: ",offset),
  	line:gsub(".",function(c) return string.format("%02X ",c:byte()) end),
  	string.rep(" ",3 * (16 - line:len())),
  	line:gsub("%c","."),
  	"\n"
  )
  return do_dump(fpin,fpout,offset + 16)
end

-- **********************************************************************

if #arg == 0 then
  print("-----stdin-----")
  do_dump(io.stdin,io.stdout,0)
else
  for i = 1 , #arg do
    local f = io.open(arg[i],"r")
    io.stdout:write("-----",arg[i],"-----","\n")
    do_dump(f,io.stdout,0)
    f:close()
  end
end

os.exit(0)

“A big ol' slab of beef!”

Once more into the breach, but I remembered last time this happened, and acted accordingly. But I needn't have worried: while we were swarmed with men armed with huge chunks of roast critter, this time the restaurant was way more crowded and thus, we weren't swarmed quite as heavily.

Also, amusingly, on the far wall from where I was sitting was a large wide screen television showing closeups of grass, of all things. And it wasn't made up like a window either—the grasses would change every so often. And yes, it was video of grass, not static images of grass.

Very odd.

And the food was again, great.

And yes, expense accounts rock!

Thursday, January 26, 2012

99 ways to program a hex, Part 18: Lua, recursion, callback

Yesterday's version checked the input to see if it was a file or a string and acted accordingly. That's fine, but perhaps a better way is to pass in a callback function along with an opaque piece of data for that callback to work on. That way, we can operate on more than just strings or files. It's open ended on what we can support.

#!/usr/bin/env lua
-- ***************************************************************
--
-- Copyright 2010 by Sean Conner.  All Rights Reserved.
-- 
-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation, either version 3 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program.  If not, see <http://www.gnu.org/licenses/>.
--
-- Comments, questions and criticisms can be sent to: sean@conman.org
--
-- ********************************************************************

-- Style: Lua 5.1, recursion, callback

function do_dump(fpout,offset,callback,data)
  local line = callback(data,offset)
  if line == nil then return end
  fpout:write(
  	string.format("%08X: ",offset),
  	line:gsub(".",function(c) return string.format("%02X ",c:byte()) end),
  	string.rep(" ",3 * (16 - line:len())),
  	line:gsub("%c","."),
  	"\n"
  )
  return do_dump(fpout,offset + 16,callback,data)
end

-- **********************************************************************

local function cb(data,offset)
  return data:read(16)
end

if #arg == 0 then
  print("-----stdin-----")
  do_dump(io.stdout,0,cb,io.stdin)
else
  for i = 1 , #arg do
    local f = io.open(arg[i],"r")
    io.stdout:write("-----",arg[i],"-----","\n")
    do_dump(io.stdout,0,cb,f)
    f:close()
  end
end

os.exit(0)

Friday, January 27, 2012

99 ways to program a hex, Part 19: Lua, recursion, closure as callback

Now, instead of passing along data just to be passed to the callback function, we can include such data as part of a closure to the function we pass to the do_dump() function.

#!/usr/bin/env lua
-- ***************************************************************
--
-- Copyright 2010 by Sean Conner.  All Rights Reserved.
--
-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation, either version 3 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program.  If not, see <http://www.gnu.org/licenses/>.
--
-- Comments, questions and criticisms can be sent to: sean@conman.org
--
-- ********************************************************************

-- Style: Lua 5.1, recursion, closure as callback

function do_dump(fpout,offset,callback)
  local line = callback(offset)
  if line == nil then return end
  fpout:write(
  	string.format("%08X: ",offset),
  	line:gsub(".",function(c) return string.format("%02X ",c:byte()) end),
  	string.rep(" ",3 * (16 - line:len())),
  	line:gsub("%c","."),
  	"\n"
  )
  return do_dump(fpout,offset + 16,callback)
end

-- **********************************************************************

if #arg == 0 then
  print("-----stdin-----")
  do_dump(io.stdout,0,function(offset) return io.stdin:read(16) end)
else
  for i = 1 , #arg do
    local f = io.open(arg[i],"r")
    io.stdout:write("-----",arg[i],"-----","\n")
    do_dump(io.stdout,0,function(offset) return f:read(16) end)
    f:close()
  end
end

os.exit(0)

Here, our function (which is not named, as you don't really need to name functions in Lua) references our open file f, but in order to do so, Lua needs to attach a reference to f to the function when said function is passed to do_dump(). It does so by creating what's called a “closure”: think of a closure as both a pointer (or reference) to a function, plus a pointer (or reference) to data that lives outside the function's own body (here, the variable f from the enclosing scope).

And why do I pass in the offset when my unnamed (“anonymous”) function doesn't use it? Because it might be useful in some contexts to know where to pull the data (say from a block of memory).
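For readers who think in C, the “function pointer plus data pointer” description can be spelled out by hand. The sketch below is only an illustration under my own assumptions: struct closure, struct memblock, read_memory() and count_bytes() are hypothetical names, and count_bytes() merely stands in for do_dump(). It also shows a callback that really does need the offset, because it pulls its data from a block of memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINESIZE 16

/* Hypothetical "closure": a function pointer plus the data it closes over. */
struct closure
{
  size_t (*fn)(void *env,unsigned long offset,unsigned char *dest);
  void   *env;
};

/* The environment captured by the memory reader below. */
struct memblock
{
  const unsigned char *base;
  size_t               size;
};

/* Copy up to LINESIZE bytes starting at offset; return 0 at the end. */
static size_t read_memory(void *env,unsigned long offset,unsigned char *dest)
{
  const struct memblock *mem = env;
  size_t                 amount;

  assert(mem  != NULL);
  assert(dest != NULL);

  if (offset >= mem->size)
    return 0;

  amount = mem->size - offset;
  if (amount > LINESIZE)
    amount = LINESIZE;
  memcpy(dest,mem->base + offset,amount);
  return amount;
}

/* Stand-in for do_dump(): counts the bytes the callback hands back. */
static size_t count_bytes(struct closure cb)
{
  unsigned char line[LINESIZE];
  unsigned long offset = 0;
  size_t        total  = 0;
  size_t        amount;

  assert(cb.fn != NULL);

  while((amount = cb.fn(cb.env,offset,line)) > 0)
  {
    total  += amount;
    offset += amount;
  }
  return total;
}
```

A caller fills in a struct memblock, points cb.env at it and cb.fn at read_memory(), and hands cb to the dumping routine. Lua builds the equivalent of that pair for you whenever an anonymous function mentions a variable like f.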

Saturday, January 28, 2012

99 ways to program a hex, Part 20: C89, const correctness, assertive, system calls

When last we left the C versions, we pretty much hit the limit of what we could do using the standard C library to remain portable (well, we did use a GCC extension). Not much else we can do, unless we want to leave the Land of Portability™ and start hitting some system specific calls.

So, that's what this version does: it eschews the use of the standard C library (except for exit(), errno, memset(), strlen() and assert(); while I could replace these with my own versions, C compilers can and will produce better optimized versions than I can write) and goes straight for the system calls.

This means I will have to write my own code to convert binary to hexadecimal, but I've written such code plenty of times before.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89, const correctness, assertive, system calls */

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define LINESIZE	16

/********************************************************************/

extern const char *sys_errlist[];
extern int         sys_nerr;

static void	do_dump		(const int,const int);
static size_t	dump_line	(const int,unsigned char *,size_t,const unsigned long);
static void	hexout		(char *,unsigned long,size_t,const int);
static void	myperror	(const char *const);
static size_t	myread		(const int,char *,size_t);
static void	mywrite		(const int,const char *const,const size_t);

/********************************************************************/

int main(const int argc,const char *const argv[])
{
  if (argc == 1)
    do_dump(STDIN_FILENO,STDOUT_FILENO);
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      int fhin;
      
      fhin = open(argv[i],O_RDONLY);
      if (fhin == -1)
      {
        myperror(argv[i]);
        continue;
      }
      
      mywrite(STDOUT_FILENO,"-----",5);
      mywrite(STDOUT_FILENO,argv[i],strlen(argv[i]));
      mywrite(STDOUT_FILENO,"-----\n",6);
      
      do_dump(fhin,STDOUT_FILENO);
      if (close(fhin) < 0)
        myperror(argv[i]);
    }
  }
  
  return 0;
}
      
/************************************************************************/     

static void do_dump(const int fhin,const int fhout)
{
  unsigned char buffer[4096];
  unsigned long off;
  size_t        bytes;
  
  assert(fhin  >= 0);
  assert(fhout >= 0);

  off = 0;
  
  while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0)
  {
    unsigned char *p = buffer;
    
    for (p = buffer ; bytes > 0 ; )
    {
      size_t amount;
      
      amount = dump_line(fhout,p,bytes,off);
      p     += amount;
      bytes -= amount;
      off   += amount;
    }
  }
}

/********************************************************************/

static size_t dump_line(
	const int            fhout,
	unsigned char       *p,
	size_t               bytes,
	const unsigned long  off
)
{
  size_t count;
  char   addr [9];
  char   hex  [LINESIZE * 3];
  char   ascii[LINESIZE];
  char  *dh;
  char  *da;
  
  assert(fhout >= 0);
  assert(p     != NULL);
  assert(bytes >  0);
  
  memset(hex   ,' ',sizeof(hex));
  memset(ascii,' ',sizeof(ascii));

  hexout(addr,off,8,':');
  if (bytes > LINESIZE)
    bytes = LINESIZE;
    
  p  += bytes;
  dh  = &hex[bytes * 3];
  da  = &ascii[bytes];
  
  assert(addr[8] == ':');
  assert(bytes <= LINESIZE);
  assert(dh == &hex  [bytes * 3]);
  assert(da == &ascii[bytes]);
  
  for (count = 0 ; (count < bytes) && (count < LINESIZE) ; count++)
  {
    p  --;
    da --;
    dh -= 3;
    
    if ((*p >= ' ') && (*p <= '~'))
      *da = *p;
    else
      *da = '.';
    
    hexout(dh,(unsigned long)*p,2,' ');
  }
  
  assert(dh == hex);
  assert(da == ascii);
  
  mywrite(fhout,addr,sizeof(addr));
  mywrite(fhout," ",1);
  mywrite(fhout,hex,sizeof(hex));
  mywrite(fhout,ascii,count);
  mywrite(fhout,"\n",1);

  return count;
}

/**********************************************************************/  

static void hexout(char *dest,unsigned long value,size_t size,const int padding)
{
  assert(dest != NULL);
  assert(size >  0);
  assert((padding >= ' ') && (padding <= '~'));
  
  dest[size] = padding;
  while(size--)
  {
    dest[size] = (char)((value & 0x0F) + '0');
    if (dest[size] > '9') dest[size] += 7;
    value >>= 4;
  }
}

/************************************************************************/

static void myperror(const char *const s)
{
  int err = errno;
  
  assert(s != NULL);
  
  mywrite(STDERR_FILENO,s,strlen(s));
  mywrite(STDERR_FILENO,": ",2);
  
  if ((err < 0) || (err >= sys_nerr))
    mywrite(STDERR_FILENO,"(unknown)",9);
  else
    mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err]));
  mywrite(STDERR_FILENO,"\n",1);
}

/************************************************************************/

static size_t myread(const int fh,char *buf,size_t size)
{
  size_t amount = 0;
  
  assert(fh   >= 0);
  assert(buf  != NULL);
  assert(size >  0);
  
  while(size > 0)
  {
    ssize_t bytes;
    
    bytes = read(fh,buf,size);
    if (bytes < 0)
    {
      myperror("read()");
      exit(EXIT_FAILURE);
    }
    if (bytes == 0)
      break;
    
    amount += bytes;
    size   -= bytes;
    buf    += bytes;
  }
  
  return amount;
}

/*********************************************************************/  
  
static void mywrite(const int fh,const char *const msg,const size_t size)
{
  assert(fh   >= 0);
  assert(msg  != NULL);
  assert(size >  0);
  
  if (write(fh,msg,size) < (ssize_t)size)
  {
    if (fh != STDERR_FILENO)
      myperror("output");
      
    exit(EXIT_FAILURE);
  }
}

/***********************************************************************/

The major trick here is that I generate the output for each line backwards! I do that because it's easier to generate the hexadecimal output that way. Generating the hexadecimal output “forwards” would mean I need to shift the first four bits down into position (so with a 32-bit value, I would need to shift the bits down 28 positions), then generate the hex digit, then shift the next four bits down 24 positions, but by then, I'm doing repeated shifts and discarding all the work I did previously for each digit. And if I only want to work with 8 bits, I have to have another special function to handle that, or complicate one function to handle a varying number of bits.

But by going backwards, I start with the last four bits, which are already in the “proper position” to generate a digit, then shift everything down four bits, and keep repeating this until the specified number of hexadecimal digits are produced.
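To make the comparison concrete, here is a small side-by-side sketch of the two approaches. It is not the hexout() from the listing: hex_backwards() and hex_forwards() are names made up for this example, neither writes a padding character, and both use a lookup table for the digits instead of the “add '0', then add 7” arithmetic in the original.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Backwards: the low four bits are always in position, so every */
/* digit costs one mask and one fixed shift by four.             */
static void hex_backwards(char *dest,unsigned long value,size_t size)
{
  assert(dest != NULL);
  assert(size >  0);

  while(size--)
  {
    dest[size] = "0123456789ABCDEF"[value & 0x0F];
    value >>= 4;
  }
}

/* Forwards: every digit needs its own shift amount, and that */
/* amount depends on how many digits were asked for.          */
static void hex_forwards(char *dest,unsigned long value,size_t size)
{
  size_t i;

  assert(dest != NULL);
  assert(size >  0);
  assert(size <= 2 * sizeof(unsigned long));

  for (i = 0 ; i < size ; i++)
  {
    unsigned int shift = (unsigned int)(4 * (size - 1 - i));
    dest[i] = "0123456789ABCDEF"[(value >> shift) & 0x0F];
  }
}
```

Both produce the same digits; the backwards version simply never has to work out where a given digit lives inside the value, which is why one routine can serve both the 8-digit offset and the 2-digit byte.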

So, while the amount of code goes up, it is faster than the more portable version:

[spc]lucy:~/projects/99/src>time ./12 ~/bin/firefox/libxul.so >/dev/null

real    0m4.985s
user    0m4.969s
sys     0m0.015s
[spc]lucy:~/projects/99/src>time ./20 ~/bin/firefox/libxul.so >/dev/null

real    0m2.936s
user    0m1.511s
sys     0m1.425s

It's almost twice as fast, yet it spends a disturbingly large amount of time (compared to the portable version) in the kernel. It's because of all the calls to write() I do. That's a problem I'll attack in the next version.

Sunday, January 29, 2012

99 ways to program a hex, Part 21: C89, const correctness, assertive, system calls, per line buffering

Yesterday's version was faster than the portable version, but spent nearly 100 times longer in the kernel than the portable version, and that's because the portable version, using the standard C library, buffers the output way more than my non-portable system calling version did. Making a subroutine call into the kernel (a “system call”) takes way more time than making a regular subroutine call.

So, we need to avoid making a ton of system calls, and to do that, we need to buffer the output a bit more. This version, we buffer an entire line's worth of data before writing it out.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89, const correctness, assertive, system calls, per line buffering */

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define LINESIZE	16

/********************************************************************/

extern const char *sys_errlist[];
extern int         sys_nerr;

static void	do_dump		(const int,const int);
static size_t	dump_line	(const int,unsigned char *,size_t,const unsigned long);
static void	hexout		(char *,unsigned long,size_t,const int);
static void	myperror	(const char *const);
static size_t	myread		(const int,char *,size_t);
static void	mywrite		(const int,const char *const,const size_t);

/********************************************************************/

int main(const int argc,const char *const argv[])
{
  if (argc == 1)
    do_dump(STDIN_FILENO,STDOUT_FILENO);
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      int fhin;
      
      fhin = open(argv[i],O_RDONLY);
      if (fhin == -1)
      {
        myperror(argv[i]);
        continue;
      }
      
      mywrite(STDOUT_FILENO,"-----",5);
      mywrite(STDOUT_FILENO,argv[i],strlen(argv[i]));
      mywrite(STDOUT_FILENO,"-----\n",6);
      
      do_dump(fhin,STDOUT_FILENO);
      if (close(fhin) < 0)
        myperror(argv[i]);
    }
  }
  
  return 0;
}
      
/************************************************************************/     

static void do_dump(const int fhin,const int fhout)
{
  unsigned char buffer[4096];
  unsigned long off;
  size_t        bytes;
  
  assert(fhin  >= 0);
  assert(fhout >= 0);

  off = 0;
  
  while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0)
  {
    unsigned char *p = buffer;
    
    for (p = buffer ; bytes > 0 ; )
    {
      size_t amount;
      
      amount = dump_line(fhout,p,bytes,off);
      p     += amount;
      bytes -= amount;
      off   += amount;
    }
  }
}

/********************************************************************/

static size_t dump_line(
	const int            fhout,
	unsigned char       *p,
	size_t               bytes,
	const unsigned long  off
)
{
  char    line[75];
  char   *dh;
  char   *da;
  size_t  count;
  
  assert(fhout >= 0);
  assert(p     != NULL);
  assert(bytes >  0);
  
  memset(line,' ',sizeof(line));
  hexout(line,off,8,':');
  if (bytes > LINESIZE)
    bytes = LINESIZE;
  
  p  += bytes;
  dh  = &line[10 + bytes * 3];
  da  = &line[58 + bytes];
  
  for (count = 0 ; count < bytes ; count++)
  {
    p--;
    da--;
    dh -= 3;
    
    if ((*p >= ' ') && (*p <= '~'))
      *da = *p;
    else
      *da = '.';
    
    hexout(dh,(unsigned long)*p,2,' ');
  }
  
  line[58 + count] = '\n';
  mywrite(fhout,line,59 + count);
  return count;
}

/**********************************************************************/  

static void hexout(char *dest,unsigned long value,size_t size,const int padding)
{
  assert(dest != NULL);
  assert(size >  0);
  assert((padding >= ' ') && (padding <= '~'));
  
  dest[size] = padding;
  while(size--)
  {
    dest[size] = (char)((value & 0x0F) + '0');
    if (dest[size] > '9') dest[size] += 7;
    value >>= 4;
  }
}

/************************************************************************/

static void myperror(const char *const s)
{
  int err = errno;
  
  assert(s != NULL);
  
  mywrite(STDERR_FILENO,s,strlen(s));
  mywrite(STDERR_FILENO,": ",2);
  
  if ((err < 0) || (err >= sys_nerr))
    mywrite(STDERR_FILENO,"(unknown)",9);
  else
    mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err]));
  mywrite(STDERR_FILENO,"\n",1);
}

/************************************************************************/

static size_t myread(const int fh,char *buf,size_t size)
{
  size_t amount = 0;
  
  assert(fh   >= 0);
  assert(buf  != NULL);
  assert(size >  0);
  
  while(size > 0)
  {
    ssize_t bytes;
    
    bytes = read(fh,buf,size);
    if (bytes < 0)
    {
      myperror("read()");
      exit(EXIT_FAILURE);
    }
    if (bytes == 0)
      break;
    
    amount += bytes;
    size   -= bytes;
    buf    += bytes;
  }
  
  return amount;
}

/*********************************************************************/  
  
static void mywrite(const int fh,const char *const msg,const size_t size)
{
  assert(fh   >= 0);
  assert(msg  != NULL);
  assert(size >  0);
  
  if (write(fh,msg,size) < (ssize_t)size)
  {
    if (fh != STDERR_FILENO)
      myperror("output");
      
    exit(EXIT_FAILURE);
  }
}

/***********************************************************************/

Okay, so how does this version fare?

[spc]lucy:~/projects/99/src>time ./20 ~/bin/firefox/libxul.so >/dev/null

real    0m2.941s
user    0m1.499s
sys     0m1.441s
[spc]lucy:~/projects/99/src>time ./21 ~/bin/firefox/libxul.so >/dev/null

real    0m0.957s
user    0m0.645s
sys     0m0.313s

Not bad—one third the time overall, and one fifth the amount of time spent in the kernel. And compared to the portable version, this only takes one fifth the total time, although it's still spending over twenty times as long in kernel space.

We can do better; it just takes more buffering and fewer system calls.

Monday, January 30, 2012

99 ways to program a hex, Part 22: C89, const correctness, assertive, system calls, full buffering

So yesterday I presented a non-portable version that was quite a bit faster than the portable version, but I'm not quite done yet. That version just buffered a line at a time—today's version buffers nearly 8k worth of data (it's not exact, but it's close enough) between calls to write().
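The “nearly 8k” figure, and what it buys in system calls, can be checked with a bit of arithmetic. The sketch below is only an illustration: OUTLINE, OUTLINES, lines_needed() and write_calls() are names invented here, the 1 MiB file size used with it is hypothetical, and the three small header writes per file are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define LINESIZE   16   /* input bytes shown on each output line         */
#define OUTLINE    75   /* 10 for address, 48 hex, 16 ASCII, 1 newline   */
#define OUTLINES  109   /* full output lines that fit in 8192 bytes      */

/* Number of output lines a file of the given size produces. */
static unsigned long lines_needed(unsigned long filesize)
{
  return (filesize + LINESIZE - 1) / LINESIZE;
}

/* Number of write() calls if we flush every lines_per_write lines. */
static unsigned long write_calls(unsigned long filesize,unsigned long lines_per_write)
{
  unsigned long lines = lines_needed(filesize);

  assert(lines_per_write > 0);
  return (lines + lines_per_write - 1) / lines_per_write;
}
```

A full line is 75 bytes, and 109 of them come to 8,175 bytes, 17 short of 8,192. For a 1 MiB input that is 65,536 lines: one write() per line (yesterday's version) means 65,536 calls, while flushing every 109 lines means 602. The first system call version issued five writes per line, or 327,680 calls for the same input.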

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89, const correctness, assertive, system calls, full buffering */

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define LINESIZE	16

/********************************************************************/

extern const char *sys_errlist[];
extern int         sys_nerr;

static void	do_dump		(const int,const int);
static size_t	dump_line	(char **const,unsigned char *,size_t,const unsigned long);
static void	hexout		(char *const,unsigned long,size_t,const int);
static void	myperror	(const char *const);
static size_t	myread		(const int,char *,size_t);
static void	mywrite		(const int,const char *const,const size_t);

/********************************************************************/

int main(const int argc,const char *const argv[])
{
  if (argc == 1)
    do_dump(STDIN_FILENO,STDOUT_FILENO);
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      int fhin;
      
      fhin = open(argv[i],O_RDONLY);
      if (fhin == -1)
      {
        myperror(argv[i]);
        continue;
      }
      
      mywrite(STDOUT_FILENO,"-----",5);
      mywrite(STDOUT_FILENO,argv[i],strlen(argv[i]));
      mywrite(STDOUT_FILENO,"-----\n",6);
      
      do_dump(fhin,STDOUT_FILENO);
      if (close(fhin) < 0)
        myperror(argv[i]);
    }
  }
  
  return EXIT_SUCCESS;
}
      
/************************************************************************/     

static void do_dump(const int fhin,const int fhout)
{
  unsigned char  buffer[4096];
  char           outbuffer[75 * 109];
  char          *pout;
  unsigned long  off;
  size_t         bytes;
  size_t         count;
  
  assert(fhin  >= 0);
  assert(fhout >= 0);

  memset(outbuffer,' ',sizeof(outbuffer));
  off      = 0;
  count    = 0;
  pout     = outbuffer;
  
  while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0)
  {
    unsigned char *p = buffer;
    
    for (p = buffer ; bytes > 0 ; )
    {
      size_t amount;
      
      amount    = dump_line(&pout,p,bytes,off);
      p        += amount;
      bytes    -= amount;
      off      += amount;
      count++;
      
      if (count == 109)
      {
        mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
        memset(outbuffer,' ',sizeof(outbuffer));
        count    = 0;
        pout     = outbuffer;
      }      
    }
  }
  
  if ((size_t)(pout - outbuffer) > 0)
    mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
}

/********************************************************************/

static size_t dump_line(
	char                **const pline,
	unsigned char              *p,
	size_t                      bytes,
	const unsigned long         off
)
{
  char   *line;
  char   *dh;
  char   *da;
  size_t  count;
  
  assert(pline  != NULL);
  assert(*pline != NULL);
  assert(p      != NULL);
  assert(bytes  >  0);
  
  line = *pline;
  
  hexout(line,off,8,':');
  if (bytes > LINESIZE)
    bytes = LINESIZE;
  
  p  += bytes;
  dh  = &line[10 + bytes * 3];
  da  = &line[58 + bytes];
  
  for (count = 0 ; count < bytes ; count++)
  {
    p  --;
    da --;
    dh -= 3;
    
    if ((*p >= ' ') && (*p <= '~'))
      *da = *p;
    else
      *da = '.';
    
    hexout(dh,(unsigned long)*p,2,' ');
  }
  
  line[58 + count] = '\n';
  *pline = &line[59 + count];
  return count;
}

/**********************************************************************/  

static void hexout(char *const dest,unsigned long value,size_t size,const int padding)
{
  assert(dest != NULL);
  assert(size >  0);
  assert((padding >= ' ') && (padding <= '~'));
  
  dest[size] = padding;
  while(size--)
  {
    dest[size] = (char)((value & 0x0F) + '0');
    if (dest[size] > '9') dest[size] += 7;
    value >>= 4;
  }
}

/************************************************************************/

static void myperror(const char *const s)
{
  int err = errno;
  
  assert(s != NULL);
  
  mywrite(STDERR_FILENO,s,strlen(s));
  mywrite(STDERR_FILENO,": ",2);
  
  if ((err < 0) || (err >= sys_nerr))
    mywrite(STDERR_FILENO,"(unknown)",9);
  else
    mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err]));
  mywrite(STDERR_FILENO,"\n",1);
}

/************************************************************************/

static size_t myread(const int fh,char *buf,size_t size)
{
  size_t amount = 0;
  
  assert(fh   >= 0);
  assert(buf  != NULL);
  assert(size >  0);
  
  while(size > 0)
  {
    ssize_t bytes;
    
    bytes = read(fh,buf,size);
    if (bytes < 0)
    {
      myperror("read()");
      exit(EXIT_FAILURE);
    }
    if (bytes == 0)
      break;
    
    amount += bytes;
    size   -= bytes;
    buf    += bytes;
  }
  
  return amount;
}

/*********************************************************************/  
  
static void mywrite(const int fh,const char *const msg,const size_t size)
{
  assert(fh   >= 0);
  assert(msg  != NULL);
  assert(size >  0);
  
  if (write(fh,msg,size) < (ssize_t)size)
  {
    if (fh != STDERR_FILENO)
      myperror("output");
      
    exit(EXIT_FAILURE);
  }
}

/***********************************************************************/

And that makes all the difference. The portable version:

[spc]lucy:~/projects/99/src>time ./12 ~/bin/firefox/libxul.so >/dev/null

real    0m4.985s
user    0m4.969s
sys     0m0.015s

Our first stab at a non-portable, but possibly faster version:

[spc]lucy:~/projects/99/src>time ./20 ~/bin/firefox/libxul.so >/dev/null

real    0m2.936s
user    0m1.511s
sys     0m1.425s

The “it's quite a bit faster” version:

[spc]lucy:~/projects/99/src>time ./21 ~/bin/firefox/libxul.so >/dev/null

real    0m0.957s
user    0m0.645s
sys     0m0.313s

And finally, the punchline—today's version:

[spc]lucy:~/projects/99/src>time ./22 ~/bin/firefox/libxul.so >/dev/null

real    0m0.460s
user    0m0.448s
sys     0m0.012s

And yes, that's the real output—1/10 the time of the portable version with a similar amount of time in the kernel.

Frankly, I was a bit surprised at these results. Not that the non-portable version was faster (that's almost a given), but at the magnitude of the difference. I didn't think the standard C library had that much overhead. I was expecting a modest percentage increase in speed; even twice as fast would have been unexpected. But ten times faster?

Wow.

Increasing the size of the buffer past what I have probably won't help all that much, and in fact, when I doubled the buffer size, it ran slower:

[spc]lucy:~/projects/99/src>time ./22 ~/bin/firefox/libxul.so >/dev/null

real    0m0.592s
user    0m0.582s
sys     0m0.010s

The timing difference could be due to cache effects, maybe?

So I think we've maxed out the speed at which this program will run. As a test, I profiled the code to see if there was anything I might have missed:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
100.21      0.41     0.41        1   410.86   410.86  do_dump
  0.00      0.41     0.00     9197     0.00     0.00  mywrite

I checked the code GCC produced (all code was compiled with -O3, a very high level of optimization) and well, I'm not sure I could have done much better, and probably would have done worse—GCC inlined everything into do_dump() (with the exception of main() and mywrite()), something I would not have done in assembly (and have any hope of code reuse for another project). So I think we're done with making this code fast.

That's not to say I won't do an assembly version of this program, but it probably won't be for the x86 line.


Double facepalm

R, who runs the Ft. Lauderdale Office of The Corporation, stopped by and informed me that we now have a source license to The Protocol Stack From Hell™.

WE HAVE A SOURCE LICENSE TO THE PROTOCOL STACK FROM HELL™!

It's about time.

Now I can figure out why I needed that mutex around a few calls.

At this point—I don't know what's worse, not knowing what's in the code, or knowing what's in the code.

Okay, curiosity got the better of me, and we all know what happens to curious cats. I could only be so lucky.

Let's see … K&R style code, standard vowel impairment, no comments—that I was expecting.

C89 style declarations are a nice plus—I guess they needed to update the code at some point in the past, but global variables all over the place? For a library?

[Facepalm]

No wonder I needed those mutexes all over the place.
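
For anyone who hasn't had the pleasure: a library that keeps its state in global variables can't safely be called from two threads at once, so the caller ends up serializing every call. A minimal sketch of that workaround with POSIX threads follows; stack_send() and g_last_seq are made-up stand-ins, not the actual API of The Protocol Stack From Hell™.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a library routine that keeps its state in */
/* a global variable, and is therefore unsafe to call concurrently.    */

static int g_last_seq;

static int stack_send(int value)
{
  g_last_seq = g_last_seq + value;
  return g_last_seq;
}

/* Every call into the library goes through the same mutex, so only */
/* one thread at a time ever touches the library's globals.         */

static pthread_mutex_t g_stack_lock = PTHREAD_MUTEX_INITIALIZER;

static int locked_stack_send(int value)
{
  int lockrc;
  int rc;

  lockrc = pthread_mutex_lock(&g_stack_lock);
  assert(lockrc == 0);
  (void)lockrc;

  rc = stack_send(value);

  lockrc = pthread_mutex_unlock(&g_stack_lock);
  assert(lockrc == 0);
  (void)lockrc;

  return rc;
}
```

The catch, of course, is that every caller has to remember to use the locked wrapper; nothing stops someone from calling the raw function directly.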

And more unbelievably, the first file I took a look at wouldn't compile at all because the braces were mis-aligned! Further investigation of the code revealed twelve (12) such files!

[Double facepalm: for when one facepalm doesn't cut it]

I don't even want to know how much was spent on this.

And no, this isn't some off version. No, this is the version we're using!


Sounds of a programmer dominated office

Yeah, this sounds about right for an office dominated by programmers (link via Reddit)

Tuesday, January 31, 2012

Rabid howler monkeys on crack wrote this code

Okay, yes, there are issues with the code to The Protocol Stack From Hell™. There's the vowel impaired names—oh, sorry, the vwl_imprd_nms, the usual ignoring return codes and tons of global variables—sorry, glblvrbls littering the code. And there's stuff like:

foo_t *p = NULL;

/* lots of code not touching p at all */

if (p) {
	/* lots of code that will never be executed because */ 
	/* p is always, *always* NULL at this point         */
} else {
	/* this code will always *always* be executed */
	/* p is never touched otherwise */
}

/* p is still never used */

Yes. At one point p was probably used, then a code change sometime during the Clinton Administration (late first term most likely) removed the need for p but later code still checked it, so in order to keep the code from crashing (during the last year of the Clinton Administration, most likely) the “easiest fix that would work with minimal code changes because we want to avoid a five day regression test” is to just NULL out the variable where declared and call it a day.

Odder yet is the code that generates a string, checks to see if the generated string ends with two newline characters and then adds one or two newline characters if required (and yes, it checks for the first newline character, then the second) and further down in the code, it checks to see if the line has two newline characters and carefully removes them, one at a time.

Yes. The code adds two characters, only to remove them later on.

Again, I can see the requirements late during the Reagan Administration bumping up against the requirements during the early Bush 43 Administration and again, the easiest way to handle this is a local change that disturbs as little code as possible.

Although, there is one bit that does smack of rabid howler monkeys on crack taking a pass at the code, which I briefly mentioned in passing. It's basically the Poster Child™ for why certain C programmers should be taken out back behind the shed and disemboweled with a grapefruit spoon.

Back then, I was tasked with modifying some code to log the Protocol Stack From Hell™ errors via syslog(), and all I had to work with was a C source file:

/* tons o' broilerplate text whereby we pledge our first born to feed the
 * lawyers of The Protocol Stack From Hell™ */

void Stpd_Init_Fnctn(void)
{
  MYSTERIOUS_ENTRY_CODE_WE_CANT_TOUCH();
  /* modifications here */
  MYSTERIOUS_EXIT_CODE_WE_CANT_TOUCH();
}

void Stpd_Lrm_Fnctn(lrm_t *ptr,int wat)
{
  MYSTERIOUS_ENTRY_CODE_WE_CANT_TOUCH();
  /* modifications here */
  MYSTERIOUS_EXIT_CODE_WE_CANT_TOUCH();
}

/* Muahahahahahahahahahahahaha! */
/* [S/X thunder ] */

(No, seriously, each function starts and ends with MYSTERIOUS_something_CODE_WE_CANT_TOUCH()) and an object file, which the C code is linked against to produce the final program.

Okay, nothing that out of the ordinary. Only we weren't getting the proper error messages from the lrm_t … thingy … we were given. Some back and forth with The Protocol Stack From Hell™ Technical Support® and we had the final solution, and if you can read C code, prepare to be horrified:

void Stpd_Lrm_Fnctn(lrm_t *prt,int wat)
{
  char *msg;

  MYSTERIOUS_ENTRY_CODE_WE_CANT_TOUCH();

  msg = (char *)prt + sizeof(lrm_t);
  syslog(LOG_WTF,"error: %s",msg);

  MYSTERIOUS_EXIT_CODE_WE_CANT_TOUCH();
}

For those not fluent in C, let me translate: “you will receive a block of memory called prt, which has a particular layout we laughingly call lrm_t. Ignore the data there, but instead, what you actually want lies just past the block of memory you received, into an area that Standard C calls “undefined behavior.” Abandon all hope ye who program here. And have a nice day.”
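
A rough sketch of the layout being described, assuming a fixed-size header with a NUL-terminated string parked right behind it. The names hdr_t and trailing_msg() are made up for illustration (the real lrm_t isn't reproduced here), and unlike the vendor's version, this one gets told how big the whole block is so it can refuse to wander off the end:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical header; the real lrm_t layout is not shown in this entry. */
typedef struct
{
  int code;
  int severity;
} hdr_t;

/* Return a pointer to the string stored just past the header, or NULL */
/* if the block is too small to hold one, or if the string is not NUL  */
/* terminated within the block.                                        */

static const char *trailing_msg(const void *block,size_t blocksize)
{
  const char *msg;
  size_t      room;

  assert(block != NULL);

  if (blocksize <= sizeof(hdr_t))
    return NULL;

  msg  = (const char *)block + sizeof(hdr_t);
  room = blocksize - sizeof(hdr_t);

  if (memchr(msg,'\0',room) == NULL)
    return NULL;

  return msg;
}
```

The one-line version in Stpd_Lrm_Fnctn() does the same pointer arithmetic, only without any way of knowing whether there is anything valid at the other end.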

Then, I was horrified. Now, I get to see the code from “the other side” and “horrified” does not describe my reaction. “Running away, screaming in sheer madness of having peered deep into the Abyss” would be a bit closer, but still misses the mark. It goes something like this.

typedef struct {
	/* data data data */ /* [1] */
} lrm_t;

typedef struct {
	/* a mass of data */
	arbitrary_size_t reserved[6]; /* [2] */
} msg_t;

void rnd_fnct(int wat)
{
  /* don't worry, these are big enough */
  char inbffr[256],tmpbffr[256],msgbffr[256]; /* [3] */
  msg_t *msg;
  lrm_t lrm,*plrm;

  /* a bunch of code to receive an SS7 message and determine that the */
  /* contents need to be logged, as it indicates an error */

  msg = (msg_t *)inbffr;
  frmt_rnd_msg(msgbffr,msg); /* we know how big the resulting buffer is */
  
  /* okay, now for the real horror show */
  memcpy(&lrm,msg->reserved,sizeof(lrm_t)); /* [4] */

  /* No!  Don't go into the basement! */
  memcpy(tmpbffr,&lrm,sizeof(lrm_t));	    /* [5] */

  /* The call originated from inside the house! */
  strcpy(&tmpbffr[sizeof(lrm_t)],msgbffr);  /* [6] */

  /* Aieeeeeeeeeeeeeeeeeeeeeeeee! stabbity stab stab */
  Stpd_Lrm_Fnctn((lrm_t *)tmpbffr,wat);     /* [7] */
}

And now for the play-by-play commentary on this horror show:

  1. This describes the layout of the memory block we're given in Stpd_Lrm_Fnctn(). It's not terribly big as structured memory goes, maybe around 60 or 70 bytes, but it primarily contains useless information as I found out.

  2. It's a slightly larger block of memory, but notice the last field, reserved. A comment in the code says that this area is for “internal use only” and is around 24 bytes in size.

    Keep in mind—this field is only 24 bytes in size. The size of the block of memory we're given in Stpd_Lrm_Fnctn() is around 60 or 70, which is larger than 24. 24 is smaller than 60 and 70. This is important.

  3. “256 bytes should be enough for anyone, right?”

    This, on a system with gigabytes of memory.

  4. This copies data out of the reserved field into a block of memory of type lrm_t. Refer back to note 2. Notice how we want to copy 60 or 70 bytes of information, but the field we're copying from is only 24 bytes. This, my friends, is known as “undefined behavior” in C.

    Only, this is air quotation marks okay air quotation marks because msg_t is technically the air quotation marks header air quotation marks of a larger message and thus, we can air quotation marks safely air quotation marks copy memory past the end of the header.

    My thinking here—the error codes originally fit into the space set aside by reserved but grew over time, and by the time anyone found out, too much code relied on this situation, so they're stuck with it.

    Or, you know, rabid howler monkeys on crack.

  5. I just hope that lrm_t doesn't ever exceed 256 bytes in size.

  6. I'm serious. I'm not making this up. The code actually uses strcpy().

    strcpy() is bad because there is no checking to see if you have overrun the space set aside to receive the copied string.

    The use of this function should cause modern C compilers to bitch mightily and stop compilation right then and there, and send the programmer to jail, do not pass Go, do not collect $200.00. Especially if the programmers are rabid howler monkeys on crack.

    This. Is. An. Ex. Function!

  7. And now we call our function. Hope the error message, plus the lrm_t, didn't exceed 256 bytes.
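
For contrast, steps [5] through [7] can be done with the sizes actually checked. This is only a sketch under my own assumptions: pack_record() is an invented name, and the header is passed as an opaque pointer plus a size, since the real lrm_t isn't reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a header and then a NUL-terminated message into dst, but only */
/* if both (plus the terminating NUL) fit.  Returns the number of     */
/* bytes used, or 0 if the result would not fit in dstsize bytes.     */

static size_t pack_record(
	char       *dst,
	size_t      dstsize,
	const void *hdr,
	size_t      hdrsize,
	const char *msg
)
{
  size_t msgsize;

  assert(dst != NULL);
  assert(hdr != NULL);
  assert(msg != NULL);

  msgsize = strlen(msg) + 1;	/* include the terminating NUL */

  if (hdrsize > dstsize)
    return 0;
  if (msgsize > dstsize - hdrsize)
    return 0;

  memcpy(dst,hdr,hdrsize);
  memcpy(dst + hdrsize,msg,msgsize);
  return hdrsize + msgsize;
}
```

The caller then has something to check before handing the buffer on, instead of hoping 256 bytes was enough.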

Rabid howler monkeys on crack.

I'm serious.

There aren't enough facepalms to do this code justice.


99 ways to program a hex, Part 23: C89, const correctness, assertive, system calls, full buffering, lookup table

From
Mark Grosberg <XXXXXXXXXXXXXXXXXXXXX>
To
Sean Conner <sean@conman.org>
Subject
Boston: Well, since you're in the land of non-portability …
Date
Sun, 29 Jan 2012 05:55:00
   
static void hexout(char *dest,unsigned long value,size_t size,const int padding)
{
  assert(dest != NULL);
  assert(size >  0);
  assert((padding >= ' ') && (padding <= '~'));
  
  dest[size] = padding;
  while(size--)
  {
    dest[size] = (char)((value & 0x0F) + '0');
    if (dest[size] > '9') dest[size] += 7;
    value >>= 4;
  }
}

You're also in the land of ASCII specificness. Couldn't you make that:

dest[size] = "0123456789ABCDEF"[value & 0x0f];

And then not be tied to ASCII? You could also then switch out that array pointer if you wanted to get a mix of uppercase, lower case depending on what you need.

-MYG

I initially rejected the idea of doing this. My reasoning? The code itself is already non-portable, being restricted to a Posix-like system. So what's one more non-portable item on the list? The sequence if (dest[size] > '9') dest[size] += 7 is around six (for a lot of architectures that aren't RISC based) to twelve bytes (RISC systems) in size, and now you want to add an additional 16 bytes? [He asks, working from a system with a few gigabytes of RAM —Editor] [Shut up! –Sean]. Also, in my nearly 30 years of working with computers, I've yet to come across a non-ASCII based computer system.
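
For reference, here are the two approaches side by side. The + 7 trick works only because ASCII puts 'A' (0x41) seven code points past ':' (0x3A), the character right after '9'; the table version never looks at the numeric value of a character at all. The function names, and the lowercase table hinted at in Mark's email, are mine for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* ASCII only: values 10-15 land on ':' through '?' and need to be */
/* bumped by 7 to reach 'A' through 'F'.                           */

static char hexdigit_ascii(unsigned int nibble)
{
  char c;

  assert(nibble < 16);
  c = (char)(nibble + '0');
  if (c > '9') c += 7;
  return c;
}

/* Character set agnostic: index into a table, with no comparison */
/* against '9'.  Swap the table to pick upper or lower case.      */

static const char *const hex_upper = "0123456789ABCDEF";
static const char *const hex_lower = "0123456789abcdef";

static char hexdigit_table(const char *table,unsigned int nibble)
{
  assert(table != NULL);
  assert(nibble < 16);
  return table[nibble];
}
```

On an ASCII system both produce identical output for all sixteen values; the difference only shows up in size, speed and portability.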

Yes, there are a few. Baudot code is perhaps the oldest and perhaps the oddest one. Then there are the 6-bit character encoding schemes and Radix-50, which pack multiple 6-bit characters per “word” of storage (where a “word” could be 16, 18, 32, 36, 60 or 66 bits in size) and varied from system to system. And let's not forget EBCDIC, one of about six nearly identical, but maddeningly different, encoding schemes developed by IBM. All of these were developed for machines in the 60s, but ASCII won out in the end, being the most widely used and at the core of Unicode.

So I asked on a mailing list of classic computer enthusiasts:

From
Sean Conner <spc@conman.org>
To
Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject
C compilers and non-ASCII systems
Date
Tue, 31 Jan 2012 11:21:02 -0500

A friend recently raised an issue with some code I wrote (a hex dump routine) saying it depended upon ASCII and thus, would break on non-ASCII based systems (and proposed a solution, but that's beside the issue here). I wrote back, saying the code in question was non-portable to begin with (since it depended upon read() and write()—it was targetted at Posix based systems) and besides, I've never encountered a non-ASCII system in the nearly 30 years I've been using computers.

So now I'm wondering—besides Baudot, 6-bit BCD and EBCDIC, is there any other encoding scheme used? And of Baudot, 6-bit BCD and EBCDIC, are there any systems using those encoding schemes AND have a C compiler available?

-spc (Or can I safely assume ASCII and derivatives these days?)

I figure if anyone knew the answer, these people would (many of them not only use computers like the PDP-10, but use them as heaters during the winter months).

The answers were fascinating.

From
"Shoppa, Tim" <XXXXXXXXXXXXXXXXX>
To
Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject
Re: C compilers and non-ASCII systems
Date
Tue, 31 Jan 2012 13:18:55 -0500

IBM has a very handy page on C compatibility with EBCDIC system services:

http://www-03.ibm.com/systems/z/os/zos/features/unix/bpxa1p03.html

From
"Dave" <XXXXXXXXXXXXXXXXXXXX>
To
Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject
RE: C compilers and non-ASCII systems
Date
Tue, 31 Jan 2012 19:33:06 -0000

Please consider other character codes. An EBCDIC port of GCC is alive and well on several of the "legacy" operating systems (MVS, VM and Music) that run on the Hercules IBM 360/370/XA/390/z emulator. And whilst zLinux runs in ASCII (or whatever it uses to get more than 256 points in a code page) many zLinux sites also have the zVM hypervisor, which includes an optional EBCDIC C compiler. Having ported the BREXX interpreter to this environment I was stung by the fact that the original author had made assumptions about character ordering that are not true on an EBCDIC platform.

From
Phil Budne <XXXXXXXXXXXXXXXXX>
To
Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject
Re: C compilers and non-ASCII systems
Date
Tue, 31 Jan 2012 13:00:52 -0500

See “IBM libascii functions for z/OS UNIX System Services”

http://www-03.ibm.com/systems/z/os/zos/features/unix/libascii.html

Overview
The libascii functions are integrated into the base of the Language Environment. They help you port ASCII-based C applications to the EBCDIC-based z/OS UNIX environment.

From
Nemo <XXXXXXXXXXXXXXXX>
To
Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject
Re: C compilers and non-ASCII systems
Date
Tue, 31 Jan 2012 13:32:06 -0500

z/OS is not only POSIX, it is UNIX (see http://www.opengroup.org/openbrand/register/brand3470.htm).

Oh.

Well then …

I figured I would then try Mark's suggestion (and several other people on the mailing list suggested the same thing) and at least time the change to see if it's a worthwhile change for such odd-looking, but legal, C code.
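
The kind of ordering assumption Dave got stung by is easy to make. Standard C only promises that '0' through '9' are contiguous; it makes no such promise for letters, and in EBCDIC the alphabet is split into three separate runs (A-I, J-R, S-Z). A small sketch, with function names of my own invention, of a letter test that quietly assumes ASCII next to one that doesn't:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumes 'A' through 'Z' are contiguous.  True for ASCII; in EBCDIC */
/* this range also covers code points that are not letters.           */

static int is_upper_by_range(int c)
{
  return (c >= 'A') && (c <= 'Z');
}

/* Makes no assumption about ordering: the character is an uppercase */
/* letter only if it appears in the list.                            */

static int is_upper_by_table(int c)
{
  if (c == '\0')
    return 0;	/* strchr() would otherwise match the terminator */
  return strchr("ABCDEFGHIJKLMNOPQRSTUVWXYZ",c) != NULL;
}
```

In real code isupper() from <ctype.h> does this job; the table version just makes the point visible.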

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89, const correctness, assertive, system calls, full buffering */
/*	  lookup table */

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define LINESIZE	16

/********************************************************************/

extern const char *sys_errlist[];
extern int         sys_nerr;

static void	do_dump		(const int,const int);
static size_t	dump_line	(char **const,unsigned char *,size_t,const unsigned long);
static void	hexout		(char *const,unsigned long,size_t,const int);
static void	myperror	(const char *const);
static size_t	myread		(const int,char *,size_t);
static void	mywrite		(const int,const char *const,const size_t);

/********************************************************************/

int main(const int argc,const char *const argv[])
{
  if (argc == 1)
    do_dump(STDIN_FILENO,STDOUT_FILENO);
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      int fhin;
      
      fhin = open(argv[i],O_RDONLY);
      if (fhin == -1)
      {
        myperror(argv[i]);
        continue;
      }
      
      mywrite(STDOUT_FILENO,"-----",5);
      mywrite(STDOUT_FILENO,argv[i],strlen(argv[i]));
      mywrite(STDOUT_FILENO,"-----\n",6);
      
      do_dump(fhin,STDOUT_FILENO);
      if (close(fhin) < 0)
        myperror(argv[i]);
    }
  }
  
  return EXIT_SUCCESS;
}
      
/************************************************************************/     

static void do_dump(const int fhin,const int fhout)
{
  unsigned char  buffer[4096];
  char           outbuffer[75 * 109];
  char          *pout;
  unsigned long  off;
  size_t         bytes;
  size_t         count;
  
  assert(fhin  >= 0);
  assert(fhout >= 0);

  memset(outbuffer,' ',sizeof(outbuffer));
  off      = 0;
  count    = 0;
  pout     = outbuffer;
  
  while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0)
  {
    unsigned char *p = buffer;
    
    for (p = buffer ; bytes > 0 ; )
    {
      size_t amount;
      
      amount    = dump_line(&pout,p,bytes,off);
      p        += amount;
      bytes    -= amount;
      off      += amount;
      count++;
      
      if (count == 109)
      {
        mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
        memset(outbuffer,' ',sizeof(outbuffer));
        count    = 0;
        pout     = outbuffer;
      }      
    }
  }
  
  if ((size_t)(pout - outbuffer) > 0)
    mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
}

/********************************************************************/

static size_t dump_line(
	char                **const pline,
	unsigned char              *p,
	size_t                      bytes,
	const unsigned long         off
)
{
  char   *line;
  char   *dh;
  char   *da;
  size_t  count;
  
  assert(pline  != NULL);
  assert(*pline != NULL);
  assert(p      != NULL);
  assert(bytes  >  0);
  
  line = *pline;
  
  hexout(line,off,8,':');
  if (bytes > LINESIZE)
    bytes = LINESIZE;
  
  p  += bytes;
  dh  = &line[10 + bytes * 3];
  da  = &line[58 + bytes];
  
  for (count = 0 ; count < bytes ; count++)
  {
    p  --;
    da --;
    dh -= 3;
    
    if ((*p >= ' ') && (*p <= '~'))
      *da = *p;
    else
      *da = '.';
    
    hexout(dh,(unsigned long)*p,2,' ');
  }
  
  line[58 + count] = '\n';
  *pline = &line[59 + count];
  return count;
}

/**********************************************************************/  

static void hexout(char *const dest,unsigned long value,size_t size,const int padding)
{
  assert(dest != NULL);
  assert(size >  0);
  assert((padding >= ' ') && (padding <= '~'));
  
  dest[size] = padding;
  while(size--)
  {
    dest[size] = "0123456789ABCDEF"[value & 0x0f];
    value >>= 4;
  }
}

/************************************************************************/

static void myperror(const char *const s)
{
  int err = errno;
  
  assert(s != NULL);
  
  mywrite(STDERR_FILENO,s,strlen(s));
  mywrite(STDERR_FILENO,": ",2);
  
  if (err >= sys_nerr)	/* valid indexes are 0 to sys_nerr - 1 */
    mywrite(STDERR_FILENO,"(unknown)",9);
  else
    mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err]));
  mywrite(STDERR_FILENO,"\n",1);
}

/************************************************************************/

static size_t myread(const int fh,char *buf,size_t size)
{
  size_t amount = 0;
  
  assert(fh   >= 0);
  assert(buf  != NULL);
  assert(size >  0);
  
  while(size > 0)
  {
    ssize_t bytes;
    
    bytes = read(fh,buf,size);
    if (bytes < 0)
    {
      myperror("read()");
      exit(EXIT_FAILURE);
    }
    if (bytes == 0)
      break;
    
    amount += bytes;
    size   -= bytes;
    buf    += bytes;
  }
  
  return amount;
}

/*********************************************************************/  
  
static void mywrite(const int fh,const char *const msg,const size_t size)
{
  assert(fh   >= 0);
  assert(msg  != NULL);
  assert(size >  0);
  
  if (write(fh,msg,size) < (ssize_t)size)
  {
    if (fh != STDERR_FILENO)
      myperror("output");
      
    exit(EXIT_FAILURE);
  }
}

/***********************************************************************/

It can't be that much faster, can it?

[spc]lucy:~/projects/99/src>time ./22 ~/bin/firefox/libxul.so >/dev/null

real    0m0.468s
user    0m0.450s
sys     0m0.018s
[spc]lucy:~/projects/99/src>time ./23 ~/bin/firefox/libxul.so >/dev/null

real    0m0.257s
user    0m0.245s
sys     0m0.012s

Almost twice as fast as what I thought was the fastest version already.

Ouch.

Several people (including Mark) mentioned that on modern CPUs, a branch instruction is like hitting a brick wall.

Yes, it's quite apparent that that is true.

But this does give me an idea for removing one more brick wall branch point …

Wednesday, February 01, 2012

Not quite full service

We took pity on Edvard this month. We invited him for cake.

[You scream, I scream, we all scream for ice cream]

Sadly, there was no ice cream.


“You mean there are worse programmers than rabid howler monkeys on crack?”

I broke The Protocol Stack From Hell™. Again. It's a common occurrence whenever I attempt to run a load test (nominally against our own code, but it has to run through The Protocol Stack From Hell™ and well, The Protocol Stack From Hell™ just tends to crumple). It's not fatal, just a severe annoyance at having to restart everything and hope it all comes back up.

I talked to R about this, seeing how he has over twenty years of experience with telephony protocols. I mentioned just how bad The Protocol Stack From Hell™ is, and asked if there was anything better.

I was informed that most of the major telephony players, like AT&T, wrote their own stack, but there do exist two commercial offerings, one being The Protocol Stack From Hell™ that I keep going on and on about. The other one …

R said that the other one is not only more expensive, but it's worse!

The stack we're using, the one written by rabid howler monkeys on crack, is the better of the two.

[My head asplode]

99 ways to program a hex, Part 24: more lookup tables

So we went from a character encoding specific version to a character encoding agnostic version to today's version—another character encoding specific version (ASCII to be exact). But today's version also eliminates a branch point in the code, using a 256-element string to pick which character to display as part of the hexadecimal dump.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89, const correctness, assertive, system calls, full buffering */
/*	  lookup tables */

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define LINESIZE	16

/********************************************************************/

extern const char *sys_errlist[];
extern int         sys_nerr;

static void	do_dump		(const int,const int);
static size_t	dump_line	(char **const,unsigned char *,size_t,const unsigned long);
static void	hexout		(char *const,unsigned long,size_t,const int);
static void	myperror	(const char *const);
static size_t	myread		(const int,char *,size_t);
static void	mywrite		(const int,const char *const,const size_t);

/********************************************************************/

int main(const int argc,const char *const argv[])
{
  if (argc == 1)
    do_dump(STDIN_FILENO,STDOUT_FILENO);
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      int fhin;
      
      fhin = open(argv[i],O_RDONLY);
      if (fhin == -1)
      {
        myperror(argv[i]);
        continue;
      }
      
      mywrite(STDOUT_FILENO,"-----",5);
      mywrite(STDOUT_FILENO,argv[i],strlen(argv[i]));
      mywrite(STDOUT_FILENO,"-----\n",6);
      
      do_dump(fhin,STDOUT_FILENO);
      if (close(fhin) < 0)
        myperror(argv[i]);
    }
  }
  
  return EXIT_SUCCESS;
}
      
/************************************************************************/     

static void do_dump(const int fhin,const int fhout)
{
  unsigned char  buffer[4096];
  char           outbuffer[75 * 109];
  char          *pout;
  unsigned long  off;
  size_t         bytes;
  size_t         count;
  
  assert(fhin  >= 0);
  assert(fhout >= 0);

  memset(outbuffer,' ',sizeof(outbuffer));
  off      = 0;
  count    = 0;
  pout     = outbuffer;
  
  while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0)
  {
    unsigned char *p = buffer;
    
    for (p = buffer ; bytes > 0 ; )
    {
      size_t amount;
      
      amount    = dump_line(&pout,p,bytes,off);
      p        += amount;
      bytes    -= amount;
      off      += amount;
      count++;
      
      if (count == 109)
      {
        mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
        memset(outbuffer,' ',sizeof(outbuffer));
        count    = 0;
        pout     = outbuffer;
      }      
    }
  }
  
  if ((size_t)(pout - outbuffer) > 0)
    mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
}

/********************************************************************/

static size_t dump_line(
	char                **const pline,
	unsigned char              *p,
	size_t                      bytes,
	const unsigned long         off
)
{
  char   *line;
  char   *dh;
  char   *da;
  size_t  count;
  
  assert(pline  != NULL);
  assert(*pline != NULL);
  assert(p      != NULL);
  assert(bytes  >  0);
  
  line = *pline;
  
  hexout(line,off,8,':');
  if (bytes > LINESIZE)
    bytes = LINESIZE;
  
  p  += bytes;
  dh  = &line[10 + bytes * 3];
  da  = &line[58 + bytes];
  
  for (count = 0 ; count < bytes ; count++)
  {
    p  --;
    da --;
    dh -= 3;
    
    *da = "................................ !\"#$%&'()*+,-./0123456789:;<=>?"
	"@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~."
	"................................................................"
	"........................................................"
	"........"[*p];
    
    hexout(dh,(unsigned long)*p,2,' ');
  }
  
  line[58 + count] = '\n';
  *pline = &line[59 + count];
  return count;
}

/**********************************************************************/  

static void hexout(char *const dest,unsigned long value,size_t size,const int padding)
{
  assert(dest != NULL);
  assert(size >  0);
  assert((padding >= ' ') && (padding <= '~'));
  
  dest[size] = padding;
  while(size--)
  {
    dest[size] = "0123456789ABCDEF"[value & 0x0f];
    value >>= 4;
  }
}

/************************************************************************/

static void myperror(const char *const s)
{
  int err = errno;
  
  assert(s != NULL);
  
  mywrite(STDERR_FILENO,s,strlen(s));
  mywrite(STDERR_FILENO,": ",2);
  
  if (err >= sys_nerr)	/* valid indexes are 0 to sys_nerr - 1 */
    mywrite(STDERR_FILENO,"(unknown)",9);
  else
    mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err]));
  mywrite(STDERR_FILENO,"\n",1);
}

/************************************************************************/

static size_t myread(const int fh,char *buf,size_t size)
{
  size_t amount = 0;
  
  assert(fh   >= 0);
  assert(buf  != NULL);
  assert(size >  0);
  
  while(size > 0)
  {
    ssize_t bytes;
    
    bytes = read(fh,buf,size);
    if (bytes < 0)
    {
      myperror("read()");
      exit(EXIT_FAILURE);
    }
    if (bytes == 0)
      break;
    
    amount += bytes;
    size   -= bytes;
    buf    += bytes;
  }
  
  return amount;
}

/*********************************************************************/  
  
static void mywrite(const int fh,const char *const msg,const size_t size)
{
  assert(fh   >= 0);
  assert(msg  != NULL);
  assert(size >  0);
  
  if (write(fh,msg,size) < (ssize_t)size)
  {
    if (fh != STDERR_FILENO)
      myperror("output");
      
    exit(EXIT_FAILURE);
  }
}

/***********************************************************************/

And it is faster:

[spc]lucy:~/projects/99/src>time ./23 ~/bin/firefox/libxul.so >/dev/null

real    0m0.258s
user    0m0.247s
sys     0m0.011s
[spc]lucy:~/projects/99/src>time ./24 ~/bin/firefox/libxul.so >/dev/null

real    0m0.186s
user    0m0.178s
sys     0m0.008s

Only about 1.4 times faster, but it is faster.

The conversion string is fixed, but that doesn't preclude a port to, say, an EBCDIC system from using a different one, or the string being constructed at run time. The runtime generation would be more portable, but to me, that's wasted time spent generating a string that will always be the same (and frankly, if we're using this hack for speed, that's just wasted time).
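
For the curious, a sketch of what the run-time construction might look like, assuming isprint() from <ctype.h> under the default "C" locale is an acceptable definition of “printable” (that's my assumption, and it need not match the hand-written ASCII string on every platform). The names build_display_table() and g_display are invented here:

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* One entry per possible unsigned char value. */
static char g_display[UCHAR_MAX + 1];

/* Fill in the table once at startup: printable characters map to */
/* themselves, everything else maps to '.'.                       */

static void build_display_table(char *table,size_t size)
{
  size_t i;

  assert(table != NULL);
  assert(size  == (size_t)UCHAR_MAX + 1);

  for (i = 0 ; i < size ; i++)
    table[i] = isprint((int)i) ? (char)i : '.';
}
```

After one call to build_display_table(g_display,sizeof(g_display)) at the top of main(), the lookup in dump_line() would become *da = g_display[*p]; and the per-byte cost stays the same as the string literal version.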

Perhaps better might be several such strings (ASCII, EBCDIC, Baudot, PETSCII), with a command line option to select which one to use (defaulting to whatever character set is native for the platform the program is running on). It could be a useful thing.

But such a modification I'm leaving as an exercise for the reader.

Now, is this the fastest version possible? I'm not going to say yes this time. There might be something else that could be done to wring that last bit of performance out of this code, but at this point, I am definitely done with wringing out the speed.

I think.

Thursday, February 02, 2012

99 ways to program a hex, Part 25: C♯

Jeff Cuscutis sent in a version written in C♯ (C-sharp, in case you don't have a font with the sharp symbol). He assured me the code works, but I can't test it as I don't use Microsoft Windows; nor have I installed Mono, as I don't really have a need to interoperate with the Microsoft Windows environment (at home, or at The Corporation).

// *************************************************************************
//
// Copyright 2012 by Jeff Cuscutis.  All Rights Reserved.
//
// This program is free software; you can redistribute it and/or
// modify it under the terms of the GNU General Public License
// as published by the Free Software Foundation; either version 2
// of the License, or (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.
// 
// You should have received a copy of the GNU General Public License
// along with this program; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
//
// Comments, questions and criticisms can be sent to: sean@conman.org
//
// ***********************************************************************

// C#

using System;
using System.IO;
using System.Text;

namespace Hex
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 0)
            {
                DoDump(Console.In, Console.Out);
            }
            else
            {
                foreach (var fileName in args)
                {
                    try
                    {
                        using (var file = new FileStream(fileName, FileMode.Open, FileAccess.Read))
                        {
                            TextReader tr = new StreamReader(file, Encoding.ASCII);
                            Console.Out.WriteLine("-----{0}-----",fileName);
                            DoDump(tr, Console.Out);
                            file.Close();
                        }
                    }
                    catch (Exception e)
                    {
                        Console.Error.WriteLine(e.Message);
                    }
                    
                }
            }
        }

        static void DoDump(TextReader inFile, TextWriter outFile)
        {
            const int blockLength = 16;
            int actuallyRead;
            var buf = new char[blockLength];
            var offset = 0;

            while ((actuallyRead = inFile.Read(buf, 0, blockLength)) > 0)
            {
                var display = new char[blockLength+1];

                outFile.Write("{0:X8} ",offset);

                var j = 0;
                do
                {
                    outFile.Write("{0:X2} ", (byte)buf[j]);
                    if (!char.IsControl(buf[j]))
                        display[j] = buf[j];
                    else
                        display[j] = '.';
                    offset++;
                    j++;
                    actuallyRead--;
                } while ((j < blockLength) && (actuallyRead > 0));
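                // pad a short final line so the text column stays aligned
                for (var i = j; i < blockLength; i++) outFile.Write("   ");

                // write only the characters filled in; a C-style NUL
                // terminator would end up in the output as a real character
                outFile.WriteLine(display, 0, j);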

                outFile.Flush();
            }
        }
    }
}

About the only question I have about this version is that it appears to open the input file in text mode, and the hex dump program should work on any type of file, text or binary.

Friday, February 03, 2012

99 ways to program a hex, part 26: C89, system calls and mmap()

I still stand by what I said in part 24:

Now, is this the fastest version possible? I'm not going to say yes this time. There might be something else that could be done to wring that last bit of performance out of this code, but at this point, I am definitely done with wringing out the speed.

That didn't prevent Dave Täht from sending in a patch to part 24 that used mmap() (a system call that does some magic to make a file suddenly appear in memory) which did better on his system, although it was a percentage gain, not an order of magnitude gain.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89, const correctness, assertive, system calls, full buffering */
/*	  lookup tables, mmap() */

#define _GNU_SOURCE

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#define LINESIZE	16

/********************************************************************/

extern const char *sys_errlist[];
extern int         sys_nerr;

static void	do_dump		(const int,const int);
static size_t	do_dump_memory	(unsigned char *,size_t,size_t,const int);
static size_t	dump_line	(char **const,unsigned char *,size_t,const unsigned long);
static void	hexout		(char *const,unsigned long,size_t,const int);
static void	myperror	(const char *const);
static size_t	myread		(const int,char *,size_t);
static void	mywrite		(const int,const char *const,const size_t);

/********************************************************************/

int main(const int argc,const char *const argv[])
{
  if (argc == 1)
    do_dump(STDIN_FILENO,STDOUT_FILENO);
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      int fhin;
      
      fhin = open(argv[i],O_RDONLY);
      if (fhin == -1)
      {
        myperror(argv[i]);
        continue;
      }
      
      mywrite(STDOUT_FILENO,"-----",5);
      mywrite(STDOUT_FILENO,argv[i],strlen(argv[i]));
      mywrite(STDOUT_FILENO,"-----\n",6);
      
      do_dump(fhin,STDOUT_FILENO);
      if (close(fhin) < 0)
        myperror(argv[i]);
    }
  }
  
  return EXIT_SUCCESS;
}
      
/************************************************************************/     

static void do_dump(const int fhin,const int fhout)
{
  struct stat info;
  
  assert(fhin  >= 0);
  assert(fhout >= 0);
  
  if (fstat(fhin,&info) < 0)
  {
    myperror("fstat()");
    return;	/* info is not valid past this point */
  }
  
  if (!S_ISREG(info.st_mode))
  {
    unsigned char buffer[4096];
    size_t        bytes;
    size_t        off = 0;
    
    while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0)
      off = do_dump_memory(buffer,bytes,off,fhout);
  }
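  else if (info.st_size > 0)	/* mmap() rejects a zero length mapping */
  {
    unsigned char *buffer;
    
    buffer = mmap(NULL,info.st_size,PROT_READ,MAP_SHARED,fhin,0);
    if (buffer == MAP_FAILED)
    {
      myperror("mmap()");
      return;
    }
    
    /* Caveat: advice values are an enumeration, not bit flags; on Linux */
    /* MADV_SEQUENTIAL | MADV_WILLNEED is numerically MADV_WILLNEED.     */
    if (madvise(buffer,info.st_size,MADV_SEQUENTIAL | MADV_WILLNEED) < 0)
      myperror("madvise()");
    
    do_dump_memory(buffer,info.st_size,0,fhout);
    munmap(buffer,info.st_size);
  }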
}

/********************************************************************/

static size_t do_dump_memory(
	unsigned char *p,
	size_t         bytes,
	size_t         off,
	const int      fhout
)
{
  char    outbuffer[75 * 109];
  char   *pout;
  size_t  count;
  
  assert(p     != NULL);
  assert(fhout >= 0);
  
  memset(outbuffer,' ',sizeof(outbuffer));
  count = 0;
  pout  = outbuffer;
  
  while(bytes)
  {
    size_t amount;
    
    amount = dump_line(&pout,p,bytes,off);
    p     += amount;
    bytes -= amount;
    off   += amount;
    count++;
  
    if (count == 109)
    {
      mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
      memset(outbuffer,' ',sizeof(outbuffer));
      count = 0;
      pout  = outbuffer;
    }
  }
  
  if ((size_t)(pout - outbuffer) > 0)
    mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
  return off;
}

/******************************************************************/  

static size_t dump_line(
	char                **const pline,
	unsigned char              *p,
	size_t                      bytes,
	const unsigned long         off
)
{
  char   *line;
  char   *dh;
  char   *da;
  size_t  count;
  
  assert(pline  != NULL);
  assert(*pline != NULL);
  assert(p      != NULL);
  assert(bytes  >  0);
  
  line = *pline;
  
  hexout(line,off,8,':');
  if (bytes > LINESIZE)
    bytes = LINESIZE;
  
  p  += bytes;
  dh  = &line[10 + bytes * 3];
  da  = &line[58 + bytes];
  
  for (count = 0 ; count < bytes ; count++)
  {
    p  --;
    da --;
    dh -= 3;
    
    *da = "................................ !\"#$%&'()*+,-./0123456789:;<=>?"
	"@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~."
	"................................................................"
	"........................................................"
	"........"[*p];
    
    hexout(dh,(unsigned long)*p,2,' ');
  }
  
  line[58 + count] = '\n';
  *pline = &line[59 + count];
  return count;
}

/**********************************************************************/  

static void hexout(char *const dest,unsigned long value,size_t size,const int padding)
{
  assert(dest != NULL);
  assert(size >  0);
  assert((padding >= ' ') && (padding <= '~'));
  
  dest[size] = padding;
  while(size--)
  {
    dest[size] = "0123456789ABCDEF"[value & 0x0f];
    value >>= 4;
  }
}

/************************************************************************/

static void myperror(const char *const s)
{
  int err = errno;
  
  assert(s != NULL);
  
  mywrite(STDERR_FILENO,s,strlen(s));
  mywrite(STDERR_FILENO,": ",2);
  
  if (err >= sys_nerr)	/* valid indexes stop at sys_nerr - 1 */
    mywrite(STDERR_FILENO,"(unknown)",9);
  else
    mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err]));
  mywrite(STDERR_FILENO,"\n",1);
}

/************************************************************************/

static size_t myread(const int fh,char *buf,size_t size)
{
  size_t amount = 0;
  
  assert(fh   >= 0);
  assert(buf  != NULL);
  assert(size >  0);
  
  while(size > 0)
  {
    ssize_t bytes;
    
    bytes = read(fh,buf,size);
    if (bytes < 0)
    {
      myperror("read()");
      exit(EXIT_FAILURE);
    }
    if (bytes == 0)
      break;
    
    amount += bytes;
    size   -= bytes;
    buf    += bytes;
  }
  
  return amount;
}

/*********************************************************************/  
  
static void mywrite(const int fh,const char *const msg,const size_t size)
{
  assert(fh   >= 0);
  assert(msg  != NULL);
  assert(size >  0);
  
  if (write(fh,msg,size) < (ssize_t)size)
  {
    if (fh != STDERR_FILENO)
      myperror("output");
      
    exit(EXIT_FAILURE);
  }
}

/***********************************************************************/

I tried it on my system, and saw no difference in performance whatsoever. But Dave was using a 64-bit system, and I was using a 32-bit system. Okay, there could be a difference there. I then tried it on a 64-bit system (The Corporation provided laptop, running a 64-bit version of Linux) and there was a difference, but well:

[spc]saltmine:~/source/99>time ./24 libxul.so >/dev/null

real    0m0.043s
user    0m0.030s
sys     0m0.010s
[spc]saltmine:~/source/99>time ./26 libxul.so >/dev/null

real    0m0.054s
user    0m0.040s
sys     0m0.010s

The version with mmap() is slower! It's more noticeable with a large (759,012,536 bytes) file:

[spc]saltmine:~/source/99>time ./24 largedata >/dev/null

real    0m1.682s
user    0m1.500s
sys     0m0.170s
[spc]saltmine:~/source/99>time ./26 largedata >/dev/null

real    0m1.809s
user    0m1.680s
sys     0m0.120s

Yes, time spent in the kernel goes down (understandable, since we no longer have to copy data out of the file through the kernel) but the overall time goes up. And with that, we've reached the point of diminishing returns, where the amount of return does not justify the amount of effort. It could be that Dave's machine had more memory than my machine, or a faster hard drive, or a later version of the kernel that handled mmap()/madvise() better. There are no real gains to be had at this point.
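For anyone who hasn't met mmap() before, here is a minimal sketch (not the code that was timed above) of the two ways of getting at a file's bytes: copying them out through the kernel with read(), and mapping them with mmap(). The helper names sum_read() and sum_mmap() are made up for illustration, and the sketch assumes a Linux-style madvise(). Since madvise() advice values are not bit flags, each hint gets its own call here.

```c
#define _GNU_SOURCE

#include <stddef.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: add up every byte of a file by copying the */
/* data out with read().  Returns 0 on success, -1 on error.       */
static int sum_read(const char *name,unsigned long *sum)
{
  unsigned char buffer[4096];
  ssize_t       bytes;
  size_t        i;
  int           fh;
  
  assert(name != NULL);
  assert(sum  != NULL);
  
  *sum = 0;
  fh   = open(name,O_RDONLY);
  if (fh == -1)
    return -1;
  
  while((bytes = read(fh,buffer,sizeof(buffer))) > 0)
    for (i = 0 ; i < (size_t)bytes ; i++)
      *sum += buffer[i];
  
  close(fh);
  return (bytes < 0) ? -1 : 0;
}

/* Hypothetical helper: the same sum, but the file is mapped into */
/* memory and read in place.  Only regular files are accepted.    */
static int sum_mmap(const char *name,unsigned long *sum)
{
  struct stat    info;
  unsigned char *p;
  size_t         size;
  size_t         i;
  int            fh;
  
  assert(name != NULL);
  assert(sum  != NULL);
  
  *sum = 0;
  fh   = open(name,O_RDONLY);
  if (fh == -1)
    return -1;
  
  if ((fstat(fh,&info) < 0) || !S_ISREG(info.st_mode))
  {
    close(fh);
    return -1;
  }
  
  size = (size_t)info.st_size;
  if (size == 0)		/* nothing to map, and mmap() would refuse */
  {
    close(fh);
    return 0;
  }
  
  p = mmap(NULL,size,PROT_READ,MAP_SHARED,fh,0);
  close(fh);			/* the mapping outlives the descriptor */
  if (p == MAP_FAILED)
    return -1;
  
  /* hints only, so failures are ignored; one call per advice value */
  (void)madvise(p,size,MADV_SEQUENTIAL);
  (void)madvise(p,size,MADV_WILLNEED);
  
  for (i = 0 ; i < size ; i++)
    *sum += p[i];
  
  munmap(p,size);
  return 0;
}
```

Both helpers should produce the same sum for the same regular file; the difference is only in who moves the data, which is where the drop in system time comes from.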

Another interesting thing is to run the time command, but with more verbose output:

[spc]saltmine:~/source/99>time -v ./24 largedata >/dev/null
        Command being timed: "./24 largedata"
        User time (seconds): 1.47
        System time (seconds): 0.22
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.69
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1600
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 139
        Voluntary context switches: 1
        Involuntary context switches: 171
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Here we see the code from part 24 taking 139 page faults. Now, today's version:

[spc]saltmine:~/source/99>time -v ./26 largedata >/dev/null
        Command being timed: "./26 largedata"
        User time (seconds): 1.71
        System time (seconds): 0.10
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.82
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2965440
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 185449
        Voluntary context switches: 1
        Involuntary context switches: 182
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

The number of page faults skyrockets to 185,449, three orders of magnitude more than the previous version, which could account for the time loss on my 64-bit system (quite possibly this shows a sub-optimal implementation of mmap()).
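That count is close to one fault per page: a 759,012,536-byte file spans 185,306 pages of 4,096 bytes. The sketch below does that arithmetic and, as a rough experiment, counts the minor faults taken while touching freshly mapped memory one page at a time. It uses anonymous memory rather than a file to stay self-contained, the helper names are invented for this example, and the exact fault count will depend on the kernel.

```c
#define _GNU_SOURCE

#include <stddef.h>
#include <assert.h>

#include <sys/time.h>
#include <sys/resource.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of pages needed to hold size bytes (rounds up). */
static unsigned long pages_for(unsigned long size,unsigned long pagesize)
{
  assert(pagesize > 0);
  return (size + pagesize - 1) / pagesize;
}

/* Minor page faults charged to this process so far, or -1 on error. */
static long minor_faults(void)
{
  struct rusage usage;
  
  if (getrusage(RUSAGE_SELF,&usage) < 0)
    return -1;
  return usage.ru_minflt;
}

/* Hypothetical experiment: map npages of fresh anonymous memory, */
/* write one byte into each page, and return how many minor page  */
/* faults that cost (or -1 on error).                             */
static long faults_touching(size_t npages)
{
  size_t         pagesize = (size_t)sysconf(_SC_PAGESIZE);
  size_t         size     = npages * pagesize;
  unsigned char *p;
  long           before;
  long           after;
  size_t         i;
  
  assert(npages > 0);
  
  p = mmap(NULL,size,PROT_READ | PROT_WRITE,MAP_PRIVATE | MAP_ANONYMOUS,-1,0);
  if (p == MAP_FAILED)
    return -1;
  
  before = minor_faults();
  for (i = 0 ; i < size ; i += pagesize)
    p[i] = 1;
  after = minor_faults();
  
  munmap(p,size);
  
  if ((before < 0) || (after < 0))
    return -1;
  return after - before;
}
```

Calling pages_for(759012536UL,4096UL) gives 185306, and on a typical Linux system faults_touching(n) should come back in the neighborhood of n.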

Your mileage may vary, though.

Saturday, February 04, 2012

99 ways to program a hex, Part 27: C♯, binary stream

Jeff Cuscutis sent in another version of the C♯ program. He writes:

From
Jeffrey Cuscutis <XXXXXXXXXXXXXXXXXXXXXXX>
To
Sean Conner <sean@conman.org>
Subject
Re: 99 Programs
Date
Sat, 4 Feb 2012 22:51:37 -0500

Modified to use BinaryReader instead of TextReader. It sort of worked on binary files before this, but replaced unprintable characters with “?” automatically when read.

It now does this correctly, but I had to make a wrapper function to read data from the Console as that is a TextReader.

// *************************************************************************
//
// Copyright 2012 by Jeff Cuscutis.  All Rights Reserved.
//
// This program is free software; you can redistribute it and/or
// modify it under the terms of the GNU General Public License
// as published by the Free Software Foundation; either version 2
// of the License, or (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.
// 
// You should have received a copy of the GNU General Public License
// along with this program; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
//
// Comments, questions and criticisms can be sent to: sean@conman.org
//
// ***********************************************************************

// C#, binary stream

using System;
using System.IO;

namespace Hex
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 0)
            {
                DoDump(ReadFromConsole, Console.Out);
            }
            else
            {
                foreach (var fileName in args)
                {
                    try
                    {
                        using (var file = new FileStream(fileName, FileMode.Open, FileAccess.Read))
                        {
                            var tr = new BinaryReader(file);
                            Console.Out.WriteLine("-----{0}-----",fileName);
                            DoDump(tr.Read, Console.Out);
                            file.Close();
                        }
                    }
                    catch (Exception e)
                    {
                        Console.Error.WriteLine(e.Message);
                    }
                    
                }
            }
        }

        // wrapper to fake reading from a TextReader to 
        // make it look like it is a BinaryReader
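        static int ReadFromConsole(byte[] buf, int index, int count)
        {
            var charBuf = new char[count];

            // charBuf is a scratch buffer, so always fill it from position 0
            int actuallyRead = Console.In.Read(charBuf, 0, count);

            // copy only what was read, honoring the caller's starting index
            for (int i = 0; i < actuallyRead; i++)
            {
                buf[index + i] = (byte)charBuf[i];
            }

            return actuallyRead;
        }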

        static void DoDump(Func<byte[], int, int, int> readFunc, TextWriter outFile)
        {
            const int blockLength = 16;
            int actuallyRead;
            var buf = new byte[blockLength];
            var offset = 0;

            while ((actuallyRead = readFunc(buf, 0, blockLength)) > 0)
            {
                var display = new char[blockLength+1];

                outFile.Write("{0:X8} ",offset);

                var j = 0;
                do
                {
                    outFile.Write("{0:X2} ", buf[j]);
                    if (!char.IsControl((char)buf[j]))
                        display[j] = (char)buf[j];
                    else
                        display[j] = '.';
                    offset++;
                    j++;
                    actuallyRead--;
                } while ((j < blockLength) && (actuallyRead > 0));
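                // pad a short final line so the text column stays aligned
                for (var i = j; i < blockLength; i++) outFile.Write("   ");

                // write only the characters filled in; a C-style NUL
                // terminator would end up in the output as a real character
                outFile.WriteLine(display, 0, j);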

                outFile.Flush();
            }
        }
    }
}

Update on Monday, February 6th, 2012

Jeff wrote me to add:

I forgot to mention that it also uses a generic signature to handle the readFunc parameter in DoDump()

Sunday, February 05, 2012

99 ways to program a hex, Part 28: K&R C, system calls, full buffering

So, how would the version based on system calls have looked in the 80s? You know, probably before the mmap() system call existed? Probably like this, vowel impairments, sorry, vwlmprmnts and all.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: K&R, system calls, full buffering */

#include <stdlib.h>
#include <string.h>
#include <errno.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define LINESIZE	16

/********************************************************************/

main(argc,argv)
char **argv;
{
	int i,fhin;

	if (argc == 1) {
		hexdmp(0,1);
	} else {
		for (i = 1 ; i < argc ; i++) {
			fhin = open(argv[i],O_RDONLY);
			if (fhin == -1) {
				myperr(argv[i]);
				continue;
			}

			mywrt(1,"-----",5);
			mywrt(1,argv[i],strlen(argv[i]));
			mywrt(1,"-----\n",6);
      
			hexdmp(fhin,1);
			if (close(fhin) < 0) {
				myperr(argv[i]);
			}
		}
	}

	return 0;
}

/************************************************************************/     

char buffer[4096],outbuf[75 * 109];

hexdmp(fhin,fhout)
{
	int off,bytes,count,amount;
	char *pout,*p;

	memset(outbuf,' ',sizeof(outbuf));
	off = count = 0;
	pout = outbuf;

	while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0) {
		p = buffer;
		for (p = buffer ; bytes > 0 ; ) {
			amount = hexln(&pout,p,bytes,off);
			p += amount;
			bytes -= amount;
			off += amount;
			count++;

			if (count == 109) {
				mywrt(fhout,outbuf,pout - outbuf);
				memset(outbuf,' ',sizeof(outbuf));
				count = 0;
				pout = outbuf;
			}      
		}
	}

	if (pout - outbuf > 0) {
		mywrt(fhout,outbuf,pout - outbuf);
	}
}

/********************************************************************/

hexln(pline,p,bytes,off)
char **pline,*p;
{
	char *line,*dh,*da;
	int count;
  
	line = *pline;
  
	hexout(line,off,8,':');
	if (bytes > LINESIZE) {
		bytes = LINESIZE;
  	}
	
	p += bytes;
	dh = &line[10 + bytes * 3];
	da = &line[58 + bytes];

	for (count = 0 ; count < bytes ; count++) {
		p  --;
		da --;
		dh -= 3;
    
		if ((*p >= ' ') && (*p <= '~')) {
			*da = *p;
		} else {
			*da = '.';
		}

		hexout(dh,(unsigned long)*p,2,' ');
	}

	line[58 + count] = '\n';
	*pline = &line[59 + count];
	return count;
}

/**********************************************************************/  

hexout(dest,value,size,padding)
char *dest;
{
	dest[size] = padding;
	while(size--) {
		dest[size] = (char)((value & 0x0F) + '0');
		if (dest[size] > '9') {
			dest[size] += 7;
		}
		value >>= 4;
	}
}

/************************************************************************/

myperr(s)
char *s;
{
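	/* sys_errlist is an array, not a pointer to a pointer, */
	/* and its valid indexes stop at sys_nerr - 1           */
	extern char *sys_errlist[];
	extern int sys_nerr;
	int err = errno;

	mywrt(2,s,strlen(s));
	mywrt(2,": ",2);

	if (err >= sys_nerr) {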
		mywrt(2,"(unknown)",9);
	} else {
		mywrt(2,sys_errlist[err],strlen(sys_errlist[err]));
	}
	mywrt(2,"\n",1);
}

/************************************************************************/

myread(fh,buf,size)
char *buf;
{
	int amount = 0,bytes;

	while(size > 0) {
		bytes = read(fh,buf,size);
		if (bytes < 0) {
			myperr("read()");
			exit(1);
		}
		if (bytes == 0) {
			break;
		}    
		amount += bytes;
		size -= bytes;
		buf += bytes;
	}
	return amount;
}

/*********************************************************************/  

mywrt(fh,msg,size)
char *msg;  
{
	if (write(fh,msg,size) < size) {
		if (fh != 2) {
			myperr("output");
		}
		exit(1);
	}
}

/***********************************************************************/

Actually, the vowel impairment, sorry, vwlmprmnt code was due to linker restrictions at the time. Linkers were fairly limited, and one of the limits was the length of identifiers they could handle, around six characters (some might have handled more, but the first C standard in 1989 only guaranteed six significant characters for external identifiers, so that's probably the smallest size at the time). With only six characters (makes you wonder where that limit comes from) and vowels typically being redundant (“f y cn rd ths y t cn wrt prgrms”), is it any wonder early code was typically vwlmprd?
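As a purely illustrative sketch of that naming habit (the function name vwlmpr() and its exact rule are invented here, not taken from any real tool or from how the names above were chosen), this keeps the first character of a name, drops every later vowel, and stops once a given number of characters has been stored:

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Hypothetical helper: copy src to dest, keeping the first character */
/* and dropping every later vowel, until limit characters have been   */
/* stored.  dest needs room for limit + 1 characters.  Returns the    */
/* number of characters stored, not counting the terminating NUL.     */
static size_t vwlmpr(char *dest,const char *src,size_t limit)
{
  size_t n = 0;
  
  assert(dest != NULL);
  assert(src  != NULL);
  
  for ( ; (*src != '\0') && (n < limit) ; src++)
  {
    /* skip vowels everywhere except the first position */
    if ((n > 0) && (strchr("aeiouAEIOU",*src) != NULL))
      continue;
    dest[n++] = *src;
  }
  
  dest[n] = '\0';
  return n;
}
```

With a limit of six, "hexdump" comes out as "hxdmp" and "myperror" as "myprrr", which shows why a human picking names by hand (hexdmp, myperr) usually does a better job than a mechanical rule.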


I can't quite put my finger on it

I can't quite shake the feeling that this commercial is ripping something off. What, I don't know … but it's something …

Anybody? Anybody?

Monday, February 06, 2012

99 ways to program a hex, Part 29: K&R, system calls, full buffering, obfuscated

I suspect that many entries in the IOCCC start out as normal code, get converted to K&R as a first step, and then have all the variables and functions renamed to one- or two-character names and the unneeded spaces removed.

Much like today's version.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: K&R, system calls, full buffering, obfuscated */

#include <errno.h>
#include <fcntl.h>

main(a,b)char**b;{int i,f;if(a==1)fa(0,1);else{for(i=1;i<a;i++){f=open
(b[i],O_RDONLY);if(f==-1){fd(b[i]);continue;}ff(1,"-----",5);ff(1,b[i],
strlen(b[i]));ff(1,"-----\n",6);fa(f,1);if(close(f)<0)fd(b[i]);}}return 0;}

char a[4096],b[75*109];
fa(c,d){int e,f,g,h;char*i,*p;memset(b,' ',sizeof(b));e=g=0;i=b;
while((f=fe(c,(char *)a,sizeof(a)))>0){p=a;for(p=a;f>0;){h=fb(&i,p,f,e);
p+=h;f-=h;e+=h;g++;if(g==109){ff(d,b,i-b);memset(b,' ',sizeof(b));g=0;
i=b;}}}if (i-b>0)ff(d,b,i-b);}

fb(a,p,c,d)char**a,*p;{char*e,*f,*g;int h;e=*a;fc(e,d,8,':');if(c>16)
{c=16;}p+=c;f=&e[10+c*3];g=&e[58+c];for(h=0;h<c;h++){p--;g--;f-=3;
if((*p>=' ')&&(*p<='~'))*g=*p;else*g = '.';fc(f,*p,2,' ');}e[58+h]='\n';
*a=&e[59+h];return h;}

fc(a,b,c,d)char*a;{a[c]=d;while(c--){a[c]=
(b&0x0F)+'0';if(a[c]>'9')a[c]+=7;b>>=4;}}

fd(a)char*a;{extern char*sys_errlist[];extern int sys_nerr;int b=errno;
ff(2,a,strlen(a));ff(2,": ",2);if(b>=sys_nerr){ff(2,"(unknown)",9);}else
{ff(2,sys_errlist[b],strlen(sys_errlist[b]));}ff(2,"\n",1);}

fe(a,b,c)char*b;{int d=0,e;while(c>0){e=read(a,b,c);if(e<0){fd("read()");
exit(1);}if(e==0){break;}d+=e;c-=e;b+=e;}return d;}

ff(a,b,c)char*b;{if(write(a,b,c)<c){if(a!=2){fd("output");}exit(1);}}

The sad thing—I've seen production code like this (and no, The Protocol Stack From Hell™ isn't this bad, thankfully).

Tuesday, February 07, 2012

99 ways to program a hex, Part 30: K&R, really obfuscated

And here we have a fully obfuscated version of our program—a nearly impenetrable wall of characters that nonetheless compiles and works.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: K&R, system calls, full buffering, obfuscated 2 */

#include <errno.h>
#include <fcntl.h>

main(a,b)char **b;{int i,f;if(a==1)fa(0,1);else{for(i=1;i<a;i++){f=open(b[i],
O_RDONLY);if(f==-1){fd(b[i]);continue;}ff(1,"-----",5);ff(1,b[i],strlen(b[i])
);ff(1,"-----\n",6);fa(f,1);if(close(f)<0)fd(b[i]);}}return 0;}char a[4096],b
[75*109];fa(c,d){int e,f,g,h;char*i,*p;memset(b,' ',sizeof(b));e=g=0;i=b;while
((f=fe(c,(char *)a,sizeof(a)))>0){p=a;for(p=a;f>0;){h=fb(&i,p,f,e);p+=h;f-=h;
e+=h;g++;if(g==109){ff(d,b,i-b);memset(b,' ',sizeof(b));g=0;i=b;}}}if (i-b>0)
ff(d,b,i-b);}fb(a,p,c,d)char**a,*p;{char*e,*f,*g;int h;e=*a;fc(e,d,8,':');if(
c>16){c=16;}p+=c;f=&e[10+c*3];g=&e[58+c];for(h=0;h<c;h++){p--;g--;f-=3;if((*p
>=' ')&&(*p<='~'))*g=*p;else*g = '.';fc(f,*p,2,' ');}e[58+h]='\n';*a=&e[59+h]
;return h;}fc(a,b,c,d)char*a;{a[c]=d;while(c--){a[c]=(b&0x0F)+'0';if(a[c]>'9'
)a[c]+=7;b>>=4;}}fd(a)char*a;{extern char*sys_errlist[];extern int sys_nerr;int
b=errno;ff(2,a,strlen(a));ff(2,": ",2);if(b>=sys_nerr){ff(2,"(unknown)",9);}
else{ff(2,sys_errlist[b],strlen(sys_errlist[b]));}ff(2,"\n",1);}fe(a,b,c)char
*b;{int d=0,e;while(c>0){e=read(a,b,c);if(e<0){fd("read()");exit(1);}if(e==0)
{break;}d+=e;c-=e;b+=e;}return d;}ff(a,b,c)char*b;{if(write(a,b,c)<c){if(a!=2
){fd("output");}exit(1);}}

And because it's so obfuscated, it's mercifully short as well.

Wednesday, February 08, 2012

99 ways to program a hex, Part 31, has been delayed indefinitely

Yesterday's version is the last version I'll be posting for now. When I was initially inspired, I ripped through a majority of what you've seen in just three days. It's not really surprising given that a majority of the “variations” differed by a line or two of code.

But I've run out. And now, having done 21 variations in C (one more than I originally planned), five in Lua (I could do one more in Lua—the actual original code I based the Lua versions off of, but oddly enough, it doesn't actually handle files), two in a dialect of BASIC I can't currently test and two I didn't expect in C♯ (both submitted by Jeff Cuscutis), I don't think I have it in me to do many more.

I've exhausted C. And I pretty much exhausted Lua, which are my two “go to” languages these days. I could probably push out a couple of Perl versions, and a PHP version (PHP does not have nearly the expressiveness of Lua or even Perl to bother with more than one version) but that's about the limit.

There are a few other languages I could do (Common Lisp, Scheme, SNOBOL (seriously!), Forth, Awk, Erlang, Python and Ruby) but those would require significant time hitting up documentation and what not because I don't know those languages all that well (if at all).

So I'll probably continue this series, but it'll probably be a post or two every few months and not every XXXXXXX day as I have been doing.

Thursday, February 16, 2012

“You don't really own your data, as much as we let you use it”

I made a comment recommending against using “the cloud” to store your data on GoogleFacePlusBook and someone took offense to that remark. I know, I know, but in my defense, we were both in the wrong, and in the end I hope we all learned something. I learned that “buying a book” is more “licensing to read” than actual ownership (even the dead tree type, and this from a lawyer I called (and if I knew his website, I would link to it here)) and the other person learned that yes, Virginia, you can successfully sue Amazon for having eaten your homework.

I still stand on my original remark, not to use “the cloud” to store your data. To present your data (like pictures, idiotic blog posts, what have you) to the public, sure, use “the cloud.” To store your data (or even a backup of your data)? Not on your life.

I do have my reasons and they range from the reasonable (it's not reliable, as even Google has bad hair days), the debatable (you have no control over your data as in the aforementioned Amazon eating your homework, sites going down with little to no notification) to the downright “wearing a tin hat in a shack in the woods” (actual remark by the other person, and here we go into government snooping through your data in “the cloud”—and if you think you are not a “person of interest” I'm sure Ted Kennedy never thought he would be on the “No Fly List”—ponder that for a while).

But it didn't occur to me that a company hosting “the cloud” could conceivably mine your own data. I mean, it's there, right? And then I read this little gem of an article:

… Target has a baby-shower registry, and Pole started there, observing how shopping habits changed as a woman approached her due date, which women on the registry had willingly disclosed. He ran test after test, analyzing the data, and before long some useful patterns emerged. …

About a year after Pole created his pregnancy-prediction model, a man walked into a Target outside Minneapolis and demanded to see the manager. He was clutching coupons that had been sent to his daughter, and he was angry, according to an employee who participated in the conversation.

“My daughter got this in the mail!” he said. “She's still in high school, and you're sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”

The manager didn't have any idea what the man was talking about. He looked at the mailer. Sure enough, it was addressed to the man's daughter and contained advertisements for maternity clothing, nursery furniture and pictures of smiling infants. The manager apologized and then called a few days later to apologize again.

On the phone, though, the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there's been some activities in my house I haven't been completely aware of. She's due in August. I owe you an apology.”

Via Hacker News, How Companies Learn Your Secrets

Okay, it's not about a company mining “the cloud,” but it does illustrate just how much data we willingly (or unknowingly) give out.

Update a few minutes later

Perhaps government overreach isn't quite as “tin hat crazy” as I thought …

Tuesday, February 21, 2012

Hey, as long as you paid taxes on the income, I don't see the IRS having any issues with this …

Okay, let me see if I have the pitch right—hypothetically speaking, let's say I have made, through many illegal means, a metric buttload of money and I want to have it laundered. I don't want to mess with the banking systems as they seem to be under high scrutiny these days (besides, everybody knows they're criminals anyway).

So, I grab a few hundred pages from Wikipedia, bundle them into a “book” which I “publish and sell” via Amazon. This costs me nearly nothing. Then I “buy” as many copies of this “book” as I like, getting nearly 45% of the money back as the “author.” Amazon won't complain, as they're getting a nice chunk of change. Heck, I could even get a higher percentage of the money if the “book” is bought through an affiliate program I set up (this might push my percentage over 50%).

And who's to say this isn't going on right now?

And you know, money laundering might explain some of these Etsy sites (link via Regretsy).

Monday, March 12, 2012

And I bet Han still shoots first!

His idea was to edit the Star Wars prequels into one movie, as they would provide him a lot of footage to work with. He used footage from all three prequels, a couple cuts from the original trilogy, some music from The Clone Wars television series, and even a dialogue bit from Anthony Daniels’ (C-3PO) audio book recordings. He even created a new opening text crawl to set up his version of the story.

The result is an 85-minute movie titled Star Wars: Episode III.5: The Editor Strikes Back. It should be noted that the Star Wars prequel trilogy is almost 7 hours in total length, and the shortest film (Episode 1) is more than 51 minutes longer than Grace’s fan cut. What this means is a lot of footage ended up on the editing room floor, and a lot of creative choices were made in the editing process. And the result? Topher Grace’s Star Wars film is probably the best possible edit of the Star Wars prequels given the footage released and available.

Via Jason Kottke, Topher Grace Edited The ‘Star Wars’ Prequels Into One 85-Minute Movie and We Saw It | /Film

Personally, I would love to see this version. Heck, I'd like to see any of the subsequent re-edits from anybody other than George “Despoiler of Childhood Memories” Lucas.

Heck, even the “It's not Star Wars as you know it” comic Darths & Droids (the conceit: it's a role-playing game) delivers a much better and more coherent storyline than what George “I should have stuck to editing” Lucas ever came up with.

What? Me, bitter?

Wednesday, March 14, 2012

A million dollars! And all he draws are stick figures!

After years and years of giving his work away for free, Rich Burlew just raked in more than a million.

Via Hacker News, The Crowd-Funding Phenomenon Continues – Comic Raises $1.2M on Kickstarter (+Q&A with Creator Rich Burlew)! | Singularity Hub

Wow. Rich Burlew just raised over a million dollars from his fans. For drawing a comic based on stick figures.

Granted, it took him nine years to get the fan base for this, but still, that's quite a payout. And the best bit is that he did it on his own terms without relinquishing his intellectual property (link via my very first post).

Saturday, May 12, 2012

Borg collective refugees? Oh, that's right, Borg collective refugees

From
Tom Morris <XXXXXXXXXXXXXXXXXXXXX>
To
sean@conman.org
Subject
Borg collective refugees?
Date
Sat, 12 May 2012 14:23:22 -0400

I got a great laugh out of this. Thank you for being so awesome.

—Tom Morris, chairman of the Miami Hamfest / Tropical Hamboree

I'm looking at this email, wondering what is this person talking about? Is this some type of spam? That's the entire email. But no, it turns out that Tom Morris is the Chairman of the Miami Hamfest and he was commenting on a comment I made eleven years ago (and it's been seven years since I last attended the Miami Hamfest).

But next year's show looks to be interesting, what with part of the emphasis on “alternative energy” and my interest in fringe science (I find it fascinating, not necessarily because I believe in it but because I find it highly amusing). I'll have to check it out next year.

Friday, August 10, 2012

The magical spinning globe

“I got you a gift,” said Bunny.

“Oh really?”

“Yes. It contains a large magnet, a sealed container filled with liquid, and solar cells.”

“Hmmm … ”

“It's also something you collect.”

So I'm trying to think of something that I collect that has magnets, sealed containers with liquid and some electronics. “It's not a Lego Mindstorm kit, is it?”

“No.”

“Can I see the box?”

“Sure,” said Bunny. She then handed over a six inch cube box to me. It was quite heavy for its size. Blue colored, with “Mova” written on the side.

“I still can't tell what it is.”

She opened the box and pulled out a clear disk about 2½ inches across and about an inch thick. There were three holes on one side. She also pulled out a few clear rods, also about 2½ inches long and ¼ inch across. They fit into the holes to form what looked like a stand. “Interesting,” I said.

She then pulled out a plastic bag, about five inches across. “Here you go,” she said, handing the bag over to me.

I take it, and pull out a beautiful globe.

[The Mova Globe]

The globe itself is encased in a clear shell filled with a special liquid that allows the globe inside to spin. The solar cells are used to drive the globe to spin within the clear shell, by using ambient light as a power source; the magnet inside helps to drive the motion.

[Top of the world!]

It's a wonderful globe, and a wonderful gift that Bunny gave me.


I'm puzzling over all these globes …

In addition to the Mova Globe, Bunny also thought I would enjoy this globe:

[As if regular jigsaw puzzles were hard enough.]

A three dimensional 540 piece jigsaw puzzle of the Earth—an actual nine inch globe. Of course, that's the finished piece. This is what I was faced with when I started:

[A pile of pieces.]

It certainly didn't hurt that each piece was numbered on the back.

[To make sure you have every piece] [It's a numbers game ...]

It was quite fun, and it felt a bit like making the Death Star, in an odd way.

[I started with a bottom-up approach ... ] [The land was easy to put together, but the Earth is 75% water ... ] [So I decided to change to a top-down approach to things.] [The Earth is hollow!  Hollow I say!] [But I managed to puzzle this out.]

Less traffic and lost packets … I'm stumped

Part of my job at The Corporation is load testing. And generally, when I do load testing, I pretty much write code that sends requests as quickly as possible to the server (and I might run multiple programs that spam the server with requests). But recently, Smirk brought to my attention Poisson distributions, claiming that they present a more realistic load.

I was skeptical. I really didn't see how randomly sending requests using a Poisson distribution would behave any differently than a (large) constant load, but Smirk insisted.

So I wrote a simple UDP service and ran that on one machine. I then coded up two client programs. They were actually identical except for one line of code—the “constant” client:

sleep(0)

and the “Poisson distribution” client:

sleep(-log(1.0 - random()) / 5000.0)

(sleep(0) doesn't actually sleep, but does cause the calling process to be rescheduled. This limits the process to about 10,000 messages a second, so it's a good baseline, and we can always run more processes. random() returns a value between 0 and 1 (includes 0, excludes 1) and we subtract that from 1.0 to prevent taking the log of 0 (which is undefined). Logarithms of numbers between 0 and 1 are negative, so we negate to get a positive value, and divide by 5,000, which gives an average delay of 1/5,000 of a second, or an average of 5,000 messages per second. Yes, it's half the rate of the constant client, but there are two reasons for this: one, we can always run more, and two, there's a limit to how short we can sleep. A value of 0 just reschedules the process; under Solaris, you can't sleep less than 1/100 of a second (so values less than .01 are rounded up to .01); Linux is around 1/500 or 1/1000 of a second (depending upon configuration), so 5,000 is kind of an “eh, it's good enough” value.)

(I should also mention that the version of sleep() I'm using can take a fractional number of seconds, like sleep(0.125), since all the code I'm talking about is in Lua, because I was writing a “proof-of-concept” server, not a “high performance” server)
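The delay formula can be checked in isolation. What follows is a minimal sketch in Python rather than the Lua used for the actual clients; the helper names `poisson_delay` and `mean_delay` are made up for illustration and are not part of the original code. It draws delays with the same expression and averages them, which should land close to 1/5,000 of a second.

```python
import math
import random

def poisson_delay(rate, rng=random):
    """Return one delay in seconds for a Poisson process that averages
    `rate` events per second (exponential inter-arrival time)."""
    u = rng.random()                  # uniform in [0.0, 1.0)
    return -math.log(1.0 - u) / rate  # 1.0 - u is in (0.0, 1.0], so log is defined

def mean_delay(rate, samples, seed=0):
    """Average `samples` delays, using a seeded generator for repeatability."""
    rng = random.Random(seed)
    return sum(poisson_delay(rate, rng) for _ in range(samples)) / samples

# Compare the sampled average against the theoretical mean of 1/rate.
print("sampled mean:", mean_delay(5000.0, 100_000))
print("expected    :", 1.0 / 5000.0)
```

Python's standard library also offers `random.expovariate(rate)`, which samples from the same distribution; the explicit formula is kept here only because it mirrors the one-line client above.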

So, I run 64 “constant” clients and get:

64 “constant” client results, packets per 10 second interval

packets sent    packets received    packets dropped
636002          636002              0
621036          621036              0
631890          631890              0
631051          631051              0
613912          613912              0

Pretty much around 10,000 messages per second with no dropped data. And now, for 128 “Poisson distribution” clients:

128 “Poisson distribution” clients, packets per 10 second interval

packets sent    packets received    packets dropped
348620          348555              65
439038          438988              50
375482          375436              46
382650          382600              50
396886          396828              58

Um … what?

Half the number of packets, and I'm losing some as well? What weirdness is this? No matter how many times I run the tests, or for how long, I get similar results. The “Poisson distribution” client gets horrible results.

And as Smirk said, that's exactly the point.
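For a sense of scale, the drop counts from the “Poisson distribution” runs can be turned into a loss rate. This is only arithmetic on the figures reported above, sketched in Python; the names `POISSON_RUNS` and `loss_fraction` are illustrative.

```python
# (sent, received, dropped) per 10 second interval, copied from the
# "Poisson distribution" results above.
POISSON_RUNS = [
    (348620, 348555, 65),
    (439038, 438988, 50),
    (375482, 375436, 46),
    (382650, 382600, 50),
    (396886, 396828, 58),
]

def loss_fraction(runs):
    """Fraction of all sent packets that were dropped across `runs`."""
    sent = sum(s for s, _, _ in runs)
    dropped = sum(d for _, _, d in runs)
    return dropped / sent

for sent, received, dropped in POISSON_RUNS:
    # Sanity check: the three columns should be consistent with each other.
    assert sent - received == dropped
    print(f"{sent:>7} sent  {dropped:>3} dropped  {100.0 * dropped / sent:.4f}%")

print(f"overall: {100.0 * loss_fraction(POISSON_RUNS):.4f}%")
```

Across the five intervals that is 269 dropped out of 1,942,676 sent, or about one packet in every 7,200. Small in absolute terms, but the “constant” runs dropped none at all.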

And the odd thing is, I can't explain this behavior. I can't comprehend what could be happening that could be causing this behavior, over a one-line change.

Disturbing.
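One property of Poisson arrivals that can at least be shown in isolation is that they are bursty. The sketch below is a simulation in Python, not a reproduction of the test above and not offered as the explanation for the dropped packets. It assumes a single sender at 5,000 messages per second for one second, and every function name is made up for illustration. It counts how many messages fall into each 1 ms window for evenly spaced sends versus exponentially spaced ones.

```python
import math
import random
from collections import Counter

def exponential_gap(rate, rng):
    """One exponentially distributed gap (seconds) at `rate` events/second."""
    return -math.log(1.0 - rng.random()) / rate

def poisson_arrivals(rate, duration, rng):
    """Arrival times in [0, duration) separated by exponential gaps."""
    times = []
    t = exponential_gap(rate, rng)
    while t < duration:
        times.append(t)
        t += exponential_gap(rate, rng)
    return times

def constant_arrivals(rate, duration):
    """Evenly spaced arrival times, offset by half a gap so that none
    of them sits exactly on a window boundary."""
    return [(i + 0.5) / rate for i in range(int(rate * duration))]

def busiest_window(times, width):
    """Largest number of arrivals landing in any one window of `width` seconds."""
    counts = Counter(int(t / width) for t in times)
    return max(counts.values())

rng = random.Random(2012)  # fixed seed so repeated runs agree
rate, duration, width = 5000.0, 1.0, 0.001

print("constant, busiest 1 ms window:",
      busiest_window(constant_arrivals(rate, duration), width))
print("Poisson,  busiest 1 ms window:",
      busiest_window(poisson_arrivals(rate, duration, rng), width))
```

The evenly spaced sender puts exactly five messages in every window, while the exponentially spaced one averages five but has windows holding noticeably more. Whether bursts like that had anything to do with the numbers above is not something this sketch can settle.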

Obligatory Picture

[An abstract representation of where you're coming from]

Obligatory AI Disclaimer

No AI was used in the making of this site, unless otherwise noted.

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.