The Boston Diaries

Tuesday, January 31, 2012

Rabid howler monkeys on crack wrote this code

Okay, yes, there are issues with the code to The Protocol Stack From Hell™. There's the vowel impaired names—oh, sorry, the vwl_imprd_nms, the usual ignoring return codes and tons of global variables—sorry, glblvrbls littering the code. And there's stuff like:

foo_t *p = NULL;

/* lots of code not touching p at all */

if (p) {
	/* lots of code that will never be executed because */ 
	/* p is always, *always* NULL at this point         */
} else {
	/* this code will always *always* be executed */
	/* p is never touched otherwise */
}

/* p is still never used */

Yes. At one point p was probaby used, then a code change sometime during the Clinton Administration (late first term most likely) removed the need for p but later code still checked it, so in order to keep the code from crashing (during the last year of the Clinton Administration, most likely) the “easiest fix that would work with minimal code changes because we want to avoid a five day regression test” is to just NULL out the variable where declared and call it a day.

Odder yet is the code that generates a string, checks to see if the generated string ends with two newline characters and then adds one or two newline characters if required (and yes, it checks for the first newline character, then the second) and further down in the code, it checks to see if the line has two newline characters and carefully removes them, one at a time.

Yes. The code adds two characters, only to remove them later on.

Again, I can see the requirements late during the Reagan Administration bumping up against the requirements during the early Bush 43 Adminstration and again, the easiest way to handle this is a local change that distrubs as little code as possible.

Although, there is one bit that does smack of rabid howler monkeys on crack taking a pass at the code, which I briefly mention in passing. It's basically the Poster Child™ for why certain C programmers should be taken out back behind the shed and disembowled with a grapefruit spoon.

Back then, I was tasked with modifying some code to log the Protocol Stack From Hell™ errors via syslog(), and all I had to work with was a C source file:

/* tons o' broilerplate text whereby we pledge our first born to feed the
 * lawyers of The Protocol Stack From Hell™ */

void Stpd_Init_Fnctn(void)
{
  MYSTERIOUS_ENTRY_CODE_WE_CANT_TOUCH();
  /* modifications here */
  MYSTERIOUS_EXIT_CODE_WE_CANT_TOUCH();
}

void Stpd_Lrm_Fnctn(lrm_t *ptr,int wat)
{
  MYSTERIOUS_ENTRY_CODE_WE_CANT_TOUCH();
  /* modifications here */
  MYSTERIOUS_EXIT_CODE_WE_CANT_TOUCH();
}

/* Muahahahahahahahahahahahaha! */
/* [S/X thunder ] */

(No, seriously, each function starts and ends with MYSTERIOUS_something_CODE_WE_CANT_TOUCH()) and an object file, which the C code is linked against to produce the final program.

Okay, nothing that out of the ordinary. Only we weren't getting the proper error messages from the lrm_t … thingy … we were given. Some back and forth with The Protocol Stack From Hell™ Technical Support® and we had the final solution, and if you can read C code, prepared to be horrified:

void Stpd_Lrm_Fnctn(lrm_t *prt,int wat)
{
  char *msg;

  MYSTERIOUS_ENTRY_CODE_WE_CANT_TOUCH();

  msg = (char *)prt + sizeof(lrm_t);
  syslog(LOG_WTF,"error: %s",msg);

  MYSTERIOUS_EXIT_CODE_WE_CANT_TOUCH();
}

For those not fluent in C, let me translate: “you will receive a block of memory called prt, which has a particular layout we laughingly call lrm_t. Ignore the data there, but instead, what you actually want lies just past the block of memory you received, into an area that Standard C calls “undefined behavior.” Abandon all hope ye who program here. And have a nice day.”

Then, I was horrified. Now, I get to see the code from “the other side” and “horrified” does not describe my reaction. “Running away, screaming in sheer madness of having peered deep into the Abyss” would be a bit closer, but still misses the mark. It goes something like this.

typedef struct {
	/* data data data */ /* [1] */
} lrm_t;

typedef struct {
	/* a mass of data */
	arbitrary_size_t reserved[6]; /* [2] */
} msg_t;

void rnd_fnct(int wat)
{
  /* don't worry, these are big enough */
  char inbffr[256],tmpbffr[256],msgbffr[256]; /* [3] */
  msg_t *msg;
  lrm_t lrm,*plrm;

  /* a bunch of code to receive an SS7 message and determine that the */
  /* contents need to be logged, as it indicates an error */

  msg = (msg_t *)inbffr;
  frmt_rnd_msg(msgbffr,msg); /* we know how big the resulting buffer is */
  
  /* okay, now for the real horror show */
  memcpy(&lrm,msg->reserved,sizeof(lrm_t)); /* [4] */

  /* No!  Don't go into the basement! */
  memcpy(tmpbffr,&lrm,sizeof(lrm_t));	    /* [5] */

  /* The call originated from inside the house! */
  strcpy(&tmpbffr[sizeof(lrm_t)],msgbffr);  /* [6] */

  /* Aieeeeeeeeeeeeeeeeeeeeeeeee! stabbity stab stab */
  Stpd_Lrm_Fnctn((lrm_t *)tmpbffr,wat);     /* [7] */
}

And now for the play-by-play commentary on this horror show:

This describes the layout of the memory block we're given in Stpd_Lrm_Fnctn(). It's not terribly big as structured memory goes, maybe around 60 or 70 bytes, but it primarily contains useless information as I found out.
It's a slightly larger block of memory, but notice the last field, reserved. A comment in the code says that this area is for “internal use only” and is around 24 bytes in size.

Keep in mind—this field is only 24 bytes in size. The size of the block of memory we're given in Stpd_Lrm_Fnctn() is around 60 or 70, which is larger than 24. 24 is smaller than 60 and 70. This is important.
“256 bytes should be enough for anyone, right?”

This, on a system with gigabytes of memory.
This copies data out of the reserved field into a block of memory of type lrm_t. Refer back to note 2. Notice how we want to copy 60 or 70 bytes of information, but the field we're copying from is only 24 bytes. This, my friends, is known as “undefined behavior” in C.

Only, this is air quotation marks okay air quotation marks because msg_t is technically the air quotation marks header air quotation marks of a larger message and thus, we can air quotation marks safely air quotation marks copy memory past the end of the header.

My thinking here—the error codes originally fit into the space set aside by reserved but grew over time, found out, but too much code relied on this situation, so they're stuck with it.

Or, you know, rabid howler monkeys on crack.
I just hope that lrm_t doesn't ever exceed 256 bytes in size.
I'm serious. I'm not making this up. The code actually uses strcpy().

strcpy() is bad because there is no checking to see if you have overrun the space set aside to receive the copied string.

The use of this function should cause modern C compilers to bitch mightily and stop compilation right then and there, and send the programmer to jail, do not pass Go, do not collect $200.00. Especially if the programmers are rabid howler monkeys on crack.

This. Is. An. Ex. Function!
And now we call our function. Hope the error message, plus the lrm_t, didn't exceed 256 bytes.

Rabid howler monkeys on crack.

I'm serious.

There aren't enough facepalms to do this code justice.

99 ways to program a hex, Part 23: C89, const correctness, assertive, system calls, full buffering, lookup table

From

Mark Grosberg <XXXXXXXXXXXXXXXXXXXXX>

To

Sean Conner <sean@conman.org>

Subject

Boston: Well, since you're in the land of non-portability …

Date

Sun, 29 Jan 2012 05:55:00
   
static void hexout(char *dest,unsigned long value,size_t size,const int
padding)
{
  assert(dest != NULL);
  assert(size >  0);
  assert((padding >= ' ') && (padding <= '~'));
  
  dest[size] = padding;
  while(size--)
  {
    dest[size] = (char)((value & 0x0F) + '0');
    if (dest[size] > '9') dest[size] += 7;
    value >>= 4;
  }
}
You're also in the land of ASCII specificness. Couldn't you make that:

dest[size] = "0123456789ABCDEF"[value & 0x0f];

And then not be tied to ASCII? You could also then switch out that array pointer if you wanted to get a mix of uppercase, lower case depending on what you need.

-MYG

I initially reject the idea of doing this. My reasoning? The code itself is already non-portable, being restricted to a Posix-like system. So what's one more non-portable item on the list? The sequence if (dest[size] > '9') dest[size] += 7 is around six (for a lot of architectures that aren't RISC based) to twelve bytes (RISC systems) in size, and now you want to add an additional 16 bytes? [He asks, working from a system with a few gigabytes of RAM —Editor] [Shut up! –Sean]. Also, in my nearly 30 years of working with computers, I've yet to come across a non-ASCII based computer system.

Yes, there are a few. Baudot code perhaps being the oldest and perhaps, the oddest one. Then there are the 6-bit character encoding schemes and Radix-50, which pack multiple 6-bit characters per “word” of storage (where a “word” could be 16, 18, 32, 36, 60 or 66 bits in size) and varied from system to system. And let's not forget EBCDIC, one of about six nearly identical, but maddendly different, encoding schemes developed by IBM. All of these were developed for machines in the 60s, but ASCII won out in the end, being the most widely used and at the core of Unicode.

So I asked on a mailing list of classic computer enthusiasts:

From

Sean Conner <spc@conman.org>

To

Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>

Subject

C compilers and non-ASCII systems

Date

Tue, 31 Jan 2012 11:21:02 -0500

A friend recently raised an issue with some code I wrote (a hex dump routine) saying it depended upon ASCII and thus, would break on non-ASCII based systems (and proposed a solution, but that's beside the issue here). I wrote back, saying the code in question was non-portable to begin with (since it depended upon read() and write()—it was targetted at Posix based systems) and besides, I've never encountered a non-ASCII system in the nearly 30 years I've been using computers.

So now I'm wondering—besides Baudot, 6-bit BCD and EBCDIC, is there any other encoding scheme used? And of Baudot, 6-bit BCD and EBCDIC, are there any systems using those encoding schemes AND have a C compiler available?

-spc (Or can I safely assume ASCII and derivatives these days?)

I figure if anyone knew the answer, these people would (many of them not only use computers like the PDP-10, but use them as heaters during the winter months).

The answers were fascinating.

From

"Shoppa, Tim" <XXXXXXXXXXXXXXXXX>

To

Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>

Subject

Re: C compilers and non-ASCII systems

Date

Tue, 31 Jan 2012 13:18:55 -0500

IBM has a very handy page on C compatibility with EBCDIC system services:

http://www-03.ibm.com/systems/z/os/zos/features/unix/bpxa1p03.html

From

"Dave" <XXXXXXXXXXXXXXXXXXXX>

To

Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>

Subject

RE: C compilers and non-ASCII systems

Date

Tue, 31 Jan 2012 19:33:06 -0000

Please consider other character codes. An EBCDIC port of GCC is alive and well on several of the "legacy" operating systems (MVS, VM and Music) that run on the Hercules IBM 360/370/XA/390/z emulator. And whilst zLinux runs in ASCII (or whatever it uses to get more than 256 points in a code page) many zLinux sites also have the zVM hypervisor, which includes an optional EBCDIC C compiler. Having ported the BREXX interpreter to this environment I was stung by the fact that the original author had made assumptions about character ordering that are not true on an EBCDIC platform.

From

Phil Budne <XXXXXXXXXXXXXXXXX>

To

Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>

Subject

Re: C compilers and non-ASCII systems

Date

Tue, 31 Jan 2012 13:00:52 -0500

See “IBM libascii functions for z/OS UNIX System Services”

http://www-03.ibm.com/systems/z/os/zos/features/unix/libascii.html

Overview

The libascii functions are integrated into the base of the Language Environment. They help you port ASCII-based C applications to the EBCDIC-based z/OS UNIX environment.

From

Nemo <XXXXXXXXXXXXXXXX>

To

Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>

Subject

Re: C compilers and non-ASCII systems

Date

Tue, 31 Jan 2012 13:32:06 -0500

z/OS is not only POSIX, it is UNIX (see http://www.opengroup.org/openbrand/register/brand3470.htm).

Oh.

Well then …

I figure I would then try Mark's suggestion (and several other people on the mailing list suggested the same thing) and at least time the change to see if it's a worthwhile change for such odd-looking, but legal, C code.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89, const correctness, assertive, system calls, full buffering */
/*	  lookup table */

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define LINESIZE	16

/********************************************************************/

extern const char *sys_errlist[];
extern int         sys_nerr;

static void	do_dump		(const int,const int);
static size_t	dump_line	(char **const,unsigned char *,size_t,const unsigned long);
static void	hexout		(char *const,unsigned long,size_t,const int);
static void	myperror	(const char *const);
static size_t	myread		(const int,char *,size_t);
static void	mywrite		(const int,const char *const,const size_t);

/********************************************************************/

int main(const int argc,const char *const argv[])
{
  if (argc == 1)
    do_dump(STDIN_FILENO,STDOUT_FILENO);
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      int fhin;
      
      fhin = open(argv[i],O_RDONLY);
      if (fhin == -1)
      {
        myperror(argv[i]);
        continue;
      }
      
      mywrite(STDOUT_FILENO,"-----",5);
      mywrite(STDOUT_FILENO,argv[i],strlen(argv[i]));
      mywrite(STDOUT_FILENO,"-----\n",6);
      
      do_dump(fhin,STDOUT_FILENO);
      if (close(fhin) < 0)
        myperror(argv[i]);
    }
  }
  
  return EXIT_SUCCESS;
}
      
/************************************************************************/     

static void do_dump(const int fhin,const int fhout)
{
  unsigned char  buffer[4096];
  char           outbuffer[75 * 109];
  char          *pout;
  unsigned long  off;
  size_t         bytes;
  size_t         count;
  
  assert(fhin  >= 0);
  assert(fhout >= 0);

  memset(outbuffer,' ',sizeof(outbuffer));
  off      = 0;
  count    = 0;
  pout     = outbuffer;
  
  while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0)
  {
    unsigned char *p = buffer;
    
    for (p = buffer ; bytes > 0 ; )
    {
      size_t amount;
      
      amount    = dump_line(&pout,p,bytes,off);
      p        += amount;
      bytes    -= amount;
      off      += amount;
      count++;
      
      if (count == 109)
      {
        mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
        memset(outbuffer,' ',sizeof(outbuffer));
        count    = 0;
        pout     = outbuffer;
      }      
    }
  }
  
  if ((size_t)(pout - outbuffer) > 0)
    mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
}

/********************************************************************/

static size_t dump_line(
	char                **const pline,
	unsigned char              *p,
	size_t                      bytes,
	const unsigned long         off
)
{
  char   *line;
  char   *dh;
  char   *da;
  size_t  count;
  
  assert(pline  != NULL);
  assert(*pline != NULL);
  assert(p      != NULL);
  assert(bytes  >  0);
  
  line = *pline;
  
  hexout(line,off,8,':');
  if (bytes > LINESIZE)
    bytes = LINESIZE;
  
  p  += bytes;
  dh  = &line[10 + bytes * 3];
  da  = &line[58 + bytes];
  
  for (count = 0 ; count < bytes ; count++)
  {
    p  --;
    da --;
    dh -= 3;
    
    if ((*p >= ' ') && (*p <= '~'))
      *da = *p;
    else
      *da = '.';
    
    hexout(dh,(unsigned long)*p,2,' ');
  }
  
  line[58 + count] = '\n';
  *pline = &line[59 + count];
  return count;
}

/**********************************************************************/  

static void hexout(char *const dest,unsigned long value,size_t size,const int padding)
{
  assert(dest != NULL);
  assert(size >  0);
  assert((padding >= ' ') && (padding <= '~'));
  
  dest[size] = padding;
  while(size--)
  {
    dest[size] = "0123456789ABCDEF"[value & 0x0f];
    value >>= 4;
  }
}

/************************************************************************/

static void myperror(const char *const s)
{
  int err = errno;
  
  assert(s != NULL);
  
  mywrite(STDERR_FILENO,s,strlen(s));
  mywrite(STDERR_FILENO,": ",2);
  
  if (err > sys_nerr)
    mywrite(STDERR_FILENO,"(unknown)",9);
  else
    mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err]));
  mywrite(STDERR_FILENO,"\n",1);
}

/************************************************************************/

static size_t myread(const int fh,char *buf,size_t size)
{
  size_t amount = 0;
  
  assert(fh   >= 0);
  assert(buf  != NULL);
  assert(size >  0);
  
  while(size > 0)
  {
    ssize_t bytes;
    
    bytes = read(fh,buf,size);
    if (bytes < 0)
    {
      myperror("read()");
      exit(EXIT_FAILURE);
    }
    if (bytes == 0)
      break;
    
    amount += bytes;
    size   -= bytes;
    buf    += bytes;
  }
  
  return amount;
}

/*********************************************************************/  
  
static void mywrite(const int fh,const char *const msg,const size_t size)
{
  assert(fh   >= 0);
  assert(msg  != NULL);
  assert(size >  0);
  
  if (write(fh,msg,size) < (ssize_t)size)
  {
    if (fh != STDERR_FILENO)
      myperror("output");
      
    exit(EXIT_FAILURE);
  }
}

/***********************************************************************/

It can't be that much faster, can it?

[spc]lucy:~/projects/99/src>time ./22 ~/bin/firefox/libxul.so >/dev/null

real    0m0.468s
user    0m0.450s
sys     0m0.018s
[spc]lucy:~/projects/99/src>time ./23 ~/bin/firefox/libxul.so >/dev/null

real    0m0.257s
user    0m0.245s
sys     0m0.012s

Almost twice as fast as what I thought was the fastest version already.

Ouch.

Several people (including Mark) mentioned that on modern CPUs, a branch instruction is like hitting a brick wall.

Yes, it's quite apparent that that is true.

But this does give me an idea for removing one more ~~brick wall~~ branch point …

Tuesday, January 31, 2012

Rabid howler monkeys on crack wrote this code

99 ways to program a hex, Part 23: C89, const correctness, assertive, system calls, full buffering, lookup table

Obligatory Picture

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

Obligatory AI Disclaimer