The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Tuesday, January 31, 2012

99 ways to program a hex, Part 23: C89, const correctness, assertive, system calls, full buffering, lookup table

From
Mark Grosberg <XXXXXXXXXXXXXXXXXXXXX>
To
Sean Conner <sean@conman.org>
Subject
Boston: Well, since you're in the land of non-portability …
Date
Sun, 29 Jan 2012 05:55:00
   
static void hexout(char *dest,unsigned long value,size_t size,const int
padding)
{
  assert(dest != NULL);
  assert(size >  0);
  assert((padding >= ' ') && (padding <= '~'));
  
  dest[size] = padding;
  while(size--)
  {
    dest[size] = (char)((value & 0x0F) + '0');
    if (dest[size] > '9') dest[size] += 7;
    value >>= 4;
  }
}

You're also in the land of ASCII specificness. Couldn't you make that:

dest[size] = "0123456789ABCDEF"[value & 0x0f];

And then not be tied to ASCII? You could also then switch out that array pointer if you wanted to get a mix of uppercase, lower case depending on what you need.

-MYG

I initially reject the idea of doing this. My reasoning? The code itself is already non-portable, being restricted to a Posix-like system. So what's one more non-portable item on the list? The sequence if (dest[size] > '9') dest[size] += 7 is around six (for a lot of architectures that aren't RISC based) to twelve bytes (RISC systems) in size, and now you want to add an additional 16 bytes? [He asks, working from a system with a few gigabytes of RAM —Editor] [Shut up! –Sean]. Also, in my nearly 30 years of working with computers, I've yet to come across a non-ASCII based computer system.

Yes, there are a few. Baudot code perhaps being the oldest and perhaps, the oddest one. Then there are the 6-bit character encoding schemes and Radix-50, which pack multiple 6-bit characters per “word” of storage (where a “word” could be 16, 18, 32, 36, 60 or 66 bits in size) and varied from system to system. And let's not forget EBCDIC, one of about six nearly identical, but maddendly different, encoding schemes developed by IBM. All of these were developed for machines in the 60s, but ASCII won out in the end, being the most widely used and at the core of Unicode.

So I asked on a mailing list of classic computer enthusiasts:

From
Sean Conner <spc@conman.org>
To
Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject
C compilers and non-ASCII systems
Date
Tue, 31 Jan 2012 11:21:02 -0500

A friend recently raised an issue with some code I wrote (a hex dump routine) saying it depended upon ASCII and thus, would break on non-ASCII based systems (and proposed a solution, but that's beside the issue here). I wrote back, saying the code in question was non-portable to begin with (since it depended upon read() and write()—it was targetted at Posix based systems) and besides, I've never encountered a non-ASCII system in the nearly 30 years I've been using computers.

So now I'm wondering—besides Baudot, 6-bit BCD and EBCDIC, is there any other encoding scheme used? And of Baudot, 6-bit BCD and EBCDIC, are there any systems using those encoding schemes AND have a C compiler available?

-spc (Or can I safely assume ASCII and derivatives these days?)

I figure if anyone knew the answer, these people would (many of them not only use computers like the PDP-10, but use them as heaters during the winter months).

The answers were fascinating.

From
"Shoppa, Tim" <XXXXXXXXXXXXXXXXX>
To
Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject
Re: C compilers and non-ASCII systems
Date
Tue, 31 Jan 2012 13:18:55 -0500

IBM has a very handy page on C compatibility with EBCDIC system services:

http://www-03.ibm.com/systems/z/os/zos/features/unix/bpxa1p03.html

From
"Dave" <XXXXXXXXXXXXXXXXXXXX>
To
Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject
RE: C compilers and non-ASCII systems
Date
Tue, 31 Jan 2012 19:33:06 -0000

Please consider other character codes. An EBCDIC port of GCC is alive and well on several of the "legacy" operating systems (MVS, VM and Music) that run on the Hercules IBM 360/370/XA/390/z emulator. And whilst zLinux runs in ASCII (or whatever it uses to get more than 256 points in a code page) many zLinux sites also have the zVM hypervisor, which includes an optional EBCDIC C compiler. Having ported the BREXX interpreter to this environment I was stung by the fact that the original author had made assumptions about character ordering that are not true on an EBCDIC platform.

From
Phil Budne <XXXXXXXXXXXXXXXXX>
To
Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject
Re: C compilers and non-ASCII systems
Date
Tue, 31 Jan 2012 13:00:52 -0500

See “IBM libascii functions for z/OS UNIX System Services”

http://www-03.ibm.com/systems/z/os/zos/features/unix/libascii.html

Overview
The libascii functions are integrated into the base of the Language Environment. They help you port ASCII-based C applications to the EBCDIC-based z/OS UNIX environment.
From
Nemo <XXXXXXXXXXXXXXXX>
To
Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject
Re: C compilers and non-ASCII systems
Date
Tue, 31 Jan 2012 13:32:06 -0500

z/OS is not only POSIX, it is UNIX (see http://www.opengroup.org/openbrand/register/brand3470.htm).

Oh.

Well then …

I figure I would then try Mark's suggestion (and several other people on the mailing list suggested the same thing) and at least time the change to see if it's a worthwhile change for such odd-looking, but legal, C code.

/*************************************************************************
*
* Copyright 2012 by Sean Conner.  All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/

/* Style: C89, const correctness, assertive, system calls, full buffering */
/*	  lookup table */

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define LINESIZE	16

/********************************************************************/

extern const char *sys_errlist[];
extern int         sys_nerr;

static void	do_dump		(const int,const int);
static size_t	dump_line	(char **const,unsigned char *,size_t,const unsigned long);
static void	hexout		(char *const,unsigned long,size_t,const int);
static void	myperror	(const char *const);
static size_t	myread		(const int,char *,size_t);
static void	mywrite		(const int,const char *const,const size_t);

/********************************************************************/

int main(const int argc,const char *const argv[])
{
  if (argc == 1)
    do_dump(STDIN_FILENO,STDOUT_FILENO);
  else
  {
    int i;
    
    for (i = 1 ; i < argc ; i++)
    {
      int fhin;
      
      fhin = open(argv[i],O_RDONLY);
      if (fhin == -1)
      {
        myperror(argv[i]);
        continue;
      }
      
      mywrite(STDOUT_FILENO,"-----",5);
      mywrite(STDOUT_FILENO,argv[i],strlen(argv[i]));
      mywrite(STDOUT_FILENO,"-----\n",6);
      
      do_dump(fhin,STDOUT_FILENO);
      if (close(fhin) < 0)
        myperror(argv[i]);
    }
  }
  
  return EXIT_SUCCESS;
}
      
/************************************************************************/     

static void do_dump(const int fhin,const int fhout)
{
  unsigned char  buffer[4096];
  char           outbuffer[75 * 109];
  char          *pout;
  unsigned long  off;
  size_t         bytes;
  size_t         count;
  
  assert(fhin  >= 0);
  assert(fhout >= 0);

  memset(outbuffer,' ',sizeof(outbuffer));
  off      = 0;
  count    = 0;
  pout     = outbuffer;
  
  while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0)
  {
    unsigned char *p = buffer;
    
    for (p = buffer ; bytes > 0 ; )
    {
      size_t amount;
      
      amount    = dump_line(&pout,p,bytes,off);
      p        += amount;
      bytes    -= amount;
      off      += amount;
      count++;
      
      if (count == 109)
      {
        mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
        memset(outbuffer,' ',sizeof(outbuffer));
        count    = 0;
        pout     = outbuffer;
      }      
    }
  }
  
  if ((size_t)(pout - outbuffer) > 0)
    mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
}

/********************************************************************/

static size_t dump_line(
	char                **const pline,
	unsigned char              *p,
	size_t                      bytes,
	const unsigned long         off
)
{
  char   *line;
  char   *dh;
  char   *da;
  size_t  count;
  
  assert(pline  != NULL);
  assert(*pline != NULL);
  assert(p      != NULL);
  assert(bytes  >  0);
  
  line = *pline;
  
  hexout(line,off,8,':');
  if (bytes > LINESIZE)
    bytes = LINESIZE;
  
  p  += bytes;
  dh  = &line[10 + bytes * 3];
  da  = &line[58 + bytes];
  
  for (count = 0 ; count < bytes ; count++)
  {
    p  --;
    da --;
    dh -= 3;
    
    if ((*p >= ' ') && (*p <= '~'))
      *da = *p;
    else
      *da = '.';
    
    hexout(dh,(unsigned long)*p,2,' ');
  }
  
  line[58 + count] = '\n';
  *pline = &line[59 + count];
  return count;
}

/**********************************************************************/  

static void hexout(char *const dest,unsigned long value,size_t size,const int padding)
{
  assert(dest != NULL);
  assert(size >  0);
  assert((padding >= ' ') && (padding <= '~'));
  
  dest[size] = padding;
  while(size--)
  {
    dest[size] = "0123456789ABCDEF"[value & 0x0f];
    value >>= 4;
  }
}

/************************************************************************/

static void myperror(const char *const s)
{
  int err = errno;
  
  assert(s != NULL);
  
  mywrite(STDERR_FILENO,s,strlen(s));
  mywrite(STDERR_FILENO,": ",2);
  
  if (err > sys_nerr)
    mywrite(STDERR_FILENO,"(unknown)",9);
  else
    mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err]));
  mywrite(STDERR_FILENO,"\n",1);
}

/************************************************************************/

static size_t myread(const int fh,char *buf,size_t size)
{
  size_t amount = 0;
  
  assert(fh   >= 0);
  assert(buf  != NULL);
  assert(size >  0);
  
  while(size > 0)
  {
    ssize_t bytes;
    
    bytes = read(fh,buf,size);
    if (bytes < 0)
    {
      myperror("read()");
      exit(EXIT_FAILURE);
    }
    if (bytes == 0)
      break;
    
    amount += bytes;
    size   -= bytes;
    buf    += bytes;
  }
  
  return amount;
}

/*********************************************************************/  
  
static void mywrite(const int fh,const char *const msg,const size_t size)
{
  assert(fh   >= 0);
  assert(msg  != NULL);
  assert(size >  0);
  
  if (write(fh,msg,size) < (ssize_t)size)
  {
    if (fh != STDERR_FILENO)
      myperror("output");
      
    exit(EXIT_FAILURE);
  }
}

/***********************************************************************/

It can't be that much faster, can it?

[spc]lucy:~/projects/99/src>time ./22 ~/bin/firefox/libxul.so >/dev/null

real    0m0.468s
user    0m0.450s
sys     0m0.018s
[spc]lucy:~/projects/99/src>time ./23 ~/bin/firefox/libxul.so >/dev/null

real    0m0.257s
user    0m0.245s
sys     0m0.012s

Almost twice as fast as what I thought was the fastest version already.

Ouch.

Several people (including Mark) mentioned that on modern CPUs, a branch instruction is like hitting a brick wall.

Yes, it's quite apparent that that is true.

But this does give me an idea for removing one more brick wall branch point …

Obligatory Picture

[The future's so bright, I gotta wear shades]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.