Tuesday, January 31, 2012
Rabid howler monkeys on crack wrote this code
Okay, yes, there are issues with the
code to The Protocol Stack From Hell™. There's the vowel impaired
names—oh, sorry, the vwl_imprd_nms
, the usual ignoring return
codes and tons of global variables—sorry, glblvrbls
littering
the code. And there's stuff like:
foo_t *p = NULL; /* lots of code not touching p at all */ if (p) { /* lots of code that will never be executed because */ /* p is always, *always* NULL at this point */ } else { /* this code will always *always* be executed */ /* p is never touched otherwise */ } /* p is still never used */
Yes. At one point p
was probaby used, then a code change sometime during
the Clinton
Administration (late first term most likely) removed the need for
p
but later code still checked it, so in order to keep the code
from crashing (during the last year of the Clinton Administration, most
likely) the “easiest fix that would work with minimal code changes because
we want to avoid a five day regression test” is to just NULL
out the variable where declared and call it a day.
Odder yet is the code that generates a string, checks to see if the generated string ends with two newline characters and then adds one or two newline characters if required (and yes, it checks for the first newline character, then the second) and further down in the code, it checks to see if the line has two newline characters and carefully removes them, one at a time.
Yes. The code adds two characters, only to remove them later on.
Again, I can see the requirements late during the Reagan Administration bumping up against the requirements during the early Bush 43 Adminstration and again, the easiest way to handle this is a local change that distrubs as little code as possible.
Although, there is one bit that does smack of rabid howler monkeys on crack taking a pass at the code, which I briefly mention in passing. It's basically the Poster Child™ for why certain C programmers should be taken out back behind the shed and disembowled with a grapefruit spoon.
Back then, I was tasked with modifying some code to log the Protocol
Stack From Hell™ errors via syslog()
, and all I had to
work with was a C source file:
/* tons o' broilerplate text whereby we pledge our first born to feed the * lawyers of The Protocol Stack From Hell™ */ void Stpd_Init_Fnctn(void) { MYSTERIOUS_ENTRY_CODE_WE_CANT_TOUCH(); /* modifications here */ MYSTERIOUS_EXIT_CODE_WE_CANT_TOUCH(); } void Stpd_Lrm_Fnctn(lrm_t *ptr,int wat) { MYSTERIOUS_ENTRY_CODE_WE_CANT_TOUCH(); /* modifications here */ MYSTERIOUS_EXIT_CODE_WE_CANT_TOUCH(); } /* Muahahahahahahahahahahahaha! */ /* [S/X thunder ] */
(No, seriously, each function starts and ends with
MYSTERIOUS_something_CODE_WE_CANT_TOUCH()
) and an object file,
which the C code is linked against to produce the final program.
Okay, nothing that out of the ordinary. Only we weren't getting
the proper error messages from the lrm_t
… thingy … we were
given. Some back and forth with The Protocol Stack From Hell™
Technical Support® and we had the final solution, and if you can read C
code, prepared to be horrified:
void Stpd_Lrm_Fnctn(lrm_t *prt,int wat) { char *msg; MYSTERIOUS_ENTRY_CODE_WE_CANT_TOUCH(); msg = (char *)prt + sizeof(lrm_t); syslog(LOG_WTF,"error: %s",msg); MYSTERIOUS_EXIT_CODE_WE_CANT_TOUCH(); }
For those not fluent in C, let me translate: “you will receive a block
of memory called prt
, which has a particular layout we
laughingly call lrm_t
. Ignore the data there, but instead,
what you actually want lies just past the block of memory you received, into
an area that Standard C calls “undefined behavior.” Abandon all hope ye who
program here. And have a nice day.”
Then, I was horrified. Now, I get to see the code from “the other side” and “horrified” does not describe my reaction. “Running away, screaming in sheer madness of having peered deep into the Abyss” would be a bit closer, but still misses the mark. It goes something like this.
typedef struct { /* data data data */ /* [1] */ } lrm_t; typedef struct { /* a mass of data */ arbitrary_size_t reserved[6]; /* [2] */ } msg_t; void rnd_fnct(int wat) { /* don't worry, these are big enough */ char inbffr[256],tmpbffr[256],msgbffr[256]; /* [3] */ msg_t *msg; lrm_t lrm,*plrm; /* a bunch of code to receive an SS7 message and determine that the */ /* contents need to be logged, as it indicates an error */ msg = (msg_t *)inbffr; frmt_rnd_msg(msgbffr,msg); /* we know how big the resulting buffer is */ /* okay, now for the real horror show */ memcpy(&lrm,msg->reserved,sizeof(lrm_t)); /* [4] */ /* No! Don't go into the basement! */ memcpy(tmpbffr,&lrm,sizeof(lrm_t)); /* [5] */ /* The call originated from inside the house! */ strcpy(&tmpbffr[sizeof(lrm_t)],msgbffr); /* [6] */ /* Aieeeeeeeeeeeeeeeeeeeeeeeee! stabbity stab stab */ Stpd_Lrm_Fnctn((lrm_t *)tmpbffr,wat); /* [7] */ }
And now for the play-by-play commentary on this horror show:
This describes the layout of the memory block we're given in
Stpd_Lrm_Fnctn()
. It's not terribly big as structured memory goes, maybe around 60 or 70 bytes, but it primarily contains useless information as I found out.It's a slightly larger block of memory, but notice the last field,
reserved
. A comment in the code says that this area is for “internal use only” and is around 24 bytes in size.Keep in mind—this field is only 24 bytes in size. The size of the block of memory we're given in
Stpd_Lrm_Fnctn()
is around 60 or 70, which is larger than 24. 24 is smaller than 60 and 70. This is important.“256 bytes should be enough for anyone, right?”
This, on a system with gigabytes of memory.
This copies data out of the
reserved
field into a block of memory of typelrm_t
. Refer back to note 2. Notice how we want to copy 60 or 70 bytes of information, but the field we're copying from is only 24 bytes. This, my friends, is known as “undefined behavior” in C.Only, this is air quotation marks okay air quotation marks because
msg_t
is technically the air quotation marks header air quotation marks of a larger message and thus, we can air quotation marks safely air quotation marks copy memory past the end of the header.My thinking here—the error codes originally fit into the space set aside by
reserved
but grew over time, found out, but too much code relied on this situation, so they're stuck with it.Or, you know, rabid howler monkeys on crack.
I just hope that
lrm_t
doesn't ever exceed 256 bytes in size.I'm serious. I'm not making this up. The code actually uses
strcpy()
.strcpy()
is bad because there is no checking to see if you have overrun the space set aside to receive the copied string.The use of this function should cause modern C compilers to bitch mightily and stop compilation right then and there, and send the programmer to jail, do not pass Go, do not collect $200.00. Especially if the programmers are rabid howler monkeys on crack.
This. Is. An. Ex. Function!
And now we call our function. Hope the error message, plus the
lrm_t
, didn't exceed 256 bytes.
Rabid howler monkeys on crack.
I'm serious.
There aren't enough facepalms to do this code justice.
99 ways to program a hex, Part 23: C89, const correctness, assertive, system calls, full buffering, lookup table
- From
- Mark Grosberg <XXXXXXXXXXXXXXXXXXXXX>
- To
- Sean Conner <sean@conman.org>
- Subject
- Boston: Well, since you're in the land of non-portability …
- Date
- Sun, 29 Jan 2012 05:55:00
static void hexout(char *dest,unsigned long value,size_t size,const int padding) { assert(dest != NULL); assert(size > 0); assert((padding >= ' ') && (padding <= '~')); dest[size] = padding; while(size--) { dest[size] = (char)((value & 0x0F) + '0'); if (dest[size] > '9') dest[size] += 7; value >>= 4; } }You're also in the land of ASCII specificness. Couldn't you make that:
dest[size] = "0123456789ABCDEF"[value & 0x0f];
And then not be tied to ASCII? You could also then switch out that array pointer if you wanted to get a mix of uppercase, lower case depending on what you need.
-MYG
I initially reject the idea of doing this. My reasoning? The code
itself is already non-portable, being restricted to a Posix-like system. So what's
one more non-portable item on the list? The sequence if (dest[size] >
'9') dest[size] += 7
is around six (for a lot of architectures that
aren't RISC
based) to twelve bytes (RISC systems) in size, and now you want to add an
additional 16 bytes? [He asks, working from a system
with a few gigabytes of RAM
—Editor] [Shut up! –Sean]. Also, in my nearly 30 years of
working with computers, I've yet to come across a non-ASCII based computer
system.
Yes, there are a few. Baudot code perhaps being the oldest and perhaps, the oddest one. Then there are the 6-bit character encoding schemes and Radix-50, which pack multiple 6-bit characters per “word” of storage (where a “word” could be 16, 18, 32, 36, 60 or 66 bits in size) and varied from system to system. And let's not forget EBCDIC, one of about six nearly identical, but maddendly different, encoding schemes developed by IBM. All of these were developed for machines in the 60s, but ASCII won out in the end, being the most widely used and at the core of Unicode.
So I asked on a mailing list of classic computer enthusiasts:
- From
- Sean Conner <spc@conman.org>
- To
- Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
- Subject
- C compilers and non-ASCII systems
- Date
- Tue, 31 Jan 2012 11:21:02 -0500
A friend recently raised an issue with some code I wrote (a hex dump routine) saying it depended upon ASCII and thus, would break on non-ASCII based systems (and proposed a solution, but that's beside the issue here). I wrote back, saying the code in question was non-portable to begin with (since it depended upon read() and write()—it was targetted at Posix based systems) and besides, I've never encountered a non-ASCII system in the nearly 30 years I've been using computers.
So now I'm wondering—besides Baudot, 6-bit BCD and EBCDIC, is there any other encoding scheme used? And of Baudot, 6-bit BCD and EBCDIC, are there any systems using those encoding schemes AND have a C compiler available?
-spc (Or can I safely assume ASCII and derivatives these days?)
I figure if anyone knew the answer, these people would (many of them not only use computers like the PDP-10, but use them as heaters during the winter months).
The answers were fascinating.
- From
- "Shoppa, Tim" <XXXXXXXXXXXXXXXXX>
- To
- Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
- Subject
- Re: C compilers and non-ASCII systems
- Date
- Tue, 31 Jan 2012 13:18:55 -0500
IBM has a very handy page on C compatibility with EBCDIC system services:
http://www-03.ibm.com/systems/z/os/zos/features/unix/bpxa1p03.html
- From
- "Dave" <XXXXXXXXXXXXXXXXXXXX>
- To
- Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
- Subject
- RE: C compilers and non-ASCII systems
- Date
- Tue, 31 Jan 2012 19:33:06 -0000
Please consider other character codes. An EBCDIC port of GCC is alive and well on several of the "legacy" operating systems (MVS, VM and Music) that run on the Hercules IBM 360/370/XA/390/z emulator. And whilst zLinux runs in ASCII (or whatever it uses to get more than 256 points in a code page) many zLinux sites also have the zVM hypervisor, which includes an optional EBCDIC C compiler. Having ported the BREXX interpreter to this environment I was stung by the fact that the original author had made assumptions about character ordering that are not true on an EBCDIC platform.
- From
- Phil Budne <XXXXXXXXXXXXXXXXX>
- To
- Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
- Subject
- Re: C compilers and non-ASCII systems
- Date
- Tue, 31 Jan 2012 13:00:52 -0500
See “IBM libascii functions for z/OS UNIX System Services”
http://www-03.ibm.com/systems/z/os/zos/features/unix/libascii.html
- Overview
- The libascii functions are integrated into the base of the Language Environment. They help you port ASCII-based C applications to the EBCDIC-based z/OS UNIX environment.
- From
- Nemo <XXXXXXXXXXXXXXXX>
- To
- Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
- Subject
- Re: C compilers and non-ASCII systems
- Date
- Tue, 31 Jan 2012 13:32:06 -0500
z/OS is not only POSIX, it is UNIX (see
http://www.opengroup.org/openbrand/register/brand3470.htm
).
Oh.
Well then …
I figure I would then try Mark's suggestion (and several other people on the mailing list suggested the same thing) and at least time the change to see if it's a worthwhile change for such odd-looking, but legal, C code.
/************************************************************************* * * Copyright 2012 by Sean Conner. All Rights Reserved. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version 2 * of the License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * * Comments, questions and criticisms can be sent to: sean@conman.org * *************************************************************************/ /* Style: C89, const correctness, assertive, system calls, full buffering */ /* lookup table */ #include <stdlib.h> #include <string.h> #include <errno.h> #include <assert.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #define LINESIZE 16 /********************************************************************/ extern const char *sys_errlist[]; extern int sys_nerr; static void do_dump (const int,const int); static size_t dump_line (char **const,unsigned char *,size_t,const unsigned long); static void hexout (char *const,unsigned long,size_t,const int); static void myperror (const char *const); static size_t myread (const int,char *,size_t); static void mywrite (const int,const char *const,const size_t); /********************************************************************/ int main(const int argc,const char *const argv[]) { if (argc == 1) do_dump(STDIN_FILENO,STDOUT_FILENO); else { int i; for (i = 1 ; i < argc ; i++) { int fhin; fhin = open(argv[i],O_RDONLY); if (fhin == -1) { myperror(argv[i]); continue; } mywrite(STDOUT_FILENO,"-----",5); mywrite(STDOUT_FILENO,argv[i],strlen(argv[i])); mywrite(STDOUT_FILENO,"-----\n",6); do_dump(fhin,STDOUT_FILENO); if (close(fhin) < 0) myperror(argv[i]); } } return EXIT_SUCCESS; } /************************************************************************/ static void do_dump(const int fhin,const int fhout) { unsigned char buffer[4096]; char outbuffer[75 * 109]; char *pout; unsigned long off; size_t bytes; size_t count; assert(fhin >= 0); assert(fhout >= 0); memset(outbuffer,' ',sizeof(outbuffer)); off = 0; count = 0; pout = outbuffer; while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0) { unsigned char *p = buffer; for (p = buffer ; bytes > 0 ; ) { size_t amount; amount = dump_line(&pout,p,bytes,off); p += amount; bytes -= amount; off += amount; count++; if (count == 109) { mywrite(fhout,outbuffer,(size_t)(pout - outbuffer)); memset(outbuffer,' ',sizeof(outbuffer)); count = 0; pout = outbuffer; } } } if ((size_t)(pout - outbuffer) > 0) mywrite(fhout,outbuffer,(size_t)(pout - outbuffer)); } /********************************************************************/ static size_t dump_line( char **const pline, unsigned char *p, size_t bytes, const unsigned long off ) { char *line; char *dh; char *da; size_t count; assert(pline != NULL); assert(*pline != NULL); assert(p != NULL); assert(bytes > 0); line = *pline; hexout(line,off,8,':'); if (bytes > LINESIZE) bytes = LINESIZE; p += bytes; dh = &line[10 + bytes * 3]; da = &line[58 + bytes]; for (count = 0 ; count < bytes ; count++) { p --; da --; dh -= 3; if ((*p >= ' ') && (*p <= '~')) *da = *p; else *da = '.'; hexout(dh,(unsigned long)*p,2,' '); } line[58 + count] = '\n'; *pline = &line[59 + count]; return count; } /**********************************************************************/ static void hexout(char *const dest,unsigned long value,size_t size,const int padding) { assert(dest != NULL); assert(size > 0); assert((padding >= ' ') && (padding <= '~')); dest[size] = padding; while(size--) { dest[size] = "0123456789ABCDEF"[value & 0x0f]; value >>= 4; } } /************************************************************************/ static void myperror(const char *const s) { int err = errno; assert(s != NULL); mywrite(STDERR_FILENO,s,strlen(s)); mywrite(STDERR_FILENO,": ",2); if (err > sys_nerr) mywrite(STDERR_FILENO,"(unknown)",9); else mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err])); mywrite(STDERR_FILENO,"\n",1); } /************************************************************************/ static size_t myread(const int fh,char *buf,size_t size) { size_t amount = 0; assert(fh >= 0); assert(buf != NULL); assert(size > 0); while(size > 0) { ssize_t bytes; bytes = read(fh,buf,size); if (bytes < 0) { myperror("read()"); exit(EXIT_FAILURE); } if (bytes == 0) break; amount += bytes; size -= bytes; buf += bytes; } return amount; } /*********************************************************************/ static void mywrite(const int fh,const char *const msg,const size_t size) { assert(fh >= 0); assert(msg != NULL); assert(size > 0); if (write(fh,msg,size) < (ssize_t)size) { if (fh != STDERR_FILENO) myperror("output"); exit(EXIT_FAILURE); } } /***********************************************************************/
It can't be that much faster, can it?
[spc]lucy:~/projects/99/src>time ./22 ~/bin/firefox/libxul.so >/dev/null real 0m0.468s user 0m0.450s sys 0m0.018s [spc]lucy:~/projects/99/src>time ./23 ~/bin/firefox/libxul.so >/dev/null real 0m0.257s user 0m0.245s sys 0m0.012s
Almost twice as fast as what I thought was the fastest version already.
Ouch.
Several people (including Mark) mentioned that on modern CPUs, a branch instruction is like hitting a brick wall.
Yes, it's quite apparent that that is true.
But this does give me an idea for removing one more brick wall
branch point …