The Boston Diaries

Friday, June 27, 2008

Some musings on bloated software and software performance

This blog post about bloated software got me to thinking about some recreational programming I recently engaged in.

A while ago I broke down and wrote a program to solve Jumbles. It's a very straightforward and simple program too:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
  
#define WORDS   "/usr/share/dict/words"

/***********************************************/

static int cmp(const void *l,const void *r)
{ 
  int cl;
  int cr;
  
  cl = *(const char *)l;
  cr = *(const char *)r;
  
  return cl - cr;
} 

/**********************************************/
    
int main(int argc,char *argv[])
{
  FILE   *fpin;
  char   *p;
  char    work  [BUFSIZ];
  size_t  len;
  char    buffer[BUFSIZ];
  char    dwork [BUFSIZ];
  size_t  dlen;
  
  if (argc != 2)
  { 
    fprintf(stderr,"usage: %s word\n",argv[0]);
    return(EXIT_FAILURE);
  }
  
  strcpy(work,argv[1]);
  len  = strlen(work);
  
  for (p = work ; *p ; p++)
    *p = toupper(*p);
  
  qsort(work,len,1,cmp);
  fpin = fopen(WORDS,"r");
  if (fpin == NULL)
  {
    fprintf(stderr,"Huston, we have a problem ... \n");
    return(EXIT_FAILURE);
  }
 
  while(fgets(buffer,BUFSIZ,fpin))
  {
    strcpy(dwork,buffer);
 
    for (p = dwork ; *p ; p++)
    {
      if (*p == '\n')
      {
        *p = '\0';
        break;
      }
      *p = toupper(*p);
    }

    dlen = p - dwork;
    if (dlen != len) continue;
    qsort(dwork,dlen,1,cmp);
    if (strcmp(dwork,work) == 0)
      printf("%*.*s\n",dlen,dlen,buffer);
  }

  fclose(fpin);
  return(EXIT_SUCCESS);
}

The whole trick to the program is to take the letters we're trying to unscramble, say “gerrof”, convert them all to uppercase, “GERROF”, then sort the letters, “EFGORR”. Then go through a list of words (in this case, the file /usr/share/dict/words) and for each word in that list, go through the same process, then compare the sorted sets of letters. If they match, that's the answer (or one of several answers).

And as written, it can do about 6 words a second (164 seconds to process 1,000 words, but I'm re-reading the word list file each time). And that's certainly fast enough for most people.

But I wrote another version. Just as simple—or rather, even simpler:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>

#include "words.h"

typedef unsigned long Letters;

/*************************************/

Letters has_letters(const char *s)
{
  Letters set = 0;

  assert(s != NULL);

  for ( ; *s ; s++)
  {
    if (
         ((*s < 'A') || (*s > 'z'))
         || ((*s >'Z') && (*s < 'a'))
       )
    {
      set |= CHAR_OTHER;
    }
    else
    {
      set |= (1uL << (toupper(*s) - 'A'));
    }
  }

  return set;
}

/********************************************/

static int cmp(const void *l,const void *r)
{
  int cl;
  int cr;

  cl = *(const char *)l;
  cr = *(const char *)r;

  return cl - cr;
}

/********************************************/

int main(int argc,char *argv[])
{
  char    work [BUFSIZ];
  char    dwork[BUFSIZ];
  char   *p;
  size_t  len;
  size_t  i;
  size_t  j;
  Letters set;

  if (argc < 2)
  {
    fprintf(stderr,"usage: %s word ...\n",argv[0]);
    return EXIT_FAILURE;
  }

  for (i = 1 ; i < argc ; i++)
  {
    strcpy(work,argv[i]);
    len = strlen(work);

    for (p = work ; *p ; p++)
      *p = toupper(*p);

    qsort(work,len,1,cmp);
    set = has_letters(work);

    for (j = 0 ; j < cswords ; j++)
    {
      if (set != cwords[j].letters) continue;
      if (len != cwords[j].word.size) continue;
      strcpy(dwork,cwords[j].word.text);
      for (p = dwork ; *p ; p++)
        *p = toupper(*p);
      qsort(dwork,len,1,cmp);
      if (strcmp(dwork,work) == 0)
        printf("%s ",cwords[j].word.text);
    }
    printf("\n");
  }

  return EXIT_SUCCESS;
}

It works pretty much the same way, except for two major differences:

it tests potential words against a set of letters. In this case, the set is a 32-bit quantity with a bit set for each letter in the word (an “A” in the word will have bit 0 set, a “C” would have bit 2 set, etc, as you can see from the code above) and the biggie:
instead of reading in the list of words each time, there's a large static array of all the words (source not shown here, since the source file for that is 62M in size), which also includes the size and letter set of each word.

Yes, a static array of all the words (actually, it's a bit more than that since each entry also contains parts of speech and synonyms, all courtesy of The Moby Project), which “bloats” the executable program by 15M. There is no overhead of processing the data (in tests, it would take about five seconds to read everything in, which doesn't sound like much, but I'm getting to that) and even better—since the entire structure is constant, it can be stored in the read-only section of the program executable, which helps avoid excessive paging (since the operating system can discard any pages not recently used instead of paging out to swap space; if it needs the pages it can page them in directly from the executable file).

The end result of this massive “bloat” is that the second program can process 694 words per second! (or 1.44 seconds to process 1,000 words).

A hundred times faster.

That's quite a speed-up.

It's quite bloated too—an additional 15M worth.

So it's not quite as simple as saying that programs need to be less “bloated”—it also depends on how the program is written. This also shows how certain classes of optimization can block off flexibility. I can easily add or remove words from consideration, since the program reads in an external file. I cannot, however, do that easily for the second program—it would require a recompile.

Even though a program like the lastest version of Microsoft Word is “bloated,” it can also do a ton of stuff that Microsoft Word from the mid 80s couldn't do—like real time spell checking for instance. It could probably be made to run faster, either through dropping certain features (which may not be that bad of an idea) or certainly, profiling the code and speeding up the hot spots (if there are any in a program of that size).

I actually suspect the major reason Micosoft Word is slow is due to paging. Slap enough memory in a Windows box, and turn off paging and it would probably run at a decent speed (once loaded, that is).

A sea of memory

So what exactly prompted me to create a 15M linkable list of words?

The potential of memristors.

If memristors pan out (and I hope they do) then that means we'll get general purpose fully solid state computers without any disks whatsoever. With densities greater than harddrives and speeds that rival conventional RAM, why even bother with a file system anymore? Why not just have everything mapped into memory?

It's not like this is a new idea either.

Back in college the workstation I used had an incredible 1G harddrive. Fifteen years later it's common for home computers to have more than 1G of RAM and it's weird to think that I could basically store everything I had on that machine totally in memory and still have memory left over to run programs.

PDAs also have no concept of disks or files. Everything in a PDA is just there in memory.

Such thoughts have been in my mind recently, and I figured I might as well play around with the notion that everything is just there, in memory, and how would that affect programming.

Heck, in reading over the novel ideas of Multics I'm beginning to think that Multics was way ahead of its time.

Friday, June 27, 2008

Some musings on bloated software and software performance

A sea of memory

Obligatory Picture

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous