Friday, June 27, 2008
Some musings on bloated software and software performance
This blog post about bloated software got me to thinking about some recreational programming I recently engaged in.
A while ago I broke down and wrote a program to solve Jumbles. It's a very straightforward and simple program too:
#include <stdio.h> #include <stdlib.h> #include <string.h> #include <ctype.h> #define WORDS "/usr/share/dict/words" /***********************************************/ static int cmp(const void *l,const void *r) { int cl; int cr; cl = *(const char *)l; cr = *(const char *)r; return cl - cr; } /**********************************************/ int main(int argc,char *argv[]) { FILE *fpin; char *p; char work [BUFSIZ]; size_t len; char buffer[BUFSIZ]; char dwork [BUFSIZ]; size_t dlen; if (argc != 2) { fprintf(stderr,"usage: %s word\n",argv[0]); return(EXIT_FAILURE); } strcpy(work,argv[1]); len = strlen(work); for (p = work ; *p ; p++) *p = toupper(*p); qsort(work,len,1,cmp); fpin = fopen(WORDS,"r"); if (fpin == NULL) { fprintf(stderr,"Huston, we have a problem ... \n"); return(EXIT_FAILURE); } while(fgets(buffer,BUFSIZ,fpin)) { strcpy(dwork,buffer); for (p = dwork ; *p ; p++) { if (*p == '\n') { *p = '\0'; break; } *p = toupper(*p); } dlen = p - dwork; if (dlen != len) continue; qsort(dwork,dlen,1,cmp); if (strcmp(dwork,work) == 0) printf("%*.*s\n",dlen,dlen,buffer); } fclose(fpin); return(EXIT_SUCCESS); }
The whole trick to the program is to take the letters we're trying to
unscramble, say “gerrof”, convert them all to uppercase, “GERROF”, then
sort the letters, “EFGORR”. Then go through a list of words (in this
case, the file /usr/share/dict/words
) and for each word in that
list, go through the same process, then compare the sorted sets of letters.
If they match, that's the answer (or one of several answers).
And as written, it can do about 6 words a second (164 seconds to process 1,000 words, but I'm re-reading the word list file each time). And that's certainly fast enough for most people.
But I wrote another version. Just as simple—or rather, even simpler:
#include <stdio.h> #include <stdlib.h> #include <string.h> #include <ctype.h> #include <assert.h> #include "words.h" typedef unsigned long Letters; /*************************************/ Letters has_letters(const char *s) { Letters set = 0; assert(s != NULL); for ( ; *s ; s++) { if ( ((*s < 'A') || (*s > 'z')) || ((*s >'Z') && (*s < 'a')) ) { set |= CHAR_OTHER; } else { set |= (1uL << (toupper(*s) - 'A')); } } return set; } /********************************************/ static int cmp(const void *l,const void *r) { int cl; int cr; cl = *(const char *)l; cr = *(const char *)r; return cl - cr; } /********************************************/ int main(int argc,char *argv[]) { char work [BUFSIZ]; char dwork[BUFSIZ]; char *p; size_t len; size_t i; size_t j; Letters set; if (argc < 2) { fprintf(stderr,"usage: %s word ...\n",argv[0]); return EXIT_FAILURE; } for (i = 1 ; i < argc ; i++) { strcpy(work,argv[i]); len = strlen(work); for (p = work ; *p ; p++) *p = toupper(*p); qsort(work,len,1,cmp); set = has_letters(work); for (j = 0 ; j < cswords ; j++) { if (set != cwords[j].letters) continue; if (len != cwords[j].word.size) continue; strcpy(dwork,cwords[j].word.text); for (p = dwork ; *p ; p++) *p = toupper(*p); qsort(dwork,len,1,cmp); if (strcmp(dwork,work) == 0) printf("%s ",cwords[j].word.text); } printf("\n"); } return EXIT_SUCCESS; }
It works pretty much the same way, except for two major differences:
- it tests potential words against a set of letters. In this case, the set is a 32-bit quantity with a bit set for each letter in the word (an “A” in the word will have bit 0 set, a “C” would have bit 2 set, etc, as you can see from the code above) and the biggie:
- instead of reading in the list of words each time, there's a large static array of all the words (source not shown here, since the source file for that is 62M in size), which also includes the size and letter set of each word.
Yes, a static array of all the words (actually, it's a bit more than that since each entry also contains parts of speech and synonyms, all courtesy of The Moby Project), which “bloats” the executable program by 15M. There is no overhead of processing the data (in tests, it would take about five seconds to read everything in, which doesn't sound like much, but I'm getting to that) and even better—since the entire structure is constant, it can be stored in the read-only section of the program executable, which helps avoid excessive paging (since the operating system can discard any pages not recently used instead of paging out to swap space; if it needs the pages it can page them in directly from the executable file).
The end result of this massive “bloat” is that the second program can process 694 words per second! (or 1.44 seconds to process 1,000 words).
A hundred times faster.
That's quite a speed-up.
It's quite bloated too—an additional 15M worth.
So it's not quite as simple as saying that programs need to be less “bloated”—it also depends on how the program is written. This also shows how certain classes of optimization can block off flexibility. I can easily add or remove words from consideration, since the program reads in an external file. I cannot, however, do that easily for the second program—it would require a recompile.
Even though a program like the lastest version of Microsoft Word is “bloated,” it can also do a ton of stuff that Microsoft Word from the mid 80s couldn't do—like real time spell checking for instance. It could probably be made to run faster, either through dropping certain features (which may not be that bad of an idea) or certainly, profiling the code and speeding up the hot spots (if there are any in a program of that size).
I actually suspect the major reason Micosoft Word is slow is due to paging. Slap enough memory in a Windows box, and turn off paging and it would probably run at a decent speed (once loaded, that is).