The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Saturday, September 08, 2007

Notes on an ideal integrated development system

I'm not a fan of IDEs. I grew up with the “edit-compile-run” cyle of development, and while I didn't always have a choice in the “compile” portion of things, I did in the “edit” portion, and over time became very picky about which editor I use. Because of that, whenever I did try an IDE, I invariably found the “edit” portion to be very painful, stuck in an editor that I wasn't used to; being forced to use an unfamiliar editor resulted in a vast loss of productivity and thus, I've never liked IDEs. So I stuck with the “edit-compile-run” cycle.

But the recent bout of programming I've done has made me wish for something better than the “edit-compile-run” cycle. And while IDEs have probably evolved since I last tried them in the late 80s, I don't think they've evolved enough to suit me.

What I'm about to describe is defintely “pie-in-the-sky” stuff. I'm not saying that IDEs must be this way—I'm just saying that this is what I would like in an IDE. Who knows? Maybe this won't work. Maybe it's unworkable. But I wouldn't mind seeing these features (at long as the editing could be configured to my liking).

A database I used in the early 80s that ran on a twin floppy PC. Written by Brian Berkowitz and Richard Ilson

Wonderful features were:

Cornerstone

The one feature of Cornerstone that still strikes me as innovative is the separation of variables (or in this case, fields and tables) from their name. One could change the name of a variable without having to edit every other occurrence of that name. That's a very powerful feature, but to implement it in an IDE, that IDE would have to have intimate knowledge of the computer language being used.

A few years ago, I cleaned up the code in mod_blog. I had a bunch of global variables used throughout the codebase, all starting with “g_” (such as g_rssfile) but they weren't variables in the traditional sense, they were more or less “run-time settable constants” (to the rest of the codebase, the declaration for g_rssfile was extern const char *const g_rssfile). I decided that they needed a renaming to better reflect how I actually use them, and changed the majority of global variables to start with “c_”.

Talk about pain.

Each one required at minimum three edits—the declaration in a header file, the actual declaration, and the setting of said variable when the program starts up. If I had this feature, something that took maybe an hour could have been finished in a few minutes.

But mod_blog is a very small codebase—some 14,000 lines of code. Could such a feature scale to something like the Linux kernel? Or Firefox? Or even Windows Vista? I don't know. And how would you even implement something like that?

My guess—if you even hope to do something like this on multimillion line codebases, you may have to give up on storing the code as text and move on to some other internal format.

It's not like it's a new idea. Most forms of BASIC (you know, that horrible langauge made popular on 8-bit microprocessors of the 70s and 80s) were not stored as text but in a mixture of binary and text form (although you could get a pure text version of the code if you wanted it).

So, what happens if we get away from distinct text files? And hey, why not design (or redesign) a language while we're at it?

A common complaint about static typechecking languages is the requirement to declare all your variables. But if we're using an ideal IDE, one that understands the langauge we're programming in, why not take the work on type inference and use it during the editing phase?

Something like:

[Example 1]

The editing takes place on the right-hand side, whereas the IDE will track your variables and types on the left-hand side. In this simple example, we see that the IDE has determined that the function nth() takes an integer, and returns a constant string.

In this example:

[Example 2]

The IDE inferred that the function foo() will return either a constant string or a number, which is highlighted in red to indicate the conflict (not that it won't run depending upon the language—it's just highlighting the fact that this function will return one of two types). It also inferred that the parameters are of type “number” (doubles, floats, integers, what have you).

So, the IDE could be doing these types annotations for you, but why not the ability to further annotate the annotations? I don't see why you couldn't edit the left-hand side to, say, change the type the IDE detected, or even annotate further conditions:

[Example 3]

Here, we annotated that b is not to be 0, and the IDE then highlighted the code to say “hey, this can't happen.” The assumption here is, the compiler can then use the annotations to statically check the code, and if it can determine at compile time that b is 0, then flag a compilation error—otherwise it can insert the runtime code for us to check and raise an exception (or do the equivilent of assert()) at runtime.

(And if we have all this syntax and typechecking stuff going on, along with the ability to change variable and function names at will without having to re-edit a bunch of code, we might as well have the IDE compile the code as we write it—although on a huge codebase this may be impractical—just a thought)

I'm still not entirely sure how to present the source code though. Since this “pie-in-the-sky” IDE stores the source code in some internal format, the minimum “working unit” isn't a file. I want to say that the minimum “working unit” is a function (that's how the examples are presented), or maybe a group of related functions. Heck, at this stage, we could probably incorporate Literate Programming principles.

Another feature that I don't think any existing IDE has is revision control as part of the system. And like the editing portion (“I want my editor, not the crap one the IDE provides”), revision control is another area of contention (not only over say, CVS vs. SVN, but centralized vs. decentralized, file-based vs. content-based, commenting every change vs. commenting over a series of changes, etc.). But since I'm taking a “pie-in-the-sky” approach to IDEs, I'll include revision control from within it as well.

It would probably also help with managing slightly different versions of the code base. For instance, the original version of the graylist daemon had the following bit of code to generate a report (more or less pulled from another daemon I had written):

static void handle_sigusr1(void)
{
  Stream out;
  pid_t  child;
  size_t i;

  (*cv_report)(LOG_DEBUG,"","User 1 Signal");
  mf_sigusr1 = 0;

  child = fork();
  if (child == (pid_t)-1)
  {
    (*cv_report)(LOG_CRIT,"$","fork() = %a",strerror(errno));
    return;
  }

  out = FileStreamWrite(c_dumpfile,FILE_CREATE | FILE_TRUNCATE);
  if (out == NULL)
  {
    (*cv_report)(LOG_ERR,"$","could not open %a",c_dumpfile);
    _exit(0);
  }

  for (i = 0 ; i < g_poolnum ; i++)
  {
    LineSFormat(
        out,
        "$ $ $ $ $ $ $ $ L L",
        "%a %b %c %d%e%f%g%h %i %j\n",
        ipv4(g_tuplespace[i]->ip),
        g_tuplespace[i]->from,
        g_tuplespace[i]->to,
        (g_tuplespace[i]->f & F_WHITELIST) ? "W" : "-",
        (g_tuplespace[i]->f & F_GRAYLIST)  ? "G" : "-",
        (g_tuplespace[i]->f & F_TRUNCFROM) ? "F" : "-",
        (g_tuplespace[i]->f & F_TRUNCTO)   ? "T" : "-",
        (g_tuplespace[i]->f & F_IPv6)      ? "6" : "-",
        (unsigned long)g_tuplespace[i]->ctime,
        (unsigned long)g_tuplespace[i]->atime
    );
  }

  StreamFree(out);
  _exit(0);
}

It works on all the development servers, but not the actual server.

Sigh.

Next version:

static void handle_sigusr1(void)
{
  Stream out;
#ifdef CAN_DO_FORK
  pid_t  child;
#endif
  size_t i;

  (*cv_report)(LOG_DEBUG,"","User 1 Signal");
  mf_sigusr1 = 0;

#ifdef CAN_DO_FORK
  child = fork();
  if (child == (pid_t)-1)
  {
    (*cv_report)(LOG_CRIT,"$","fork() = %a",strerror(errno));
    return;
  }
#endif

  out = FileStreamWrite(c_dumpfile,FILE_CREATE | FILE_TRUNCATE);
  if (out == NULL)
  {
    (*cv_report)(LOG_ERR,"$","could not open %a",c_dumpfile);
#ifdef CAN_DO_FORK
    _exit(0);
#else
    return;
#endif
  }

  for (i = 0 ; i < g_poolnum ; i++)
  {
    LineSFormat(
        out,
        "$ $ $ $ $ $ $ $ L L",
        "%a %b %c %d%e%f%g%h %i %j\n",
        ipv4(g_tuplespace[i]->ip),
        g_tuplespace[i]->from,
        g_tuplespace[i]->to,
        (g_tuplespace[i]->f & F_WHITELIST) ? "W" : "-",
        (g_tuplespace[i]->f & F_GRAYLIST)  ? "G" : "-",
        (g_tuplespace[i]->f & F_TRUNCFROM) ? "F" : "-",
        (g_tuplespace[i]->f & F_TRUNCTO)   ? "T" : "-",
        (g_tuplespace[i]->f & F_IPv6)      ? "6" : "-",
        (unsigned long)g_tuplespace[i]->ctime,
        (unsigned long)g_tuplespace[i]->atime
    );
  }

  StreamFree(out);
#ifdef CAN_DO_FORK
  _exit(0);
#endif
}

Ugly as hell. But typical of “portable” C code. If, however, one could easily make alternative versions (or branches) to the code, then I could, say, branch the previous version into the “Can do fork” and the “Not a forking chance” versions, then all this #ifdef crap. And by removing all that #ifdef crap, it makes it easier to follow the code.

And if you need to see all the current versions?

I guess something like FileMerge could be used to view the different revisions (and if the minimum “working unit” is the function, we get very fine-grained revision control).

And I suppose, while I'm at it, the ability to not only debug from the IDE, but edit a running instance of the program wouldn't be asking too much, although doing so for any arbitrary language may be difficult to darn near impossible.

Obligatory Picture

[It's the most wonderful time of the year!]

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2019 by Sean Conner. All Rights Reserved.