Saturday, September 08, 2007
Notes on an ideal integrated development system
I'm not a fan of IDEs. I grew up with the “edit-compile-run” cycle of development, and while I didn't always have a choice in the “compile” portion of things, I did in the “edit” portion, and over time became very picky about which editor I use. Because of that, whenever I did try an IDE, I invariably found the “edit” portion to be very painful, stuck in an editor that I wasn't used to; being forced to use an unfamiliar editor resulted in a vast loss of productivity and thus, I've never liked IDEs. So I stuck with the “edit-compile-run” cycle.
But the recent bout of programming I've done has made me wish for something better than the “edit-compile-run” cycle. And while IDEs have probably evolved since I last tried them in the late 80s, I don't think they've evolved enough to suit me.
What I'm about to describe is definitely “pie-in-the-sky” stuff. I'm not saying that IDEs must be this way; I'm just saying that this is what I would like in an IDE. Who knows? Maybe this won't work. Maybe it's unworkable. But I wouldn't mind seeing these features (as long as the editing could be configured to my liking).
A database I used in the early 80s that ran on a twin floppy PC. Written by Brian Berkowitz and Richard Ilson
Wonderful features were:
- It stored user-defined names separately from internal IDs, so you could change the names of tables and fields without worrying.
- Fantastic date handling—you could enter “Next Wednesday” and it would work out the date.
- No Table/View separation, you could define a field on a table as being calculated on fields from a related table and that definition became part of the original table.
The one feature of Cornerstone that still strikes me as innovative is the separation of variables (or in this case, fields and tables) from their names. One could change the name of a variable without having to edit every other occurrence of that name. That's a very powerful feature, but to implement it in an IDE, that IDE would have to have intimate knowledge of the computer language being used.
A few years ago, I cleaned up the code in mod_blog. I had a bunch of global variables used throughout the codebase, all starting with “g_” (such as g_rssfile) but they weren't variables in the traditional sense, they were more or less “run-time settable constants” (to the rest of the codebase, the declaration for g_rssfile was extern const char *const g_rssfile). I decided that they needed a renaming to better reflect how I actually use them, and changed the majority of global variables to start with “c_”.
Talk about pain.
Each one required at minimum three edits—the declaration in a header file, the actual declaration, and the setting of said variable when the program starts up. If I had this feature, something that took maybe an hour could have been finished in a few minutes.
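As a rough illustration of why this could be so cheap, here is a minimal sketch in C of a symbol table where the internal ID is the real identity and the name is only a label. Everything in it (SymbolTable, symbol_add(), symbol_rename(), render_refs()) is hypothetical and made up for this note; it is not how Cornerstone or any existing IDE does it. The “program” is just a list of IDs, so a rename is one edit to the table no matter how many references exist.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 64
#define MAX_NAME    32

/* Hypothetical symbol table: the array index is the stable internal ID,
   and the name is only a label attached to that ID. */
typedef struct {
    char   names[MAX_SYMBOLS][MAX_NAME];
    size_t count;
} SymbolTable;

/* Register a name; returns its internal ID, or -1 if it does not fit. */
int symbol_add(SymbolTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    if (t->count >= MAX_SYMBOLS || strlen(name) >= MAX_NAME)
        return -1;
    strcpy(t->names[t->count], name);
    return (int)t->count++;
}

/* Renaming edits exactly one table entry; code that refers to the
   symbol by ID is not touched at all. Returns 0 on success, -1 on error. */
int symbol_rename(SymbolTable *t, int id, const char *new_name)
{
    assert(t != NULL && new_name != NULL);
    if (id < 0 || (size_t)id >= t->count || strlen(new_name) >= MAX_NAME)
        return -1;
    strcpy(t->names[id], new_name);
    return 0;
}

/* Look up the current label for an ID, or NULL if the ID is unknown. */
const char *symbol_name(const SymbolTable *t, int id)
{
    assert(t != NULL);
    if (id < 0 || (size_t)id >= t->count)
        return NULL;
    return t->names[id];
}

/* Turn a list of ID references back into text, separated by spaces.
   Stops early if the output buffer fills up. Returns characters written. */
size_t render_refs(const SymbolTable *t, const int *refs, size_t n,
                   char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        const char *name = symbol_name(t, refs[i]);
        int w;

        if (name == NULL)
            continue;
        w = snprintf(out + used, outsz - used, "%s%s", used ? " " : "", name);
        if (w < 0 || (size_t)w >= outsz - used) {
            out[used] = '\0';   /* drop the partially written name */
            break;
        }
        used += (size_t)w;
    }
    return used;
}
```

With g_rssfile registered once and referenced by ID in several places, a single symbol_rename() call to c_rssfile changes what every reference renders as. The hard part, which this sketch skips entirely, is getting from real source text to those IDs in the first place.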
But mod_blog is a very small codebase—some 14,000 lines of code. Could such a feature scale to something like the Linux kernel? Or Firefox? Or even Windows Vista? I don't know. And how would you even implement something like that?
My guess—if you even hope to do something like this on multimillion line codebases, you may have to give up on storing the code as text and move on to some other internal format.
It's not like it's a new idea. Most forms of BASIC (you know, that horrible language made popular on 8-bit microcomputers of the 70s and 80s) were not stored as text but in a mixture of binary and text form (although you could get a pure text version of the code if you wanted it).
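For anyone who never ran into it, the general idea can be sketched in a few lines of C. This is only an illustration under my own assumptions: the keyword list and the token values (0x80 plus the keyword's index) are made up and do not match any particular BASIC, and the input is assumed to be plain ASCII. Keywords are stored as single bytes, everything else is stored as-is, and the text form is regenerated on demand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword table; the token byte is 0x80 + index.
   These values are invented and do not match any real BASIC. */
static const char *const keywords[] = { "PRINT", "GOTO", "IF", "THEN" };
#define NKEYWORDS (sizeof keywords / sizeof keywords[0])

/* Replace keywords with one-byte tokens and copy everything else verbatim.
   Text inside double quotes is left alone. Assumes ASCII input.
   Returns the number of bytes written, or 0 if out is too small. */
size_t tokenize(const char *src, unsigned char *out, size_t outsz)
{
    size_t n = 0;
    int    in_string = 0;

    assert(src != NULL && out != NULL);
    while (*src != '\0') {
        size_t k = NKEYWORDS, len = 0;

        if (*src == '"')
            in_string = !in_string;
        if (!in_string) {
            for (k = 0; k < NKEYWORDS; k++) {
                len = strlen(keywords[k]);
                if (strncmp(src, keywords[k], len) == 0)
                    break;
            }
        }
        if (n >= outsz)
            return 0;
        if (k < NKEYWORDS) {
            out[n++] = (unsigned char)(0x80 + k);   /* keyword becomes a token */
            src += len;
        } else {
            out[n++] = (unsigned char)*src++;       /* ordinary character */
        }
    }
    return n;
}

/* Expand tokens back into keywords to recover the pure text version.
   Returns the length of the text, or 0 if out is too small. */
size_t detokenize(const unsigned char *tok, size_t n, char *out, size_t outsz)
{
    size_t used = 0;

    assert(tok != NULL && out != NULL);
    if (outsz == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (tok[i] >= 0x80 && (size_t)(tok[i] - 0x80) < NKEYWORDS) {
            const char *kw  = keywords[tok[i] - 0x80];
            size_t      len = strlen(kw);

            if (used + len >= outsz)
                return 0;
            memcpy(out + used, kw, len);
            used += len;
        } else {
            if (used + 1 >= outsz)
                return 0;
            out[used++] = (char)tok[i];
        }
    }
    out[used] = '\0';
    return used;
}
```

Note that the stored form already gives a little of the rename trick for free: change an entry in the keyword table and every stored program lists with the new spelling.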
So, what happens if we get away from distinct text files? And hey, why not design (or redesign) a language while we're at it?
A common complaint about statically typechecked languages is the requirement to declare all your variables. But if we're using an ideal IDE, one that understands the language we're programming in, why not take the work on type inference and use it during the editing phase?
Something like:
The editing takes place on the right-hand side, whereas the IDE will track your variables and types on the left-hand side. In this simple example, we see that the IDE has determined that the function nth() takes an integer, and returns a constant string.
In this example:
The IDE inferred that the function foo() will return either a constant string or a number, which is highlighted in red to indicate the conflict (not that it won't run, depending upon the language; it's just highlighting the fact that this function will return one of two types). It also inferred that the parameters are of type “number” (doubles, floats, integers, what have you).
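How might the IDE arrive at that red highlight? One possibility, sketched below in C under my own assumptions, is to collect the type of every return statement in a function and “join” them: identical types stay as they are, differing types collapse into a conflict marker the IDE can paint red. The Type enum, type_join() and infer_return() are hypothetical names, and a real inference engine would need far more than four types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, deliberately tiny set of types the IDE tracks. */
typedef enum { T_UNKNOWN, T_NUMBER, T_STRING, T_CONFLICT } Type;

/* Combine two inferred types. Unknown defers to the other side,
   equal types are kept, and anything else is a conflict. */
Type type_join(Type a, Type b)
{
    if (a == T_UNKNOWN) return b;
    if (b == T_UNKNOWN) return a;
    return (a == b) ? a : T_CONFLICT;
}

/* Infer a function's return type from the types of its return
   statements, as gathered by some earlier pass over the code. */
Type infer_return(const Type *returns, size_t n)
{
    Type t = T_UNKNOWN;

    assert(returns != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        t = type_join(t, returns[i]);
    return t;
}
```

A function like nth(), where every return yields a string, comes out as T_STRING; a function like foo(), with one string return and one number return, comes out as T_CONFLICT, which is the case that gets highlighted.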
So, the IDE could be doing these type annotations for you, but why not add the ability to further annotate the annotations? I don't see why you couldn't edit the left-hand side to, say, change the type the IDE detected, or even annotate further conditions:
Here, we annotated that b is not to be 0, and the IDE then highlighted the code to say “hey, this can't happen.” The assumption here is that the compiler can then use the annotations to statically check the code, and if it can determine at compile time that b is 0, flag a compilation error; otherwise it can insert the runtime code for us to check and raise an exception (or do the equivalent of assert()) at runtime.
(And if we have all this syntax and typechecking stuff going on, along with the ability to change variable and function names at will without having to re-edit a bunch of code, we might as well have the IDE compile the code as we write it—although on a huge codebase this may be impractical—just a thought)
I'm still not entirely sure how to present the source code though. Since this “pie-in-the-sky” IDE stores the source code in some internal format, the minimum “working unit” isn't a file. I want to say that the minimum “working unit” is a function (that's how the examples are presented), or maybe a group of related functions. Heck, at this stage, we could probably incorporate Literate Programming principles.
Another feature that I don't think any existing IDE has is revision control as part of the system. And like the editing portion (“I want my editor, not the crap one the IDE provides”), revision control is another area of contention (not only over say, CVS vs. SVN, but centralized vs. decentralized, file-based vs. content-based, commenting every change vs. commenting over a series of changes, etc.). But since I'm taking a “pie-in-the-sky” approach to IDEs, I'll include revision control from within it as well.
It would probably also help with managing slightly different versions of the code base. For instance, the original version of the graylist daemon had the following bit of code to generate a report (more or less pulled from another daemon I had written):
static void handle_sigusr1(void)
{
  Stream out;
  pid_t  child;
  size_t i;

  (*cv_report)(LOG_DEBUG,"","User 1 Signal");
  mf_sigusr1 = 0;

  child = fork();
  if (child == (pid_t)-1)
  {
    (*cv_report)(LOG_CRIT,"$","fork() = %a",strerror(errno));
    return;
  }

  out = FileStreamWrite(c_dumpfile,FILE_CREATE | FILE_TRUNCATE);
  if (out == NULL)
  {
    (*cv_report)(LOG_ERR,"$","could not open %a",c_dumpfile);
    _exit(0);
  }

  for (i = 0 ; i < g_poolnum ; i++)
  {
    LineSFormat(
        out,
        "$ $ $ $ $ $ $ $ L L",
        "%a %b %c %d%e%f%g%h %i %j\n",
        ipv4(g_tuplespace[i]->ip),
        g_tuplespace[i]->from,
        g_tuplespace[i]->to,
        (g_tuplespace[i]->f & F_WHITELIST) ? "W" : "-",
        (g_tuplespace[i]->f & F_GRAYLIST)  ? "G" : "-",
        (g_tuplespace[i]->f & F_TRUNCFROM) ? "F" : "-",
        (g_tuplespace[i]->f & F_TRUNCTO)   ? "T" : "-",
        (g_tuplespace[i]->f & F_IPv6)      ? "6" : "-",
        (unsigned long)g_tuplespace[i]->ctime,
        (unsigned long)g_tuplespace[i]->atime
    );
  }

  StreamFree(out);
  _exit(0);
}
It works on all the development servers, but not the actual server.
Sigh.
Next version:
static void handle_sigusr1(void)
{
  Stream out;
#ifdef CAN_DO_FORK
  pid_t  child;
#endif
  size_t i;

  (*cv_report)(LOG_DEBUG,"","User 1 Signal");
  mf_sigusr1 = 0;

#ifdef CAN_DO_FORK
  child = fork();
  if (child == (pid_t)-1)
  {
    (*cv_report)(LOG_CRIT,"$","fork() = %a",strerror(errno));
    return;
  }
#endif

  out = FileStreamWrite(c_dumpfile,FILE_CREATE | FILE_TRUNCATE);
  if (out == NULL)
  {
    (*cv_report)(LOG_ERR,"$","could not open %a",c_dumpfile);
#ifdef CAN_DO_FORK
    _exit(0);
#else
    return;
#endif
  }

  for (i = 0 ; i < g_poolnum ; i++)
  {
    LineSFormat(
        out,
        "$ $ $ $ $ $ $ $ L L",
        "%a %b %c %d%e%f%g%h %i %j\n",
        ipv4(g_tuplespace[i]->ip),
        g_tuplespace[i]->from,
        g_tuplespace[i]->to,
        (g_tuplespace[i]->f & F_WHITELIST) ? "W" : "-",
        (g_tuplespace[i]->f & F_GRAYLIST)  ? "G" : "-",
        (g_tuplespace[i]->f & F_TRUNCFROM) ? "F" : "-",
        (g_tuplespace[i]->f & F_TRUNCTO)   ? "T" : "-",
        (g_tuplespace[i]->f & F_IPv6)      ? "6" : "-",
        (unsigned long)g_tuplespace[i]->ctime,
        (unsigned long)g_tuplespace[i]->atime
    );
  }

  StreamFree(out);
#ifdef CAN_DO_FORK
  _exit(0);
#endif
}
Ugly as hell. But typical of “portable” C code. If, however, one could easily make alternative versions (or branches) of the code, then I could, say, branch the previous version into the “Can do fork” and the “Not a forking chance” versions, then remove all this #ifdef crap. And removing all that #ifdef crap makes it easier to follow the code.
And if you need to see all the current versions?
I guess something like FileMerge could be used to view the different revisions (and if the minimum “working unit” is the function, we get very fine-grained revision control).
And I suppose, while I'm at it, the ability to not only debug from the IDE, but edit a running instance of the program wouldn't be asking too much, although doing so for any arbitrary language may be difficult to darn near impossible.