The Boston Diaries

Wednesday, February 21, 2007

One of Microsoft's secrets cracked at last

It's a lead-pipe cinch, I figure. I'm a good detective. I've found opium dens in Vientiane; been granted interviews by cardinals, mafiosi, and sheikhs; discovered the meaning of “half-and-half” in the old song “Drinkin' Wine, Spo-Dee-O-Dee”; conned the Vatican into bestowing a doctorate on me so that I could gain access to hiddenmost archives; deciphered the cryptic message Ezra Pound scrawled in his own copy of the Cantos while in the bughouse; tracked down and interviewed Phil Spector's first wife, long presumed dead; charted my way to the sacred stone of the Great Mother, in Cyprus; gotten Charlotte Rampling's cell-phone number; even come close to understanding the second page of my Con Ed bill. Finding out where a picture was taken—a picture plastered on millions of computer screens—seems a shot away.

Via Jason Kottke, Autumn and the Plot Against Me

For Bunny, who has this picture as her desktop …

Fun with static types

The real issue is one of “type decoration”, i.e., putting in little tokens here and there to give hints to the compiler as to what is going on. No one wants to do this. The dynamic typing camp sacrifices a certain amount of “compile-time” error checking to avoid having to decorate the code. The static typing camp uses smart tools to minimize decoration, and they “grin and bear it” otherwise. To caraciture both camps, the dynamic type camp says “it is so painful to type ‘integer’ that I never ever am willing to do it, but I'll live with runtime exceptions” whereas the static-typing camp says “my type system is so smart I almost never have to add declarations, but the thought of the computer ‘guessing’ my intent and ‘dwimming’ my code is anathema. So I'd rather type the occasional type declaration to avoid it.”

As I mentioned before, I'm very much in the “dynamic” camp. When I have to work with a language which is not only statically typed, but provides no tools for reducing the amount of type decoration, and furthermore still allows one to write meaningless expressions that appear to be meaningful, I end up feeling encumbered. So I don't think of “lightweight” and “type declarations” go together very well. I'm sure some agree and others disagree.

Re: cheerful static typing (was: Any and Every … (was: Eval))

I quote this because this seems to be indicative of the current “anti- static type” camp of programmers that seems to be very popular these days, which seems to come down to, “I hate typing at the keyboard.” That, and “I hate thinking about my program” (and here you clearly see what side of the debate I'm on—I'm a static-type fascist). But over the past few days I came up with an example (and I wouldn't be surprised if someone else hasn't thought of this) where static typing can actually increase the expressiveness (read: less typing required) of a language.

I'll start with an example in C (I would use Ada, but it's been years since I last used it, I don't have my references handy, and it would be about twice as verbose, so there's some merit to the “too many keys” argument of the dynamicists):

{
  FILE *in;
  FILE *out;
  int   c;

  in  = fopen("input","r");
  out = fopen("output","w");

  /*-----------------------------
  ; yes, there's a bug here, but
  ; that's due to a bug in the
  ; design of the C Standard Library
  ; API.  feof() only returns TRUE
  ; *after* we've attempted to 
  ; read past the last character.
  ;-----------------------------*/

  while(!feof(in))
  {
    c = fgetc(in);
    c = toupper(c);
    fputc(c,out);
  }

  fclose(out);
  fclose(in);
}

It's not a terrible amount of code (as say, compared to the equivalent in this age's Cobol—Java) but since we already have the types (that the fascist language C requires) why not do something other than simple type checking? Why not put that knowledge to use and have the compiler write some code for us? We could instruct the compiler (for our now hypothetical language) that if it sees an assignment of the form:

character-type “=” file-type

it should read the next byte from the file. Conversely, if it sees

file-type “=” character-type

it should then write the given character to the file.

{
  FILE in;
  FILE out;
  char c;

  in  = fopen("input","r");
  out = fopen("output","w");

  while(!feof(in))
  {
    c   = in;
    out = toupper(c);
  }

  fclose(out);
  fclose(in);
}

It may look unusual, but a similar notion exists in another language right now, one that is (or rather, was) quite popular: Perl.

{
  open(IN,"input");
  open(OUT,">output");
  while($line = <IN>)
  {
    print OUT $line;
  }
  close(OUT);
  close(IN);
}

(Of course, since Perl is dynamic, you have to have the file handle between the angle brackets, which tells Perl you want to do a read from the file. Attempting to do:

#!/usr/bin/perl

while($line = <STDIN>)
{
  <STDOUT> = $line;
}

fails with

Can't modify <HANDLE> in scalar assignment at ./t.pl line 5, near "$line;"
Execution of ./t.pl aborted due to compilation errors.

so it's almost what we're doing here, but just in one special case)

Now, while we're at it, why not add a bit more syntactic sugar and use a common object oriented notation (and at the same time, if a ~~function~~ method takes no formal parameters, just dump the superfluous paranthesis):

{
  File in;
  File out;
  char c;

  in  = File.read("input");
  out = File.write("output");

  while(!in.eof)
  {
    c   = in;
    out = c.toupper;
  }

  out.close;
  in.close;
}

Wait a second … let's go even futher. Why bother with the actual function names read and write? By making the file types more specific, and with a smart enough compiler to call destructors upon leaving the functional scope, we can indeed cut out even more clutter:

{
  FileIn  in  = "/tmp/input";
  FileOut out = "/tmp/output";
  char    c;

  while(in)
  {
    c   = in;
    out = c.toupper;
  }
}

So, we've instructed the compiler to note

file-input-type “=” string

and therefore open the requested file for input; the same can be said for file-output-types as well. In a boolean context, it makes sense to check if the file is at the end.

Now, in all this, I've been transforming the data, but if I want to skip even that, why should I have to write:

{
  FileIn  in  = "/tmp/input";
  FileOut out = "/tmp/output";
  char    c;

  while(in)
  {
    c   = in;
    out = c;
  }
}

When I could just:

{
  FileIn  in  = "/tmp/input";
  FileOut out = "/tmp/output";

  out = in;
}

But is that expecting too much? In the previous example, we get an actual file copy, but what if we don't want that? What if we want to copy not the file, but the “variable” in itself? Well, here, types again come to the rescue because copying the contents of an “input file” to the contents of another “input file” doesn't make semantic sense, so in that case, it's the variables themselves that are copied, not the data in the files they represent:

{
  FileIn in1 = "/tmp/input";
  FileIn in2;

  in2 = in1;	// now in2 and in1 reference
  ...		// the same input file
}

The argument could be made that just like polymorphism and operator overloading, this will lead to inscrutable code, but personally, I'd love a language that would allow me to do such things (and I'm not even sure what to call this … it's not exactly a type conversion). I've also glossed over how to code these conversions but I'm sure that just like operator overloading in C++ a syntax can be evolved.

Wednesday, February 21, 2007

One of Microsoft's secrets cracked at last

Fun with static types

Obligatory Picture

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

Obligatory AI Disclaimer