Wednesday, August 09, 2006
Musings on typechecking
I was thinking a bit more about yesterday's post on variable types and character sets. I mean, yes, one could conceivably make subtypes of strings with specific character sets:
    class String { /* ... */ };
    class StringISO8859d1 : public String { /* ... */ };
    class StringISO8859d5 : public String { /* ... */ };
    class StringUTFd8     : public String { /* ... */ };

    String          foo = "Hello"; // perhaps a warning here
    StringISO8859d1 bar = "Hello";
    StringUTFd8     baz = "Wal★Mart";

    bar = baz; // two things could happen here
               // 1. compiler spits out a warning
               //    or error because you are trying
               //    to store a variable of one type
               //    into another without an explicit
               //    conversion step, or
               // 2. The compiler does the conversion
               //    for you, much like an int → double
               //    conversion. Problem here is what if the
               //    source string has characters not
               //    representable in the destination string.
But that doesn't work that well when you want to determine if the string has been converted to HTML entities or not, as it gets unwieldy fast:
    StringISO8859d1NoEntities foo;
    StringISO8859d1Entities   bar;
    StringISO8859d5NoEntities baz;
    StringUTFd8Entities       fubar;

    // Silliness! Silliness I say!
And even more importantly, what if you don't know the character set of the data until runtime? In that sense, the character set and the encoding are a kind of attribute of the string, or a sub-type of the string.
Yes, such information could be added to the base String class with appropriate methods to check and set this sub-type (or attribute-like) information, but I'd still like to get compile-time checks whenever and wherever I can. For instance:
    StringEntity   foo = "Johnson &amp; Co.";
    StringNoEntity bar = "American Telegraph & Telephone";
    String         baz;

    baz = foo + bar; // now what? We're adding a string where
                     // "&" is encoded as "&amp;" to a string
                     // where the "&" appears as is. Do we
                     // 'de-entify' foo or 'entify' bar? And
                     // what is the programmer expectation?
                     // Probably not much, given this piece
                     // of code.
I hope you can see where I'm going with this: I want to track a form of intent throughout the code. Variable types are about more than annoying muddle-headed programmers and producing fast code. They are also a statement of what we (or at least I, as a programmer) can do to the variable and what manipulations we intend, and they let the computer catch violations at compile time, where they are certainly cheaper and easier to fix than at the customer site.
I think what I'm aiming for is a way of annotating a variable with more than just type information and having those annotations checked and enforced by the compiler. Another example might be unit tracking. Say, making sure that you add a LENGTH to a LENGTH but that a LENGTH times a LENGTH is an AREA, and you can't add a LENGTH to an AREA. Or that this variable is in inches, that one in millimeters, and have the computer keep track of multiplying or dividing by 25.4 as the value moves from one variable to another (I think there are some specialized computer languages that can do this, but I don't know of any general-purpose language supporting this).