The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, August 09, 2006

Musings on typechecking

I was thinking a bit more about yesterday's post on variable types and character sets. I mean, yes, one could conceivably make subtypes of strings with specific character sets:

class String { /* ... */ } ;
class StringISO8859d1 : public String { /* ... */ } ;
class StringISO8859d5 : public String { /* ... */ } ;
class StringUTFd8     : public String { /* ... */ } ;

String          foo = "Hello";	// perhaps a warning here
StringISO8859d1 bar = "Hello";
StringUTFd8     baz = "Wal★Mart";

bar = baz;	// two things could happen here
		// 1. compiler spits out a warning
		// or error because you are trying
		// to store a variable of one type
		// into another without an explicit
		// conversion step, or
		// 2. The compiler does the conversion
		// for you, much like an int → double
		// conversion.  Problem here is what if the
		// source string has characters not
		// representable in the destination string.
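
To make the first option concrete, here is a minimal sketch (assuming C++17) that uses a tag type per character set instead of a subclass per character set. The names EncodedString, Latin1, Utf8 and to_latin1 are hypothetical, made up for illustration. Assignment across encodings does not compile, and the explicit conversion step reports failure when the source has characters the destination can't represent:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical tag types, one per character set (names are illustrative).
struct Latin1 {};  // ISO-8859-1
struct Utf8 {};    // UTF-8

// A string whose encoding is part of its static type.
template <typename Encoding>
class EncodedString {
public:
    explicit EncodedString(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// Option 1: no implicit conversion between encodings.
static_assert(
    !std::is_assignable<EncodedString<Latin1>&, EncodedString<Utf8>>::value,
    "assigning across encodings needs an explicit conversion");

// The explicit conversion step. Returns std::nullopt when the input is
// malformed or holds a character outside ISO-8859-1 (above U+00FF).
std::optional<EncodedString<Latin1>> to_latin1(const EncodedString<Utf8>& src)
{
    const std::string& in = src.bytes();
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));          // plain ASCII
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
            // Two-byte sequences with these lead bytes cover U+0080..U+00FF.
            unsigned char d = static_cast<unsigned char>(in[i + 1]);
            if ((d & 0xC0) != 0x80) return std::nullopt;  // bad continuation
            out.push_back(static_cast<char>(((c & 0x03) << 6) | (d & 0x3F)));
            ++i;
        } else {
            return std::nullopt;                          // not representable
        }
    }
    return EncodedString<Latin1>(std::move(out));
}

// Usage sketch.
inline void demo_conversion()
{
    EncodedString<Utf8> cafe("caf\xC3\xA9");         // "café" in UTF-8
    auto bar = to_latin1(cafe);
    assert(bar && bar->bytes() == "caf\xE9");

    EncodedString<Utf8> baz("Wal\xE2\x98\x85Mart");  // the star is U+2605
    assert(!to_latin1(baz));                         // can't be stored
}
```

The caller has to decide what to do with the failure case, which is exactly the question the implicit int → double style conversion would hide.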

But that doesn't work that well when you want to determine if the string has been converted to HTML entities or not, as it gets unwieldy fast:

StringISO8859d1NoEntities foo;
StringISO8859d1Entities   bar;
StringISO8859d5NoEntities baz;
StringUTFd8Entities       fubar; // Silliness!  Silliness I say!

And even more importantly, what if you don't know the character set of the data until runtime? In that sense, the character set and the encoding are a type of attribute of the string, or a sub-type of the string.

Yes, such information could be added to the base String class with appropriate methods to check and set this sub-type (or attribute-like) information, but I'd still like to get compile-time checks whenever and wherever I can. For instance:

StringEntity   foo = "Johnson &amp; Co.";
StringNoEntity bar = "American Telegraph & Telephone";
String         baz;

baz = foo + bar;	// now what?  We're adding a string where
			// "&" is encoded as "&amp;" to a string
			// where the "&" appears as is.  Do we
			// 'de-entify' foo or 'entify' bar?  And
			// what is the programmer's expectation?
			// Probably not much, given this piece
			// of code.

I hope you can see where I'm trying to go with this: tracking a form of intent throughout the code. Variable types are more than just a way to annoy muddle-headed programmers and produce fast code. They're also a statement of what we (or at least I, as a programmer) can do to the variable and which manipulations we intend. Having the computer find violations of that intent at compile time is certainly cheaper and easier to fix than finding them at the customer site.
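
One way this might look in C++ is sketched below. It is only an illustration under my own assumptions: Text, Raw, Escaped and entify are hypothetical names, and the entity state is carried as a template parameter rather than as a subclass. Concatenation is defined only for two strings in the same state, so mixing them forces the programmer to say which conversion is intended:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical state tags (illustrative names).
struct Raw {};      // "&" appears as is
struct Escaped {};  // "&" is encoded as "&amp;"

// A string that carries its entity state in its type.
template <typename State>
class Text {
public:
    explicit Text(std::string s) : s_(std::move(s)) {}
    const std::string& str() const { return s_; }
private:
    std::string s_;
};

// Concatenation exists only when both operands share the same state;
// for mixed states, template deduction fails and the code won't compile.
template <typename State>
Text<State> operator+(const Text<State>& a, const Text<State>& b)
{
    return Text<State>(a.str() + b.str());
}

// The explicit "entify" step: raw text in, escaped text out.
Text<Escaped> entify(const Text<Raw>& in)
{
    std::string out;
    for (char c : in.str()) {
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;        break;
        }
    }
    return Text<Escaped>(std::move(out));
}

// Usage sketch.
inline void demo_entities()
{
    Text<Escaped> foo("Johnson &amp; Co.");
    Text<Raw>     bar("American Telegraph & Telephone");

    // Text<Escaped> bad = foo + bar;      // does not compile: mixed states
    Text<Escaped> baz = foo + entify(bar); // intent is now explicit
    assert(baz.str() ==
           "Johnson &amp; Co.American Telegraph &amp; Telephone");
}
```

This doesn't answer the "don't know until runtime" case, but it does turn the "now what?" question into a compile error instead of a surprise.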

I think what I'm aiming for is a way of annotating a variable with more than just type information and having those annotations checked and enforced by the compiler. Another example might be unit tracking. Say, making sure that you can add a LENGTH to a LENGTH, that a LENGTH times a LENGTH is an AREA, and that you can't add a LENGTH to an AREA. Or that this variable is in inches, that one in millimeters, and have the computer keep track of multiplying or dividing by 25.4 as the value moves from one variable to another (I think there are some specialized computer languages that can do this, but I don't know of any general-purpose language supporting this).
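
For what it's worth, the unit example can be approximated with C++ templates. The sketch below is a rough illustration and not a real units library: Quantity, Length, Area, inches, millimeters and in_inches are all hypothetical names, and I'm assuming every value is stored internally in millimeters (or square millimeters) so the factor of 25.4 is applied only at the edges:

```cpp
#include <cassert>

// A value measured in millimeters raised to LengthPower:
// power 1 is a length, power 2 is an area.
template <int LengthPower>
struct Quantity {
    double value;  // always stored in mm^LengthPower
};

using Length = Quantity<1>;
using Area   = Quantity<2>;

// Addition only exists for matching dimensions, so LENGTH + AREA
// fails to compile (the template parameter can't be deduced).
template <int P>
constexpr Quantity<P> operator+(Quantity<P> a, Quantity<P> b)
{
    return {a.value + b.value};
}

// Multiplication adds the exponents: LENGTH * LENGTH is an AREA.
template <int P, int Q>
constexpr Quantity<P + Q> operator*(Quantity<P> a, Quantity<Q> b)
{
    return {a.value * b.value};
}

// Unit conversions happen once, on the way in and on the way out.
constexpr Length millimeters(double v) { return {v}; }
constexpr Length inches(double v)      { return {v * 25.4}; }
constexpr double in_inches(Length l)   { return l.value / 25.4; }

// Usage sketch.
inline void demo_units()
{
    Length a     = inches(2.0);             // stored as 50.8 mm
    Length total = a + millimeters(49.2);   // inches and mm mix safely
    Area   sq    = a * a;                   // LENGTH * LENGTH is an AREA

    // Length bad = a + sq;                 // does not compile

    assert(total.value > 99.999999 && total.value < 100.000001);
    assert(sq.value > 2580.639999 && sq.value < 2580.640001);
}
```

The limitation is obvious: only one dimension is tracked here, and adding mass or time means more template parameters, which is where this gets unwieldy in much the same way the string subtypes did.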

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.