The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Thursday, October 11, 2007

A little lesson on i18n

Then your Russian translator calls on the phone, to personally tell you the bad news about how really unpleasant your life is about to become:

Russian, like German or Latin, is an inflectional language; that is, nouns and adjectives have to take endings that depend on their case (i.e., nominative, accusative, genitive, etc …)—which is roughly a matter of what role they have in syntax of the sentence—as well as on the grammatical gender (i.e., masculine, feminine, neuter) and number (i.e., singular or plural) of the noun, as well as on the declension class of the noun. But unlike with most other inflected languages, putting a number-phrase (like “ten” or “forty-three”, or their Arabic numeral equivalents) in front of noun in Russian can change the case and number that noun is, and therefore the endings you have to put on it.

He elaborates: In “I scanned %g directories”, you'd expect “directories” to be in the accusative case (since it is the direct object in the sentence) and the plural number, except where $directory_count is 1, then you'd expect the singular, of course. Just like Latin or German. But! Where $directory_count % 10 is 1 (“%” for modulo, remember), assuming $directory_count is an integer, and except where $directory_count % 100 is 11, “directories” is forced to become grammatically singular, which means it gets the ending for the accusative singular … You begin to visualize the code it'd take to test for the problem so far, and still work for Chinese and Arabic and Italian, and how many gettext items that'd take, but he keeps going … But where $directory_count % 10 is 2, 3, or 4 (except where $directory_count % 100 is 12, 13, or 14), the word for “directories” is forced to be genitive singular—which means another ending … The room begins to spin around you, slowly at first … But with all other integer values, since “directory” is an inanimate noun, when preceded by a number and in the nominative or accusative cases (as it is here, just your luck!), it does stay plural, but it is forced into the genitive case—yet another ending … And you never hear him get to the part about how you're going to run into similar (but maybe subtly different) problems with other Slavic languages like Polish, because the floor comes up to meet you, and you fade into unconsciousness.

The above cautionary tale relates how an attempt at localization can lead from programmer consternation, to program obfuscation, to a need for sedation. But careful evaluation shows that your choice of tools merely needed further consideration.

Via Flutterby, A Localization Horror Story: It Could Happen To You

Yikes!
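
To make the translator's rules concrete, here is a rough sketch in C of just the selection logic he describes for a counted, inanimate noun in the accusative. The names ru_form and ru_plural_form() are made up for illustration; they don't come from gettext or any other library, and the sketch only covers the one case in the story.

```c
/* Hypothetical names, for illustration only.  The three categories
 * follow the translator's description in the quoted story. */
enum ru_form
{
  RU_ACC_SINGULAR, /* count % 10 is 1, except count % 100 is 11        */
  RU_GEN_SINGULAR, /* count % 10 is 2, 3 or 4, except 12, 13, 14 % 100 */
  RU_GEN_PLURAL    /* every other integer count                        */
};

/* Pick which ending the noun needs for a given non-negative count. */
enum ru_form ru_plural_form(unsigned long count)
{
  unsigned long last_one = count % 10;
  unsigned long last_two = count % 100;

  if (last_one == 1 && last_two != 11)
    return RU_ACC_SINGULAR;

  if (last_one >= 2 && last_one <= 4 && !(last_two >= 12 && last_two <= 14))
    return RU_GEN_SINGULAR;

  return RU_GEN_PLURAL;
}
```

So 1 and 21 get the accusative singular, 2 and 22 the genitive singular, and 5, 11, 12 and 25 the genitive plural. The selection is the easy part; the program still has to carry three separate translated strings for what English handles with one or two.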

And I thought I was being clever when writing my own replacement for printf() that allowed you to refer to parameters by placement.

Okay, I'll have to explain that.

In printf(), you mark where each variable goes with a special code, such as “%s” for a string or “%d” for an integer:

printf("Hey!  I saw %d %s\n",count,object);

But the variable specifiers and the variables themselves have to match up. So, if I want to change the output from “I saw X blah” to “blah: X”, I not only have to change the text, but the order of the parameters as well:

printf("%s: %d\n",object,count);

Also, if you want to print a value multiple times (and it sometimes comes up), you have to repeat that value:

/* helps to say this as Foghorn Leghorn */
printf("That boy is %s!  %s, I say!\n",adjective,adjective);

I got around that by separating the variable types and parameter positions from the format string:

my_output("i $","Hey!  I saw %a %b\n",count,object);
my_output("i $","%b: %a\n",count,object);
my_output("$","That boy is %a!  %a, I say!\n",adjective);

(Okay, here “i” denotes an integer parameter and “$” a string parameter. In the format string, “%a” refers to the first parameter, “%b” the second, and so on.)
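
For the curious, here is a minimal sketch of how a function like that could work. It is not the original code: the name my_format(), the struct arg type, the 26-parameter limit, and the choice to write into a caller-supplied buffer (instead of printing directly) are all assumptions made to keep the example short and easy to check. It also assumes the letters “a” through “z” are contiguous, as they are in ASCII, and makes no attempt to handle a literal “%”.

```c
#include <stdarg.h>
#include <stdio.h>

#define MAX_ARGS 26 /* one per letter, %a through %z */

/* One collected parameter: an integer ('i') or a string ('$'). */
struct arg
{
  char        type;
  int         i;
  const char *s;
};

/* Hypothetical sketch: expand fmt into dst (at most size bytes,
 * always NUL terminated when size > 0) and return dst. */
char *my_format(char *dst, size_t size, const char *types, const char *fmt, ...)
{
  struct arg args[MAX_ARGS];
  size_t     nargs = 0;
  size_t     used  = 0;
  va_list    ap;

  if (size == 0)
    return dst;

  /* Pass 1: collect the parameters in the order the type string gives.
   * Any character other than 'i' or '$' (a space, say) is a separator. */
  va_start(ap, fmt);
  for ( ; *types != '\0' && nargs < MAX_ARGS; types++)
  {
    if (*types == 'i')
    {
      args[nargs].type = 'i';
      args[nargs].i    = va_arg(ap, int);
      args[nargs].s    = NULL;
      nargs++;
    }
    else if (*types == '$')
    {
      args[nargs].type = '$';
      args[nargs].i    = 0;
      args[nargs].s    = va_arg(ap, const char *);
      nargs++;
    }
  }
  va_end(ap);

  /* Pass 2: copy the format, replacing %a, %b, ... by position.
   * A '%' followed by anything else is copied through unchanged. */
  for ( ; *fmt != '\0' && used < size - 1; fmt++)
  {
    if (fmt[0] == '%' && fmt[1] >= 'a' && (size_t)(fmt[1] - 'a') < nargs)
    {
      const struct arg *a = &args[fmt[1] - 'a'];
      int               n;

      if (a->type == 'i')
        n = snprintf(dst + used, size - used, "%d", a->i);
      else
        n = snprintf(dst + used, size - used, "%s", a->s);

      if (n < 0)
        break;
      used += (size_t)n;
      if (used >= size) /* output was truncated */
        used = size - 1;
      fmt++;            /* skip the letter after the '%' */
    }
    else
      dst[used++] = *fmt;
  }

  dst[used] = '\0';
  return dst;
}
```

Because the parameters are collected before the format string is scanned, the format can mention them in any order and as many times as it likes, which is the whole point: a call such as my_format(buf,sizeof(buf),"i $","%b: %a\n",3,"ducks") leaves “ducks: 3” (plus a newline) in buf without the parameter list changing.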

The idea was to be able to change the string without having to change any code, and to that end, I did quite well. What I didn't realize is that full i18n is hard. I mean, well beyond just character set issues.

Man, I had no idea languages could be so crazy.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.