Thursday, October 11, 2007
A little lesson on i18n
Then your Russian translator calls on the phone, to personally tell you the bad news about how really unpleasant your life is about to become:
Russian, like German or Latin, is an inflectional language; that is, nouns and adjectives have to take endings that depend on their case (i.e., nominative, accusative, genitive, etc …)—which is roughly a matter of what role they have in syntax of the sentence—as well as on the grammatical gender (i.e., masculine, feminine, neuter) and number (i.e., singular or plural) of the noun, as well as on the declension class of the noun. But unlike with most other inflected languages, putting a number-phrase (like “ten” or “forty-three”, or their Arabic numeral equivalents) in front of noun in Russian can change the case and number that noun is, and therefore the endings you have to put on it.
He elaborates: In “I scanned %g directories”, you'd expect “directories” to be in the accusative case (since it is the direct object in the sentence) and the plural number, except where $directory_count is 1, then you'd expect the singular, of course. Just like Latin or German. But! Where $directory_count % 10 is 1 (“%” for modulo, remember), assuming $directory count is an integer, and except where $directory_count % 100 is 11, “directories” is forced to become grammatically singular, which means it gets the ending for the accusative singular … You begin to visualize the code it'd take to test for the problem so far, and still work for Chinese and Arabic and Italian, and how many gettext items that'd take, but he keeps going … But where $directory_count % 10 is 2, 3, or 4 (except where $directory_count % 100 is 12, 13, or 14), the word for “directories” is forced to be genitive singular—which means another ending … The room begins to spin around you, slowly at first … But with all other integer values, since “directory” is an inanimate noun, when preceded by a number and in the nominative or accusative cases (as it is here, just your luck!), it does stay plural, but it is forced into the genitive case—yet another ending … And you never hear him get to the part about how you're going to run into similar (but maybe subtly different) problems with other Slavic languages like Polish, because the floor comes up to meet you, and you fade into unconsciousness.
The above cautionary tale relates how an attempt at localization can lead from programmer consternation, to program obfuscation, to a need for sedation. But careful evaluation shows that your choice of tools merely needed further consideration.
Via Flutterby, A Localization Horror Story: It Could Happen To You
Yikes!
And I thought I was being clever when writing my own replacement for
printf()
that allowed you to refer to parameters by
placement.
Okay, I'll have to explain that.
In printf()
, you specify variables with a special code, such
as “%s” for a string variable, “%d” for an integer:
printf("Hey! I saw %d %s\n",count,object);
But the variable specifiers and the variables themselves have to match up. So, if I want to change the output from “I saw X blah” to “blah: X” I not only have to change the text, but the order of the parameters as well:
printf("%s: %d\n",object,count);
Also, if you wanted to print a value multiple times (and it sometimes comes up) you have to repeat that value:
/* helps to say this as Foghorn Leghorn */ printf("That boy is %s! %s, I say!\n",adjective,adjective);
I got around that by separating the variable types and parameter positions from the format string:
my_output("i $","Hey! I saw %a %b\n",count,object); my_output("i $","%b: %a\n",count,object); my_output("$","That boy is %a! %a, I say!\n",adjective);
(okay, here, “i” denotes an integer parameter and “$” a string parameter. In the format string, “%a” refers to the first parameter, “%b” the second, and so on)
The idea was the ability to change the string without having to change any code, and to that end, I did quite well. What I didn't realize is that full i18n is hard. I mean, just beyond character set issues.
Man, I had no idea languages could be so crazy.