The Boston Diaries

Or funnier, depending upon your viewpoint.

I'm following Sam Ruby's advice on Iñtërnâtiônàlizætiøn and testing what comes through as I copy-n-paste.

I'm using Firefox version 1.0 under Linux for testing; I've yet to use Microsoft's Internet Explorer (which should prove to be dreadful and amusing at the same time).

So I'm copy-n-pasting the following from my last entry:

Γªρßåγ€

It really bugs me

Building character

And what I get back is:

text is WINDOWS-1252

00000000: 0D 0A 26 23 39 31 35 3B AA 26 23 39 36 31 3B DF ..&#915;.&#961;.
00000010: E5 26 23 39 34 37 3B 80 0D 0A 0D 0A 49 74 20 72 .&#947;.....It r
00000020: 65 61 6C 6C 79 20 62 75 67 73 20 6D 65          eally bugs me

Γªρßåγ€

It really bugs me

What the XXXX?

Not only is Firefox, under Linux, sending input to the test form as WINDOWS-1252 but it's also sending me HTML entities in numeric form! They're the correct numeric entities for the characters in question, but … what the XXXX?
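For the curious, those numeric entities are just decimal Unicode code points wrapped in `&#` and `;`. Here is a minimal sketch in C of turning one back into UTF-8. The function names `utf8_encode` and `decode_ncr` are made up for illustration and are not taken from the actual program; hexadecimal references (`&#x...;`) are deliberately not handled.

```c
#include <stddef.h>
#include <stdlib.h>

/* Encode code point cp as UTF-8 into out (room for 4 bytes needed).
 * Returns the number of bytes written, or 0 if cp is not encodable.
 * Hypothetical helper, not from the original program. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
  if (cp < 0x80) {
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    if (cp >= 0xD800 && cp <= 0xDFFF)
      return 0;                         /* surrogates are not characters */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp < 0x110000) {
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;                             /* beyond the Unicode range */
}

/* Decode a decimal reference such as "&#915;" starting at s.
 * On success, writes UTF-8 to out, stores the number of input
 * characters consumed in *used, and returns the UTF-8 byte count.
 * Returns 0 (leaving *used alone) if s is not a valid reference. */
size_t decode_ncr(const char *s, unsigned char *out, size_t *used)
{
  char          *end;
  unsigned long  cp;
  size_t         n;

  if (s[0] != '&' || s[1] != '#' || s[2] < '0' || s[2] > '9')
    return 0;
  cp = strtoul(s + 2, &end, 10);
  if (*end != ';')
    return 0;
  n = utf8_encode(cp, out);
  if (n > 0)
    *used = (size_t)(end - s) + 1;      /* include the closing ';' */
  return n;
}
```

Feeding it the first entity from the dump, `&#915;`, should consume six characters and produce the two bytes CE 93, which is the UTF-8 encoding of Γ.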

To say I wasn't expecting this is a bit of an understatement.

Well, back to hacking.

Yet more character building

Back from hacking.

And it's been an interesting session. Learned quite a bit, and picked up some new tricks as well.

I was doing some testing when I copy-n-pasted the following:

sensorâ€”and? How did that happen? Let's look at the O'Reilly source from which Cory copy and pasted:

The phone has become a platform, moving beyond mere voice to smart mobile sensor—and back to phone again, by way of voice-over-IP.

Sam Ruby: Copy and Paste

The program I wrote would classify the text as UTF-8, then iconv() would return an error. I rewrote the conversion routine so that when it failed (iconv() reports where in the input the conversion stopped) I would re-classify the remaining text and continue.
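Roughly, the retry logic looks like the sketch below. This is not the actual code: the function name `to_utf8_with_fallback` is invented for illustration, and the real "re-classify" step is replaced by an assumption that simply walks a fixed list of charsets (UTF-8 first, then WINDOWS-1252). It relies on iconv() leaving the input pointer at the offending byte when it fails.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Charsets tried in order. A stand-in for real classification,
 * which this sketch does not attempt. */
static const char *const k_charsets[] = { "UTF-8", "WINDOWS-1252" };
#define N_CHARSETS (sizeof(k_charsets) / sizeof(k_charsets[0]))

/* Convert in[0..inlen) to UTF-8 in out[0..outlen).
 * Returns the number of bytes written, or (size_t)-1 on failure. */
size_t to_utf8_with_fallback(const char *in, size_t inlen,
                             char *out, size_t outlen)
{
  char   *src     = (char *)in;   /* iconv() wants a non-const pointer */
  char   *dst     = out;
  size_t  srcleft = inlen;
  size_t  dstleft = outlen;
  size_t  i;

  for (i = 0; i < N_CHARSETS && srcleft > 0; i++) {
    iconv_t cd = iconv_open("UTF-8", k_charsets[i]);
    size_t  rc;
    int     err;

    if (cd == (iconv_t)-1)
      return (size_t)-1;          /* charset not supported here */

    rc  = iconv(cd, &src, &srcleft, &dst, &dstleft);
    err = errno;
    iconv_close(cd);

    if (rc != (size_t)-1)
      break;                      /* everything left was converted */
    if (err == E2BIG)
      return (size_t)-1;          /* output buffer too small */

    /* EILSEQ or EINVAL: src now points at the byte that failed, so
     * the next charset picks up the remaining text from there. */
  }

  if (srcleft > 0)
    return (size_t)-1;            /* ran out of charsets to try */
  return outlen - dstleft;
}
```

Given the ten bytes `sensor`, 0x97, `and` (0x97 being the WINDOWS-1252 dash), the UTF-8 pass should stop at the 0x97 and the WINDOWS-1252 pass should finish the job. Note that the fallback only goes one way: once the sketch switches charsets it never switches back.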

Doing that, the text fragment above would first be tagged as UTF-8, then WINDOWS-1252, and displayed correctly:

sensor—and? How did that happen? Let's look at the O'Reilly source from which Cory copy and pasted:

The phone has become a platform, moving beyond mere voice to smart mobile sensor—and back to phone again, by way of voice-over-IP.

If I copied the text twice, though, it would still be tagged first as UTF-8 and then as WINDOWS-1252, and the second copy would come out wrong:

sensor—and? How did that happen? Let's look at the O'Reilly source from which Cory copy and pasted:

The phone has become a platform, moving beyond mere voice to smart mobile sensor—and back to phone again, by way of voice-over-IP.

sensorâ€”and? How did that happen? Let's look at the O'Reilly source from which Cory copy and pasted:

The phone has become a platform, moving beyond mere voice to smart mobile sensor—and back to phone again, by way of voice-over-IP.

Not really sure how to handle that (“garbage in, garbage out” and all that) but it's a lot better than things were before. All that was left was to add some more code to allow plain text or HTML formatted text and a preview mode~~; I put it online so those of you who are curious can play around with it~~.

The trick I learned (an epiphany if you will): I added the following to the code:

volatile int g_debug = 1;

while(g_debug)
  ;

That will cause the program to just sit there, doing vast amounts of nothing really fast. The reason for such a weird thing is that debugging a CGI program (and yes, this is written in C; don't ask) is not easy: I used to go through quite a bit of rigmarole to simulate the webserver environment so I could use a debugger. This trick allows the webserver to run the program (which will just sit there), and I can then use gdb to attach to the running process and debug it. Once in, I set my breakpoint, do set g_debug=0, and resume execution of the program. I wish I'd known about this eight years ago.
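A slightly politer variation on the same trick, as a sketch only: it announces its process ID and sleeps between checks instead of spinning. The function name `wait_for_debugger` is made up for illustration, and the assumption that stderr ends up somewhere readable (such as the webserver's error log) depends on how the server is configured.

```c
#include <stdio.h>
#include <unistd.h>

/* volatile so the compiler re-reads the flag on every pass,
 * even though nothing in the program itself ever clears it. */
volatile int g_debug = 1;

/* Block until a debugger (or anything else) sets g_debug to 0.
 * Hypothetical helper, not from the original program. */
void wait_for_debugger(void)
{
  /* Say which process to attach to. */
  fprintf(stderr, "pid %ld waiting for debugger\n", (long)getpid());

  while (g_debug)
    sleep(1);   /* idle quietly instead of burning a CPU */
}
```

Call `wait_for_debugger()` early in `main()`, attach with `gdb -p` and the reported pid, set a breakpoint, then `set var g_debug = 0` and `continue`.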

Another amusing thing I learned: the “/” character in Firefox will bring up a search box. It's not a bad thing, until you try typing a “/” in a <TEXTAREA> field. Then it gets downright annoying.

Now to take what I have and integrate it.

Sunday, Debtember 05, 2004

