Sunday, December 05, 2004
Yet more character building
And it's been an interesting session. Learned quite a bit, and picked up some new tricks as well.
I was doing some testing when I copy-n-pasted the following:
sensor—and? How did that happen? Let's look at the O'Reilly source from which Cory copy and pasted:
The phone has become a platform, moving beyond mere voice to smart mobile sensor—and back to phone again, by way of voice-over-IP.
The program I wrote would classify the text as UTF-8, then iconv() would return an error. I rewrote the conversion routine so that when it failed (iconv() updates its input pointer to show where the conversion failed), I would re-classify the remaining text and continue.
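A minimal sketch of that retry loop, assuming POSIX iconv() as found in glibc. The classifier is a made-up heuristic, since the program's real one isn't shown: text is tagged UTF-8 if its first non-ASCII byte starts a well-formed UTF-8 sequence, and WINDOWS-1252 otherwise. Both classify() and convert_to_utf8() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical heuristic: "UTF-8" if the first non-ASCII byte starts a
   well-formed UTF-8 sequence (pure ASCII counts as UTF-8), otherwise
   "WINDOWS-1252".  Overlong forms are not checked; iconv() rejects those. */
static const char *classify(const char *s, size_t n)
{
  const unsigned char *p = (const unsigned char *)s;
  size_t i = 0;

  while (i < n && p[i] < 0x80)
    i++;
  if (i == n)
    return "UTF-8";

  size_t len = (p[i] & 0xE0) == 0xC0 ? 2
             : (p[i] & 0xF0) == 0xE0 ? 3
             : (p[i] & 0xF8) == 0xF0 ? 4
             : 0;
  if (len == 0 || i + len > n)
    return "WINDOWS-1252";
  for (size_t k = 1; k < len; k++)
    if ((p[i + k] & 0xC0) != 0x80)
      return "WINDOWS-1252";
  return "UTF-8";
}

/* Convert inlen bytes at in to UTF-8 in out.  When iconv() stops on a bad
   or truncated sequence, reclassify whatever is left and keep going.
   Returns 0 and sets *written on success; -1 if out is too small, the
   charset is unknown, or a byte is undefined even in WINDOWS-1252. */
int convert_to_utf8(const char *in, size_t inlen,
                    char *out, size_t outlen, size_t *written)
{
  char  *ip = (char *)in;   /* POSIX iconv() takes char **; input isn't modified */
  size_t il = inlen;
  char  *op = out;
  size_t ol = outlen;
  int    force_1252 = 0;

  assert(in != NULL && out != NULL && written != NULL);

  while (il > 0) {
    const char *cs = force_1252 ? "WINDOWS-1252" : classify(ip, il);
    iconv_t cd = iconv_open("UTF-8", cs);
    if (cd == (iconv_t)-1)
      return -1;

    char  *start = ip;
    size_t rc    = iconv(cd, &ip, &il, &op, &ol);
    int    err   = errno;   /* save before iconv_close() can change it */
    iconv_close(cd);

    if (rc != (size_t)-1)
      break;                          /* the rest converted cleanly */
    if (err != EILSEQ && err != EINVAL)
      return -1;                      /* E2BIG: out is too small */

    if (ip == start) {
      /* No progress: give up if WINDOWS-1252 already failed here,
         otherwise retry this spot as WINDOWS-1252. */
      if (strcmp(cs, "WINDOWS-1252") == 0)
        return -1;
      force_1252 = 1;
    } else {
      force_1252 = 0;                 /* progress made; classify afresh */
    }
  }

  *written = outlen - ol;
  return 0;
}
```

Because the whole remainder is tagged in one go, any UTF-8 that shows up after the first failure is decoded as WINDOWS-1252 in this sketch (the three bytes of an encoded dash turn into three separate characters), which is one way a second pasted copy can come out wrong.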
With that change, the text fragment above would first be tagged as UTF-8, then as WINDOWS-1252, and displayed correctly:
sensor—and? How did that happen? Let's look at the O'Reilly source from which Cory copy and pasted:
The phone has become a platform, moving beyond mere voice to smart mobile sensor—and back to phone again, by way of voice-over-IP.
But if I pasted the text in twice, it would still be tagged first as UTF-8 and then as WINDOWS-1252, yet the second copy would come out incorrectly:
sensor—and? How did that happen? Let's look at the O'Reilly source from which Cory copy and pasted:
The phone has become a platform, moving beyond mere voice to smart mobile sensor—and back to phone again, by way of voice-over-IP.
sensor—and? How did that happen? Let's look at the O'Reilly source from which Cory copy and pasted:
The phone has become a platform, moving beyond mere voice to smart mobile sensor—and back to phone again, by way of voice-over-IP.
I'm not really sure how to handle that (“garbage in, garbage out” and all that), but it's a lot better than things were before. All that was left was to add some more code to accept either plain text or HTML-formatted text, plus a preview mode; I put it online so those of you who are curious can play around with it.
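For the plain-text mode, a preview has to escape HTML's special characters before echoing the text back into the page. A sketch, assuming the text has already been converted to UTF-8; html_escape() is a hypothetical name, not the program's actual code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed copy of text with &, <, > and "
   replaced by HTML entities, or NULL if allocation fails.  Caller frees. */
char *html_escape(const char *text)
{
  assert(text != NULL);

  /* First pass: work out how long the escaped string will be. */
  size_t len = 0;
  for (const char *p = text; *p; p++) {
    switch (*p) {
      case '&': len += 5; break;   /* &amp;  */
      case '<': len += 4; break;   /* &lt;   */
      case '>': len += 4; break;   /* &gt;   */
      case '"': len += 6; break;   /* &quot; */
      default:  len += 1; break;
    }
  }

  char *out = malloc(len + 1);
  if (out == NULL)
    return NULL;

  /* Second pass: copy, substituting entities. */
  char *o = out;
  for (const char *p = text; *p; p++) {
    const char *rep = NULL;
    switch (*p) {
      case '&': rep = "&amp;";  break;
      case '<': rep = "&lt;";   break;
      case '>': rep = "&gt;";   break;
      case '"': rep = "&quot;"; break;
    }
    if (rep != NULL) {
      size_t n = strlen(rep);
      memcpy(o, rep, n);
      o += n;
    } else {
      *o++ = *p;
    }
  }
  *o = '\0';
  return out;
}
```

Two passes keep it to a single allocation of exactly the right size; UTF-8 multibyte sequences pass through untouched since none of their bytes match the four ASCII characters being escaped.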
The trick I learned (an epiphany, if you will) was adding the following to the code:
volatile int g_debug = 1;
while (g_debug) ;  /* spin until gdb sets g_debug to 0 */
That will cause the program to just sit there, doing vast amounts of nothing really fast. The reason for such a weird thing is that debugging a CGI program (and yes, this is written in C; don't ask) is not easy. I used to go through quite a bit of rigamarole to simulate the webserver environment so I could run it under a debugger. This trick lets the webserver start the program (which will just sit there), and then I can use gdb to attach to the running process. Once in, I set my breakpoint, then do set g_debug=0 and resume execution of the program. (The volatile matters: it forces the compiler to reread g_debug on every pass through the loop, so a value changed from gdb is actually noticed.) I wish I had known about this eight years ago.
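A slightly more careful version of the same trick, as a sketch under a few assumptions: the spin is switched on by an environment variable (CGI_DEBUG is a made-up name, and the webserver has to be configured to pass it to the script, for example with Apache's SetEnv), and the process ID goes to stderr, which for CGI programs usually ends up in the webserver's error log. sleep(1) replaces the pure busy-wait so the waiting process doesn't tie up a CPU. wait_for_debugger() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* volatile forces a fresh read of g_debug on every loop iteration,
   so a value written into it by gdb is noticed. */
volatile int g_debug = 0;

/* Hypothetical helper: call it first thing in main().  If the made-up
   CGI_DEBUG environment variable is set, report our pid and wait until
   a debugger attaches and clears g_debug. */
void wait_for_debugger(void)
{
  if (getenv("CGI_DEBUG") != NULL)
    g_debug = 1;

  if (g_debug)
    fprintf(stderr, "pid %ld waiting for debugger\n", (long)getpid());

  /* In gdb:  gdb -p <pid>,  break <function>,  set var g_debug = 0,  continue */
  while (g_debug)
    sleep(1);
}
```

With the variable unset the function returns immediately, so it can stay in the code without affecting normal requests.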
Another amusing thing I learned: the “/” character in Firefox brings up a search box. That's not a bad thing, until you try typing a “/” in a <TEXTAREA> field. Then it gets downright annoying.
Now to take what I have and integrate it.