Tuesday, October 12, 2004
“Oh! You meant DUTTON, not DAYTON!”
I'm currently working on a website where one of the requirements is to obtain the latitude and longitude of the user. This was something I was dreading not from a programming perspective (since just asking for the latitude and longitude is dead simple) but from a user interface perspective (since from the user end, it's not quite so dead simple). It'd be nice if I could just ask the user for the city they're in, and get the latitude and longitude from that.
But where could I get that type of information?
From the US Census Bureau.
D'oh!
Granted, that still leaves me asking the rest of the world to locate their own latitude and longitude, but since this site is initially geared towards us Murkins I'm not that concerned about it yet.
Now it's a simple matter of getting the city and state from the user, then looking up the latitude and longitude of the city. Easy.
Until a user misspells a city. The easy thing (for me) would be to print an error like, “City Cininatee, OH not found; try again” and have the user try spelling Sinsinati, Cininatee, Cinsinati, Cincinnati (there we go!) correctly (or give up and say they're in Bratenahl as it's easier to spell). The harder thing to do is figure out what they're trying to spell and use that.
Only it's not that much harder. I've used both Soundex and Metaphone in another project to correct misspellings, and it seems easy enough to apply them here. Look up the latitude and longitude with the city and state supplied. If that isn't found, filter the city through Soundex and look up the correct spelling based on that. If that doesn't return a result (or returns too many results), fall back to Metaphone.
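Here's roughly what that lookup order could look like in Python. This is only a sketch under a few assumptions: `soundex` is a simplified take on American Soundex (it may not match whatever variant a real data file was built with), `find_city` and `SAMPLE` are made-up names, and the coordinates are rough values for illustration. Metaphone isn't in the Python standard library, so it isn't implemented here; any function that maps a name to a code can be passed in as a second encoder.

```python
# Letter-to-digit table for American Soundex.
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(name):
    """Return a four-character American Soundex code such as 'C525'."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not break up a run of identical digits
        digit = CODES.get(letter, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets the run
    return (code + "000")[:4]


def find_city(city, state, table, encoders=(soundex,)):
    """Try an exact match first, then each phonetic encoder in order.

    An encoder only counts as a hit when exactly one city in the state
    shares the code; zero or several matches fall through to the next one.
    """
    state = state.strip().upper()
    name = city.strip().upper()
    if (name, state) in table:
        return name, table[(name, state)]
    for encode in encoders:
        wanted = encode(name)
        matches = [n for (n, s) in table if s == state and encode(n) == wanted]
        if len(matches) == 1:
            return matches[0], table[(matches[0], state)]
    return None


# Hypothetical sample table; coordinates are approximate.
SAMPLE = {
    ("CINCINNATI", "OH"): (39.10, -84.51),
    ("BRATENAHL", "OH"): (41.55, -81.61),
}
```

With this table, `find_city("Cinsinati", "OH", SAMPLE)` comes back with the Cincinnati entry, since both spellings reduce to the same Soundex code, while a name that matches nothing returns `None`.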
Sounds good in theory.
Not so great in practice.
In setting up the appropriate data files, I went through the list of city latitudes and longitudes I picked up from the US Census Bureau and marked where Soundex and Metaphone clashed on multiple city names (each state is treated separately, so I'm only concerned with clashes within a given state). There, I hit a problem:
conflict(soundex/AL): D500 = [DOTHAN] [DAYTON]
conflict(soundex/AL): D500 = [DUTTON] []
conflict(metaphone/AL): TTN = [DUTTON] [DAYTON]
Dothan, Dayton and Dutton (all in Alabama) have a Soundex code of D500. Falling back to Metaphone, Dutton and Dayton have a Metaphone code of TTN. So what to do here if a user types in “Daytun”?
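The clash check itself is easy to sketch: group the cities by state and code, and keep any group with more than one name in it. In the Python below, `find_conflicts` is a made-up name, and the two dictionaries simply stand in for real Soundex and Metaphone functions by repeating the codes from the output above.

```python
from collections import defaultdict


def find_conflicts(cities, encode):
    """Group (state, city) pairs by (state, code); keep groups of two or more."""
    groups = defaultdict(list)
    for state, city in cities:
        groups[(state, encode(city))].append(city)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Codes copied from the conflict output above, used in place of real encoders.
SOUNDEX_FROM_POST = {"DOTHAN": "D500", "DAYTON": "D500", "DUTTON": "D500"}
METAPHONE_FROM_POST = {"DUTTON": "TTN", "DAYTON": "TTN"}

alabama = [("AL", "DOTHAN"), ("AL", "DAYTON"), ("AL", "DUTTON")]
soundex_clashes = find_conflicts(alabama, SOUNDEX_FROM_POST.__getitem__)
metaphone_clashes = find_conflicts(alabama[1:], METAPHONE_FROM_POST.__getitem__)
```

Because the state is part of the grouping key, a Dayton in Alabama never clashes with a Dayton in another state.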
I think the correct thing to do at this point would be to list the possibilities and have the user select the proper one. But this will necessitate a change in how I store the data.
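One way the storage could change, sketched in Python: instead of mapping each code to a single city, map each (state, code) pair to a list of cities, and show a menu whenever the list has more than one entry. The names `build_index`, `candidates` and `menu` are invented for this sketch, and the rows reuse the Metaphone codes quoted earlier.

```python
from collections import defaultdict


def build_index(rows):
    """Map (state, code) to every city sharing that code, in input order."""
    index = defaultdict(list)
    for state, city, code in rows:
        index[(state, code)].append(city)
    return dict(index)


def candidates(index, state, code):
    """Return all cities stored under this state and code (possibly none)."""
    return index.get((state, code), [])


def menu(matches, state):
    """Format several matches as a numbered list for the user to pick from."""
    return "\n".join(
        f"{number}. {name.title()}, {state}"
        for number, name in enumerate(matches, start=1)
    )


index = build_index([("AL", "DUTTON", "TTN"), ("AL", "DAYTON", "TTN")])
choices = candidates(index, "AL", "TTN")
```

An empty list means "not found", a single entry can be used directly, and anything longer gets handed to `menu` so the user can choose.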
It's not easy to create an easy-to-use interface. In fact, it's downright hard.