Monday, April 20, 2009

address to latitude-longitude

So my buddy Chris at the church asked me about using Google Spreadsheets to map some addresses. After we discussed some privacy concerns, it seemed OK (people had volunteered this information, and they'd already said we could tell interested people, "Here's a home group open to new members").

For some reason Google Spreadsheets runs like molasses in February on my box -- it's something like a 2GHz P4, but my Linux distro is old enough that I'm running ffox2.16, and I think there are other CPU-hogs on the box (RAM-hogs too). I told him that it looked pretty slick though....

He said yeah it is, but without latitude/longitude Google won't map locations from the spreadsheet. That seemed odd to me, but it also seemed to me that if you look for an address like "950 santa cruz, menlo park, ca 94025", the lat/lng are returned in the HTML that maps.google.com sends back.

Oooh, I thought, "lynx -dump -source http://maps.google.com/maps?SOMETHING_OR_OTHER" might be my friend here. Then use Perl or Python to extract the numbers... Then I had a better thought: use one language, one program, rather than lynx and shell and sed/perl/python.

So I hacked together a little Python script that looks basically like this:
#!/usr/bin/python -tt
# vim:sw=4:ts=8:et

import httplib
import re
import string
import sys

DEBUG=0
# DEBUG=1

def main(argv):
    "given an address in argv, give human-readable lat/long"

    the_addr = re.sub('\s+', '+', string.join(argv))
    map_html = ' ' + addr2page(the_addr)
    print "latitude:", interp_coords(map_html, 'lat', 'south', 'north')
    print "longitude:", interp_coords(map_html, 'lng', 'west', 'east')
    sys.exit(0)

def addr2page(the_addr):
    """given an address string, return a long html string from google maps.
    address string should contain no whitespace."""

    map_site = 'maps.google.com'
    map_query = '/maps?q=' + the_addr

    if DEBUG:
        print "DEBUG: if this were for real, we'd go to"
        print "DEBUG: http://" + map_site + map_query
        print "DEBUG: but it's not, so let's not and say we did"
        return '<html> lng:-122.3456 lat:33.4567'
    conn = httplib.HTTPConnection(map_site)
    conn.request('GET', map_query)
    r1 = conn.getresponse()
    if r1.status != 200:
        # Trouble in paradise
        print >> sys.stderr, r1.status, r1.reason
        the_page = r1.read()        # useless
        conn.close()
        sys.exit(1)
    # Got 'OK' so continue
    the_page = r1.read()
    conn.close()
    return the_page

def interp_coords(html_string, LL, if_neg, if_pos):
    """return a substring of the form '(west) -123.456' from html_string
    given a label 'lat' or 'lng' (supplied in 'LL'; the ':' is added here)
    if_neg => what to put in parens if the string is (duh) negative
    if_pos => what you would think"""

    coord = re.search('\W' + LL + ':([-+]?[.0-9]+)', html_string)
    if coord is not None:
        coord = coord.group(1)
        if coord.startswith('-'):
            suf = if_neg            # i.e., it was negative
        else:
            suf = if_pos
        return '(' + suf + ') ' + coord
    return '??'

if __name__ == '__main__':
    main(sys.argv[1:])
Python made it easy to throw that together, and it works great from the command line:
$ ./a2l.py 950 santa cruz, menlo park, ca 94025
latitude: (north) 37.449289999999998
longitude: (west) -122.187619
$
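For anyone on Python 3, where httplib and the print statement are gone, here is a rough sketch of just the two pure-logic pieces: building the query and pulling the numbers out of the page text. It is not the script above. `build_query` is a made-up helper name, and using `urlencode` (which also escapes the commas) is a swapped-in choice rather than what the original does. The sample text is the same canned string as the DEBUG stub, so nothing here touches the network, and whether Google's pages still carry `lat:`/`lng:` in this form is not something this sketch checks.

```python
import re
from urllib.parse import urlencode


def build_query(address):
    """Return a '/maps?q=...' path with the address percent-encoded.

    Hypothetical helper: urlencode turns spaces into '+' and escapes
    commas, instead of only replacing whitespace.
    """
    return '/maps?' + urlencode({'q': address})


def interp_coords(html_string, label, if_neg, if_pos):
    """Find 'lat:<number>' or 'lng:<number>' and tag it with a direction."""
    match = re.search(r'\W' + re.escape(label) + r':([-+]?[.0-9]+)', html_string)
    if match is None:
        return '??'
    coord = match.group(1)
    direction = if_neg if coord.startswith('-') else if_pos
    return '(' + direction + ') ' + coord


# Canned page text, same shape as the DEBUG stub; the leading space
# gives \W something to match if a label sits at the very start.
sample = ' <html> lng:-122.3456 lat:33.4567'

print(build_query('950 santa cruz, menlo park, ca 94025'))
print('latitude:', interp_coords(sample, 'lat', 'south', 'north'))
print('longitude:', interp_coords(sample, 'lng', 'west', 'east'))
```

With the canned sample this reports (north) 33.4567 and (west) -122.3456, and a page with no match comes back as '??'.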
Of course, Chris isn't a command-line kind of guy. So I ended up making this into a CGI script. For doing this, I often turn to a site that explains the basics -- I just googled on "how CGI works" (no quotes) and found this site: http://www.howstuffworks.com/cgi.htm, which was very helpful. I used the Python library "cgi" to handle parameters. Worked like a champ.
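The basic shape of a CGI script is small enough to sketch: the web server puts everything after the "?" into the QUERY_STRING environment variable, and the script writes a header, a blank line, and then the body to standard output. The sketch below is Python 3 and is an illustration only, not the script I gave Chris. It uses `urllib.parse.parse_qs` in place of the "cgi" library (which has since been removed from recent Python 3 releases), `render_response` is a made-up name, and the "would look up" line stands in for the real lookup.

```python
import os
from urllib.parse import parse_qs


def render_response(query_string):
    """Build the full CGI response (header, blank line, body) as one string."""
    params = parse_qs(query_string)
    addresses = params.get('q', [])      # 'q' is the form field name
    lines = ['Content-Type: text/plain', '']   # blank line ends the headers
    if not addresses:
        lines.append('no address given')
    for addr in addresses:
        # Placeholder for the actual latitude/longitude lookup.
        lines.append('would look up: ' + addr)
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    # Under a web server, QUERY_STRING holds the part of the URL after '?'.
    print(render_response(os.environ.get('QUERY_STRING', '')), end='')
```

Keeping the response-building in a function that takes the query string makes it easy to try from a plain Python prompt before putting it behind a web server.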

Then Chris told me about a list of addresses separated by tabs. Python made this a piece of cake. First, I took "tab-separated list of addresses" literally, and did this:
        addrs = form.getvalue(form_q)
        if isinstance(addrs, list):
            addr_list = addrs
        else:
            # maybe a TAB-separated string
            addr_list = addrs.split('\t')
        for an_addr in addr_list:
            an_addr = an_addr.strip()
            if len(an_addr) > 0:
                do_one_addr(an_addr)
This snippet also handles another case: if you're typing in an HTML form and hit the TAB key, what usually happens? What happens to me is that I end up going to the next field in the form. So I decided to just make an alternative page, which had maybe a couple dozen single-line input boxes with the same name (i.e., the very creative "q", for "query"); several fields with the same name are what arrive as a list in the code above. So I gave Chris a choice of a big text box (as '<textarea name="q" rows=16 cols=255> </textarea>') or a pile of single-line fields, as
<br/><INPUT TYPE=text NAME=q SIZE=128 MAXLENGTH=255>
<br/><INPUT TYPE=text NAME=q SIZE=128 MAXLENGTH=255>
<br/><INPUT TYPE=text NAME=q SIZE=128 MAXLENGTH=255>
[[etc]]
So he can use the textarea version in case he has a tab-separated list of addresses in a Mi¢ro$oft Word® document; if he's typing and using the TAB key at the end of addresses, he can use the version that has multiple single-line text inputs.
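Both page layouts can feed the same code if the values get flattened first. The following Python 3 sketch shows one way to do that; it is an assumption-laden illustration, not the code from the CGI script. `split_addresses` is a made-up name, the two addresses are invented for the example, and `parse_qs` stands in for the "cgi" library to show what each kind of form sends. Unlike the fragment above, it also copes with a missing field and with tabs inside individual list items.

```python
from urllib.parse import parse_qs


def split_addresses(values):
    """Flatten form values into a clean list of address strings.

    `values` may be None (field missing), one string, or a list of
    strings; any string may hold several TAB-separated addresses.
    """
    if values is None:
        return []
    if isinstance(values, str):
        values = [values]
    result = []
    for value in values:
        for piece in value.split('\t'):
            piece = piece.strip()
            if piece:                    # skip empty boxes and stray tabs
                result.append(piece)
    return result


# Several single-line inputs all named "q" arrive as repeated parameters.
many_fields = parse_qs('q=1+Main+St&q=2+Oak+Ave')['q']
# One textarea named "q" arrives as a single value; %09 is an encoded TAB.
one_textarea = parse_qs('q=1+Main+St%092+Oak+Ave')['q']

print(split_addresses(many_fields))
print(split_addresses(one_textarea))
```

Either form ends up as the same two-item list, so the per-address lookup doesn't need to know which page the request came from.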

I don't think I'd ever done that before. It sure was fun! I'm not going to tell you where everything is -- I don't want Google Maps to get a pile of traffic from my site and then blacklist me.

But Google + Python + that "How CGI works" site all made it easy to learn stuff and be productive in short order, even while on vacation. And hey, even Econo-Lodge has free wi-fi! (If you want it at Motel6, you need to pay $2.99/night, and it might be a little slow.)
