
I'm writing an address parser in Python (using urllib2) for addresses that contain non-English characters. The goal is to find the coordinates of every address.

When I open this URL in Firefox:

http://maps.google.com/maps/geo?q=Czech%20Republic%2010000%20Male%C5%A1ice&output=csv

the address bar changes it to

http://maps.google.com/maps/geo?q=Czech Republic 10000 Malešice&output=csv

and returns

200,6,50.0865113,14.4918052

which is the correct result.
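Assuming Firefox only changes how the address is displayed, the two URLs above describe the same request: %20 is a space and %C5%A1 is the UTF-8 encoding of "š". A quick Python 3 check using urllib.parse (where these helpers live in Python 3) decodes the query value the way the address bar shows it:

```python
from urllib.parse import unquote

# Query value exactly as it appears in the percent-encoded URL.
encoded_query = 'Czech%20Republic%2010000%20Male%C5%A1ice'

# unquote() decodes %XX escapes as UTF-8 by default, giving the
# human-readable form a browser shows in its address bar.
decoded_query = unquote(encoded_query)
print(decoded_query)
```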

However, if I open the same URL (percent-encoded, with %20 and so on) with urllib2 (or in the Opera browser), the result is

200,4,49.7715220,13.2955410

which is incorrect. How can I open the first URL with urllib2 and get the "200,6,50.0865113,14.4918052" result?

Edit:

Code used:

# -*- coding: utf-8 -*-
import urllib2

psc = '10000'
name = 'Malešice'
url = 'http://maps.google.com/maps/geo?q=%s&output=csv' % urllib2.quote('Czech Republic %s %s' % (psc, name))

response = urllib2.urlopen(url)
data = response.read()

print 'Parsed url %s, result %s\n' % (url, data)

Output:

Parsed url http://maps.google.com/maps/geo?q=Czech%20Republic%2010000%20Male%C5%A1ice&output=csv, result 200,4,49.7715220,13.2955410
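The encoded URL in this output is identical to the one pasted into Firefox, which suggests the encoding itself is not the culprit. A Python 3 sketch of the same URL construction (urllib2.quote corresponds to urllib.parse.quote there) produces the same string:

```python
# Python 3 sketch: urllib2.quote corresponds to urllib.parse.quote.
from urllib.parse import quote

psc = '10000'
name = 'Malešice'  # str is Unicode in Python 3; quote() encodes it as UTF-8

# quote() escapes spaces as %20 and leaves '/' unescaped by default.
url = 'http://maps.google.com/maps/geo?q=%s&output=csv' % quote(
    'Czech Republic %s %s' % (psc, name))
print(url)
```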

  • Could you post the exact line of code where you call urllib2.urlopen, please? Commented Sep 27, 2012 at 16:26
  • Possible duplicate: percent encoding URL with python (for use with Google Maps API). Commented Sep 27, 2012 at 16:47

1 Answer

I can reproduce this behavior, and at first I was dumbfounded as to why it happens. Closer inspection of the HTTP requests with Wireshark showed that the requests sent by Firefox (not surprisingly) contain several more HTTP headers.
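Part of that difference is visible from the Python side without a packet capture. This Python 3 sketch (urllib.request is the successor of urllib2) lists the headers urllib attaches on its own; note that no Accept-Language header is among them, whereas browsers typically send one:

```python
import urllib.request

url = 'http://maps.google.com/maps/geo?q=Czech%20Republic%2010000%20Male%C5%A1ice&output=csv'

# A plain Request carries no headers of its own until you add some.
req = urllib.request.Request(url)
print(req.header_items())  # []

# The default opener only adds a User-agent header; transport headers
# such as Host are filled in later by http.client when the request is sent.
opener = urllib.request.build_opener()
default_names = [name for name, _ in opener.addheaders]
print(default_names)  # ['User-agent']
```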

In the end, it turned out that the Accept-Language header makes the difference. You only get the correct result if

  • an Accept-Language header is set, and
  • it lists a non-English language first (the q-value priorities don't seem to matter)
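"Listed first" and "highest priority" are not the same thing in an Accept-Language header, which is what the parenthetical above refers to. The following minimal Python sketch uses hypothetical helper names and assumes a well-formed header; it only illustrates the observed condition, not how Google actually interprets the header:

```python
def parse_accept_language(header):
    """Split an Accept-Language value into (tag, q) pairs, in listed order.

    Minimal sketch: assumes a well-formed header, ignores other parameters.
    """
    entries = []
    for part in header.split(','):
        pieces = part.strip().split(';')
        tag = pieces[0].strip().lower()
        q = 1.0  # default quality value when no q= parameter is given
        for param in pieces[1:]:
            key, _, value = param.strip().partition('=')
            if key.strip() == 'q':
                q = float(value)
        entries.append((tag, q))
    return entries


def first_listed(header):
    # The condition observed in this answer: header order, ignoring q-values.
    return parse_accept_language(header)[0][0]


def highest_priority(header):
    # Content negotiation would prefer the largest q-value instead.
    # max() returns the first maximal entry, so listing order breaks ties.
    return max(parse_accept_language(header), key=lambda e: e[1])[0]


header = 'de-ch;q=0.1,en;q=0.9'
print(first_listed(header))      # de-ch
print(highest_priority(header))  # en
```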

So, for example, this Accept-Language header works:

headers = {'Accept-Language': 'de-ch,en'}

To summarize, with this modification your code works for me:

# -*- coding: utf-8 -*-
import urllib2

psc = '10000'
name = 'Malešice'
url = 'http://maps.google.com/maps/geo?q=%s&output=csv' % urllib2.quote('Czech Republic %s %s' % (psc, name))
headers = {'Accept-Language': 'de-ch,en'}

req = urllib2.Request(url, None, headers)
response = urllib2.urlopen(req)
data = response.read()

print 'Parsed url %s, result %s\n' % (url, data)
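For Python 3, where urllib2 was split into urllib.request and urllib.parse, the same fix might look like the sketch below. The fetch helper is a hypothetical name and is deliberately not called, since it needs network access and assumes the service still answers at this URL:

```python
# Python 3 sketch of the same fix: urllib2 became urllib.request/urllib.parse.
import urllib.request
from urllib.parse import quote

psc = '10000'
name = 'Malešice'
url = 'http://maps.google.com/maps/geo?q=%s&output=csv' % quote(
    'Czech Republic %s %s' % (psc, name))

# Same header as above: a non-English language listed first.
headers = {'Accept-Language': 'de-ch,en'}
req = urllib.request.Request(url, headers=headers)

# Request normalizes header names with str.capitalize(),
# so the key is stored as 'Accept-language'.
print(req.get_header('Accept-language'))  # de-ch,en


def fetch(request):
    """Hypothetical helper: send the request and return the body as text.

    Assumes the CSV response is UTF-8 (plain ASCII digits in practice).
    """
    with urllib.request.urlopen(request) as response:
        return response.read().decode('utf-8')
```

Passing the headers to the Request constructor is equivalent to calling req.add_header('Accept-Language', 'de-ch,en') afterwards.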

Note: In my opinion, this is a bug in Google's geocoding API. The Accept-Language header indicates which languages the user agent prefers for the response content; it shouldn't affect how the request itself is interpreted.
