
I want to use the Google Language Detection API in my app to detect the language of a URL parameter. For example, a user requests the URL

http://myapp.com/q?Это тест

and gets the message "Russian". I do it this way:

def get(self):
    url = "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=" + self.request.query
    try:
        data = json.loads(urllib2.urlopen(url).read())
        self.response.out.write('<html><body>' + data["responseData"]["language"] + '</body></html>')
    except urllib2.HTTPError, e:
        self.response.out.write("HTTP error: %d" % e.code)
    except urllib2.URLError, e:
        self.response.out.write("Network error: %s" % e.reason.args[1])

but I always get "English" as the result, because the URL arrives percent-encoded as

http://myapp.com/q?%DD%F2%EE%20%F2%E5%F1%F2

I've tried urllib.quote and urllib.urlencode with no luck.

How do I have to decode this URL for the Google API?

1 Answer


Maybe urllib.unquote is what you are looking for:

>>> from urllib import unquote
>>> unquote("%DD%F2%EE%20%F2%E5%F1%F2")
'\xdd\xf2\xee \xf2\xe5\xf1\xf2'

This gives you a byte string in whatever encoding was used in the URL (windows-1251 in this example). To convert it to a different encoding (say, UTF-8), first create a unicode object from it, then call that object's encode method:

>>> from urllib import unquote, quote
>>> import json, urllib2, pprint
>>> decoded = unicode(unquote("%DD%F2%EE%20%F2%E5%F1%F2"), "windows-1251")
>>> print decoded
Это тест
>>> recoded = decoded.encode("utf-8")

At this point we have a UTF-8 encoded string, but it is still not suitable to be passed to the Google Language Detection API:

>>> recoded
'\xd0\xad\xd1\x82\xd0\xbe \xd1\x82\xd0\xb5\xd1\x81\xd1\x82'

Since you want to include this string in a URL as a query argument, you have to encode it using urllib.quote:

>>> url = "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=%s" % quote(recoded)
>>> data = json.loads(urllib2.urlopen(url).read())
>>> pprint.pprint(data)
{u'responseData': {u'confidence': 0.094033934,
                   u'isReliable': False,
                   u'language': u'ru'},
 u'responseDetails': None,
 u'responseStatus': 200}
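
The same steps can be wrapped in one helper. The following is only a sketch, written for Python 3 (where urllib.unquote and urllib.quote live in urllib.parse) rather than the Python 2 used above. The name recode_query and the "try UTF-8 first, then fall back to windows-1251" strategy are assumptions for illustration, since the encoding a browser uses for the query string is not guaranteed.

```python
from urllib.parse import quote, unquote_to_bytes


def recode_query(raw_query, fallback_encoding="windows-1251"):
    """Return raw_query re-encoded as percent-encoded UTF-8.

    Hypothetical helper: assumes the query is either UTF-8 or
    fallback_encoding; other encodings would be decoded incorrectly.
    """
    # Undo the percent-encoding to get the raw bytes the browser sent.
    raw_bytes = unquote_to_bytes(raw_query)
    try:
        # Bytes that are already valid UTF-8 are kept as UTF-8.
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Otherwise assume the legacy single-byte encoding.
        text = raw_bytes.decode(fallback_encoding)
    # Percent-encode the UTF-8 bytes so the result is pure ASCII.
    return quote(text.encode("utf-8"))


base = "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q="
url = base + recode_query("%DD%F2%EE%20%F2%E5%F1%F2")
```

Here url ends in q=%D0%AD%D1%82%D0%BE%20%D1%82%D0%B5%D1%81%D1%82, which is the percent-encoded UTF-8 form of "Это тест". No request is sent in this sketch.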

2 Comments

Looks good when I try to print it, but when I send it to Google it throws exception: UnicodeEncodeError: 'ascii' codec can't encode characters in position
You have to pass recoded on to urllib.quote to obtain a representation which can safely be appended to the Google Language API URL. I'm modifying my example to show that.
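
The commenter's code is not shown, so the exact cause of the exception is a guess, but that error is what you would expect when Cyrillic text reaches a step that can only handle ASCII. A small Python 3 sketch of the difference (the variable names are illustrative only):

```python
from urllib.parse import quote

text = "Это тест"

# Encoding Cyrillic text as ASCII fails, as in the reported traceback.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Percent-encoding the UTF-8 bytes yields a string of plain ASCII
# characters, which can be appended to a URL safely.
safe = quote(text.encode("utf-8"))
safe_bytes = safe.encode("ascii")  # no exception here
```

After quoting, safe contains only ASCII characters, so appending it to the API URL no longer triggers the encoding error.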
