3

I would like to collect information from the results given by a search engine. But I can only write text instead of unicode in the query part.

import urllib2
a = "바둑"
a = a.decode("utf-8")
type(a)
#Out[35]: unicode

url = "http://search.naver.com/search.naver?where=nexearch&query=%s" %(a)
url2 = urllib2.urlopen(url)

give this error

#UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-40: ordinal not in range(128)

1 Answer 1

4

Encode the Unicode data to UTF-8, then URL-encode:

from urllib import urlencode
import urllib2

params = {'where': 'nexearch', 'query': a.encode('utf8')}
params = urlencode(params)

url = "http://search.naver.com/search.naver?" + params
response = urllib2.urlopen(url)

Demo:

>>> from urllib import urlencode
>>> a = u"바둑"
>>> params = {'where': 'nexearch', 'query': a.encode('utf8')}
>>> params = urlencode(params)
>>> params
'query=%EB%B0%94%EB%91%91&where=nexearch'
>>> url = "http://search.naver.com/search.naver?" + params
>>> url
'http://search.naver.com/search.naver?query=%EB%B0%94%EB%91%91&where=nexearch'

Using urllib.urlencode() to build the parameters is easier, but you can also just escape the query value with urllib.quote_plus():

from urllib import quote_plus
encoded_a = quote_plus(a.encode('utf8'))
url = "http://search.naver.com/search.naver?where=nexearch&query=%s" % encoded_a
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.