Encoding error while fetching HTML

Question

On Python 3.2 I am getting the following error when trying to get HTML from a remote site. The same code works well on Python 2.7.

[Screenshot of the error message; not available as text]

Code:

from random import randint
from time import sleep

import requests


def connectAmazon():
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36'}
    url = "http://www.amazon.com/gp/offer-listing/B009OZUPUC/sr=/qid=/ref=olp_prime_new?ie=UTF8&colid=&coliid=&condition=new&me=&qid=&seller=&shipPromoFilter=1&sort=sip&sr"
    retries = 0
    attempt = 0
    while True:
        attempt += 1
        print("Attempt#", attempt)
        html = requests.get(url, headers=headers)  # actually send the custom User-Agent
        status = html.status_code
        if status == 200:
            print("Success. Check HTML Below")
            print(html.text)  # The Buggy Line
            break
        elif status == 503:
            # Back off: the delay grows with the fourth power of the retry count.
            retries += 1
            multiplier = randint(2, 9)
            delay = multiplier * (pow(retries, 4) * 100)
            print("Delay(ms) = ", delay)
            sleep(delay / 1000.0)  # delay is in milliseconds, sleep() expects seconds
        else:
            # Any other status would otherwise be retried immediately, forever.
            print("Unexpected status", status)
            break


connectAmazon()
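The retry loop computes its delay as randint(2, 9) * retries**4 * 100 and reports it in milliseconds. To see how fast that schedule grows, the hypothetical helper backoff_delay_ms below isolates the same formula; the multiplier can be passed in instead of drawn at random so the output is predictable. It is a sketch of the question's own backoff rule, not a recommendation for how aggressively to retry.

```python
from random import randint


def backoff_delay_ms(retries, multiplier=None):
    """Delay in milliseconds, using the question's formula (hypothetical helper).

    delay = multiplier * retries**4 * 100, where multiplier is a random
    integer in [2, 9] unless one is supplied explicitly.
    """
    if multiplier is None:
        multiplier = randint(2, 9)
    return multiplier * (retries ** 4) * 100


# With a fixed multiplier of 2, the waits grow very quickly.
schedule = [backoff_delay_ms(r, multiplier=2) for r in range(1, 5)]
print(schedule)                          # [200, 3200, 16200, 51200]
print([ms / 1000.0 for ms in schedule])  # the same delays in seconds, as time.sleep() expects
```

By the fourth consecutive 503 the loop would already wait between about 51 and 230 seconds, depending on the random multiplier.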

What needs to be done to resolve this on Python 3.2, or Python 3.x in general?

It would help to see some actual code, especially the part dealing with the output.
– Paulo Bu, Commented Jan 31, 2014 at 20:01
General tip: We HATE screenshots of errors. They're not searchable. It's a Windows command prompt: you can mark and copy the text contents of the window by clicking the little 'cmd' icon in the top left and selecting the options under the "Edit" submenu.
– Marc B, Commented Jan 31, 2014 at 20:01
@MarcB It was sent by one of the devs, as I can't reproduce the result on my machine. I asked him to send the text, though.
– Volatil3, Commented Jan 31, 2014 at 20:04
@Volatil3 And you say it works on Python 2? Can you tell me what type html.text is in both Python 2 and Python 3? They are probably unicode and str respectively, but just to make sure.
– Paulo Bu, Commented Jan 31, 2014 at 20:12

Paulo Bu · Accepted Answer · 2014-01-31 20:47:56Z


OK, the Windows command line is very problematic with encodings*. The encoding error occurs because, when outputting, print encodes html.text into the cmd encoding (you can find out which one it is by running the command chcp). There is probably a character in html.text that can't be encoded in cmd's encoding.
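You can check this explanation by testing whether a string survives encoding into a given code page. The sketch below assumes cp437 (the usual default for US-English cmd windows; chcp reports the actual one on your machine). The can_print helper and the sample string are illustrative and not part of the original answer.

```python
import sys


def can_print(text, encoding):
    """Return True if text can be encoded in the given (console) encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# print() writes to sys.stdout, which encodes text with sys.stdout.encoding.
# On a Windows console under Python 3.2 that is the console code page.
print("stdout encoding:", sys.stdout.encoding)

sample = "Prime \u2013 \u20ac19.99"  # en dash and euro sign, both common in HTML
print(can_print(sample, "cp437"))  # False: cp437 has neither character
print(can_print(sample, "utf-8"))  # True
```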

My solution for Python 3 would be to force an output encoding. Sadly, in Python 3 this is a little more awkward than I would like. You'll need to replace the line print(html.text) with:

import sys
sys.stdout.flush()  # push out any text already printed before writing raw bytes
sys.stdout.buffer.write(html.text.encode('utf8'))
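Forcing UTF-8 bytes avoids the exception, but a console still set to cp437 or cp850 will display non-ASCII characters as garbage. If you would rather stay in the console's own encoding, one alternative (not from the original answer) is to substitute the characters it cannot represent. The print_safely helper below is hypothetical, and the in-memory stream stands in for a cp437 console so the effect is visible.

```python
import io
import sys


def print_safely(text, stream=None):
    """Print text, replacing characters the stream's encoding cannot represent.

    Hypothetical helper: the text is round-tripped through the stream's own
    encoding with errors="replace", so unencodable characters become "?"
    instead of raising UnicodeEncodeError.
    """
    if stream is None:
        stream = sys.stdout
    encoding = stream.encoding or "ascii"
    safe = text.encode(encoding, errors="replace").decode(encoding)
    print(safe, file=stream)


# Simulate a cp437 console with an in-memory byte buffer (no newline translation).
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp437", newline="\n")
print_safely("Price: \u20ac19.99", stream=console)  # the euro sign is not in cp437
console.flush()
print(raw.getvalue())  # b'Price: ?19.99\n'
```

In the question's code this would be print_safely(html.text). Note also that Python 3.6 and later write to the Windows console through its Unicode API (PEP 528), so a plain print(html.text) no longer raises there when output goes to the console.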

Of course, that line won't work in Python 2 (sys.stdout has no buffer attribute there). In Python 2 you can just encode your output before printing it, so print(html.text) can be replaced with:

print html.text.encode('utf8')

Important note: in Python 2, print is a keyword (a statement), not a function. Calling print('hi') works only because print prints the parenthesized expression. If you write print('hi', 2), the output is the tuple ('hi', 2), which is not what you want. It works by accident :D
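The difference is easy to reproduce from Python 3, where you have to pass the tuple explicitly to get the Python 2 output. (Python 2.6 and later also accept from __future__ import print_function, which makes print a function there as well.) The snippet below runs on Python 3 only and captures the output with contextlib.redirect_stdout so it can be inspected.

```python
import io
from contextlib import redirect_stdout

buf = io.StringIO()
with redirect_stdout(buf):
    # What the Python 2 statement `print ('hi', 2)` produces: the tuple itself.
    print(('hi', 2))
    # What the Python 3 call print('hi', 2) produces: two space-separated arguments.
    print('hi', 2)

lines = buf.getvalue().splitlines()
print(lines)  # ["('hi', 2)", 'hi 2']
```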

Hope this helps!

* This is due to its lack of support for UTF-8. Windows has a quirky code page 65001 which is not entirely the same as UTF-8, and Python does not work with it.

edited Jan 31, 2014 at 20:47

answered Jan 31, 2014 at 20:35

Paulo Bu


3 Comments

Volatil3 Over a year ago

I believe your suggested code is for 3.x only, because on 2.7 I get AttributeError: 'file' object has no attribute 'buffer'.

Paulo Bu Over a year ago

I explained in the answer: "Of course, that line won't work in Python2". I gave examples for both versions. For Python 2, use the second example: print html.text.encode('utf8'). The two versions handle strings differently, so having just one method for both would be complicated. You can test for the Python version with sys.version.

Paulo Bu Over a year ago

sys.version_info also gives information about the version, in a more convenient format (a tuple). You can perform version comparisons such as: if sys.version_info < (3, 0):  # Python 2
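Combining the answer's two snippets with that sys.version_info check gives one function that works on both interpreters. This is a sketch assembled from the answer and comments above; the name write_utf8 is hypothetical. On Python 2, sys.stdout accepts byte strings directly; on Python 3 the bytes go to sys.stdout.buffer.

```python
import sys


def write_utf8(text):
    """Write text to stdout as UTF-8 bytes on Python 2 or Python 3 (hypothetical helper)."""
    data = text.encode('utf8')
    if sys.version_info < (3, 0):
        # Python 2: file objects accept byte strings as-is.
        sys.stdout.write(data)
    else:
        # Python 3: flush any pending text first, then bypass the text layer
        # and write the bytes to the underlying binary buffer.
        sys.stdout.flush()
        sys.stdout.buffer.write(data)
        sys.stdout.buffer.flush()
```

In the question's code, write_utf8(html.text) would take the place of print(html.text) on either version. On a console that is not set to UTF-8, non-ASCII characters will still look wrong, but no exception is raised.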
