On Python 3.2 I get the following error when trying to fetch HTML from a remote site. The same code works fine on Python 2.7.

[Screenshot of the error in a Windows command prompt; the text is not available]

Code:

from random import randint
from time import sleep

import requests


def connectAmazon():
    # Sleep for x milliseconds (the delay below is computed in ms).
    msleep = lambda x: sleep(x / 1000.0)
    shouldRetry = True
    retries = 0
    # Browser-like User-Agent sent with every request.
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36'}
    attempt = 0
    url = "http://www.amazon.com/gp/offer-listing/B009OZUPUC/sr=/qid=/ref=olp_prime_new?ie=UTF8&colid=&coliid=&condition=new&me=&qid=&seller=&shipPromoFilter=1&sort=sip&sr"
    while shouldRetry:
        random = randint(2, 9)
        attempt += 1
        print("Attempt#", attempt)
        html = requests.get(url, headers=headers)
        status = html.status_code
        if status == 200:
            shouldRetry = False
            print("Success. Check HTML Below")
            print(html.text)  # The Buggy Line
            break
        elif status == 503:
            # Back off before retrying; the delay grows with retries**4.
            retries += 1
            delay = random * (pow(retries, 4) * 100)
            print("Delay(ms) = ", delay)
            msleep(delay)
            shouldRetry = True


connectAmazon()

What should be done to resolve this on Python 3.2 (or Python 3.x in general)?

  • It would help to see some actual code, especially the part dealing with the output. Commented Jan 31, 2014 at 20:01
  • General tip: We HATE screenshots of errors. They're not searchable. It's a Windows command prompt, so you can mark and copy the text contents of the window by clicking the little 'cmd' icon in the top left and selecting the options under the "Edit" sub-menu. Commented Jan 31, 2014 at 20:01
  • @MarcB It was sent by one of the devs, as I can't reproduce the result on my machine. I have asked him to send the text, though. Commented Jan 31, 2014 at 20:04
  • @PauloBu Question updated with Code Commented Jan 31, 2014 at 20:10
  • @Volatil3 And you say that works on Python 2? Can you tell me what type html.text is in both Python 2 and Python 3? They are probably unicode and str respectively, but just to make sure. Commented Jan 31, 2014 at 20:12

1 Answer

OK, the Windows command line is very problematic with encodings*. The encoding error occurs because, on output, print encodes html.text into the console's encoding (you can find out which one it is by running the chcp command). There is probably at least one character in html.text that can't be represented in that encoding.
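A minimal sketch of that failure mode, reproduced without a console. Here cp437 is only an assumed stand-in for whatever code page chcp reports on the affected machine, and can_encode is a hypothetical helper name, not part of any library:

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+2603 (SNOWMAN) has no equivalent in cp437, so encoding it fails.
sample = "snowman: \u2603"

ascii_ok = can_encode("plain ascii", "cp437")
snowman_ok = can_encode(sample, "cp437")

# errors="replace" substitutes "?" for unencodable characters instead of raising.
lossy = sample.encode("cp437", errors="replace")

print(ascii_ok, snowman_ok, lossy)
```

Any page containing a character outside the console's code page triggers the same UnicodeEncodeError when print tries to write it.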

My solution for Python 3 would be to force an output encoding. Sadly, in Python 3 this is a little more awkward than I would like. You'll need to replace the line print(html.text) with:

import sys
sys.stdout.buffer.write(html.text.encode('utf8'))
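The same idea can be wrapped in a small helper so it is easy to try against an in-memory stream. This is only a sketch for Python 3: write_utf8 is a hypothetical name, and the trailing newline and flush are additions that mimic what print would do.

```python
import io
import sys


def write_utf8(text, out=None):
    """Write text to a binary stream as UTF-8, followed by a newline."""
    if out is None:
        # sys.stdout.buffer is the raw byte stream underneath sys.stdout (Python 3).
        out = sys.stdout.buffer
    out.write(text.encode("utf-8"))
    out.write(b"\n")
    out.flush()


# Demonstration against an in-memory byte buffer instead of the real console.
demo = io.BytesIO()
write_utf8("caf\u00e9 \u2603", demo)
captured = demo.getvalue()
```

Writing bytes directly bypasses the console encoding step entirely, which is why the encoding error no longer occurs; whether the console then renders those bytes correctly depends on its active code page.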

Of course, that line won't work in Python 2. There you can simply encode the output before printing it, so print(html.text) can be replaced with:

print html.text.encode('utf8')

Important note: In Python 2, print is a statement, not a function. Calling print('hi') works because print is printing the expression inside the parentheses. When you do print('hi', 2), the tuple ('hi', 2) is what gets printed, which is not exactly what you want. It works by miracle :D
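To make the difference visible on Python 3, this sketch prints the two arguments normally and then prints the tuple explicitly, which is the output the Python 2 statement produces for print('hi', 2). The io.StringIO buffer is only there to capture the output for inspection.

```python
import io

buf = io.StringIO()

# Python 3: print is a function that receives two separate arguments.
print("hi", 2, file=buf)

# Python 2 treats print ('hi', 2) as "print this one tuple",
# which is equivalent to printing the tuple explicitly:
print(("hi", 2), file=buf)

lines = buf.getvalue().splitlines()
print(lines)
```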

Hope this helps!

* This is due to its lack of UTF-8 support. It has an odd code page, 65001, which is not entirely the same as UTF-8, and Python does not work with it.


3 Comments

I believe your suggested code is for 3.x only, because on 2.7 I get: AttributeError: 'file' object has no attribute 'buffer'
I explained that in the answer: "Of course, that line won't work in Python2". I gave an example for each version. For Python 2, use the second example: print html.text.encode('utf8'). The two versions handle strings differently, so having a single method for both would be complicated. You can test for the Python version with sys.version.
sys.version_info also gives the version, in a more convenient format (a tuple). You can do version comparisons with it, e.g. if sys.version_info < (3, 0):  # Python 2
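A small sketch of the version check described in these comments. The names is_python2 and output_strategy are hypothetical, and the version_info parameter exists only so both branches can be exercised from a single interpreter:

```python
import sys


def is_python2(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 2."""
    return tuple(version_info[:2]) < (3, 0)


def output_strategy(version_info=sys.version_info):
    """Name the approach from the answer that applies to this version."""
    if is_python2(version_info):
        return "encode, then use the print statement"
    return "write encoded bytes to sys.stdout.buffer"


print(output_strategy())           # strategy for the running interpreter
print(output_strategy((2, 7, 6)))  # strategy a Python 2.7 interpreter would pick
```

Comparing tuples avoids parsing the sys.version string, since tuples compare element by element.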
