UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019'

Question

I keep getting the below error and can't seem to get .encode('ascii', errors='ignore') to work.

eqs = soup.find_all('div', {'style': 'margin:7px 5px 0px;vertical-align:top;text-align:center;display:inline-block;line-height:normal;width:120px;'})

for equipment in eqs:
    if '#b0c3d9' in str(equipment):
        f2.write(equipment.getText() + ', Common\n')
    if '#5e98d9' in str(equipment):
        f2.write(equipment.getText() + ', Uncommon\n')
    if '#4b69ff' in str(equipment):
        f2.write(equipment.getText() + ', Rare\n')
    if '#8847ff' in str(equipment):
        f2.write(equipment.getText() + ', Mythical\n')
    if '#b28a33' in str(equipment):
        f2.write(equipment.getText() + ', Immortal\n')
    if '#d32ce6' in str(equipment):
        f2.write(equipment.getText() + ', Legendary\n')
    if '#eb4b4b' in str(equipment):
        f2.write(equipment.getText() + ', Ancient\n')
    if '#ade55c' in str(equipment):
        f2.write(equipment.getText() + ', Arcana\n')

I have tried:

f2.write(equipment.getText().encode('ascii', errors='ignore'))

and

f2.write(equipment.encode('ascii', errors='ignore').getText())

I also tried some other things I am ashamed to post, such as running the encode over the file that BeautifulSoup later reads from, but that just throws a different error. Thanks again for helping.

full traceback:

Traceback (most recent call last):
 File "<pyshell#285>", line 1, in <module>
  import D2soup1
 File "D2soup1.py", line 86, in <module>
  test()
 File "D2soup1.py", line 30, in test
  f2.write(equipment.getText() + ', Immortal\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 5: ordinal not in range(128)

I am using str() to pick the box-shadow color out of the HTML below. I know it is probably not best practice, but it was the only way I could think of to grab it. I am still new to BeautifulSoup.

<div style="margin:7px 5px 0px;vertical-align:top;text-align:center;display:inline-block;line-height:normal;width:120px;"><div style="margin-bottom: 5px;box-shadow:0px 0px 2px 4px #5e98d9;"><a href="/Pirate_Slayer%27s_Tricorn" title="Pirate Slayer's Tricorn"><img alt="Pirate Slayer's Tricorn" src="http://hydra-media.cursecdn.com/dota2.gamepedia.com/thumb/7/79/Pirate_Slayer%27s_Tricorn.png/120px-Pirate_Slayer%27s_Tricorn.png" width="120" height="80" srcset="http://hydra-media.cursecdn.com/dota2.gamepedia.com/thumb/7/79/Pirate_Slayer%27s_Tricorn.png/180px-Pirate_Slayer%27s_Tricorn.png 1.5x, http://hydra-media.cursecdn.com/dota2.gamepedia.com/thumb/7/79/Pirate_Slayer%27s_Tricorn.png/240px-Pirate_Slayer%27s_Tricorn.png 2x"></a></div>

What is the full traceback? Why are you using str(equipment) there?
– Martijn Pieters, Commented Jan 31, 2014 at 19:39
What attribute are those colours in? Why not retrieve just the attribute? Can you share a sample HTML snippet?
– Martijn Pieters, Commented Jan 31, 2014 at 19:41
Added to show requested info. I tried coming up with a workaround using regex, but wasn't having any luck.
– Timmay, Commented Jan 31, 2014 at 19:51

Martijn Pieters · Accepted Answer · 2014-02-01 13:03:09Z

3

You are using str(equipment) without a codec; you are encoding the Tag object to ASCII.

Don't use str; get the text once as a unicode value. And use a mapping and a loop instead of so many if statements.
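To make the failure concrete, here is a minimal sketch of what happens when text containing U+2019 is pushed through the ASCII codec, and what the two explicit alternatives produce. The item name is a hypothetical stand-in, not taken from the page being scraped, and the snippet is written to run on both Python 2.7 and Python 3.

```python
# Hypothetical item name containing a curly apostrophe (U+2019).
name = u'Pirate Slayer\u2019s Tricorn'

try:
    # This is what an implicit conversion to ASCII attempts.
    name.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# 'ignore' silently drops every character ASCII cannot represent.
stripped = name.encode('ascii', 'ignore')

# UTF-8 keeps the character, stored as three bytes.
utf8_bytes = name.encode('utf-8')
```

Whether dropping the apostrophe is acceptable depends on what the output file is for; encoding to UTF-8 loses nothing.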

In this case, the style attribute is all you need to test against:

types = {
    '#b0c3d9': 'Common',
    '#5e98d9': 'Uncommon',
    '#4b69ff': 'Rare',
    '#8847ff': 'Mythical',
    '#b28a33': 'Immortal',
    '#d32ce6': 'Legendary',
    '#eb4b4b': 'Ancient',
    '#ade55c': 'Arcana',
}

for equipment in eqs:
    # The color sits in the style attribute of the nested <div>.
    style = equipment.div.attrs.get('style', '')
    # Encode once, explicitly, rather than relying on the implicit ASCII codec.
    textcontent = equipment.getText().encode('utf8')
    for key in types:
        if key in style:
            f2.write('{}, {}\n'.format(textcontent, types[key]))

Most likely, however, those color codes are in an attribute on the equipment tag; look just in the tag value, or use a .find() call to narrow down your searches.
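As a rough sketch of the same idea without any manual encoding, the file can be opened with io.open and an explicit encoding, so the loop only ever handles unicode text. The helper rarity_from_style, the sample (name, style) pairs, and the temporary output path are all hypothetical; in the real script the names and styles would come from the soup.

```python
import io
import os
import tempfile

# Color-to-rarity table, same values as the mapping in this answer.
RARITY_BY_COLOR = {
    '#b0c3d9': 'Common',
    '#5e98d9': 'Uncommon',
    '#4b69ff': 'Rare',
    '#8847ff': 'Mythical',
    '#b28a33': 'Immortal',
    '#d32ce6': 'Legendary',
    '#eb4b4b': 'Ancient',
    '#ade55c': 'Arcana',
}


def rarity_from_style(style):
    """Return the rarity whose color code appears in style, or None."""
    for color, rarity in RARITY_BY_COLOR.items():
        if color in style:
            return rarity
    return None


# Hypothetical (name, style) pairs standing in for values read from the soup.
items = [
    (u'Pirate Slayer\u2019s Tricorn',
     u'margin-bottom: 5px;box-shadow:0px 0px 2px 4px #5e98d9;'),
    (u'Unstyled Item', u'margin-bottom: 5px;'),
]

# Hypothetical output location; a temporary directory keeps the sketch tidy.
out_path = os.path.join(tempfile.mkdtemp(), 'equipment.txt')

# io.open encodes on write, so no .encode() call is needed in the loop.
with io.open(out_path, 'w', encoding='utf-8') as f2:
    for name, style in items:
        rarity = rarity_from_style(style)
        if rarity is not None:
            f2.write(u'{}, {}\n'.format(name, rarity))

# Read the file back to check what was written.
with io.open(out_path, encoding='utf-8') as f2:
    written = f2.read()
```

Items whose style carries none of the known color codes are skipped rather than written with a blank rarity; that choice is an assumption and easy to change.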

edited Feb 1, 2014 at 13:03

answered Jan 31, 2014 at 19:46

Martijn Pieters

7 Comments

Timmay Over a year ago

That gives me "'NoneType' object has no attribute 'get'" for the line style = equipment.attr.get('style', ''). I tried editing your code to add the missing parenthesis at the end, but that didn't fix it.

Timmay Over a year ago

I added the first <div style> to the HTML in the question. Not sure if that helps.

Martijn Pieters Over a year ago

@Timmay: ah, sorry, there was a typo; .attrs, not .attr.

Timmay Over a year ago

Still not working. I set the html to a variable in the Python Shell and ran it through BeautifulSoup. I then use the exact code provided to try to get the text and rarity, but when it finishes nothing is printed. Not sure if it's because I'm using bs4 or not. Should I update the question with the code?

Martijn Pieters Over a year ago

Yes, plus a bigger input sample.
