I keep getting the below error and can't seem to get .encode('ascii', errors='ignore') to work.

eqs = soup.find_all('div', {'style': 'margin:7px 5px 0px;vertical-align:top;text-align:center;display:inline-block;line-height:normal;width:120px;'})

for equipment in eqs:
    if '#b0c3d9' in str(equipment):
        f2.write(equipment.getText() + ', Common\n')
    if '#5e98d9' in str(equipment):
        f2.write(equipment.getText() + ', Uncommon\n')
    if '#4b69ff' in str(equipment):
        f2.write(equipment.getText() + ', Rare\n')
    if '#8847ff' in str(equipment):
        f2.write(equipment.getText() + ', Mythical\n')
    if '#b28a33' in str(equipment):
        f2.write(equipment.getText() + ', Immortal\n')
    if '#d32ce6' in str(equipment):
        f2.write(equipment.getText() + ', Legendary\n')
    if '#eb4b4b' in str(equipment):
        f2.write(equipment.getText() + ', Ancient\n')
    if '#ade55c' in str(equipment):
        f2.write(equipment.getText() + ', Arcana\n')

I have tried:

f2.write(equipment.getText().encode('ascii', errors='ignore'))

and

f2.write(equipment.encode('ascii', errors='ignore').getText())

I also tried some other things I am ashamed to post, such as running the encode over the file that BeautifulSoup later reads from, but that just throws a different error. Thanks again for helping.

Full traceback:

Traceback (most recent call last):
 File "<pyshell#285>", line 1, in <module>
  import D2soup1
 File "D2soup1.py", line 86, in <module>
  test()
 File "D2soup1.py", line 30, in test
  f2.write(equipment.getText() + ', Immortal\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 5: ordinal not in range(128)

I am using str() to find the box-shadow colour in the HTML below. I know it is probably not best practice, but it was the only way I could think of to grab it. I am still new to BeautifulSoup.

<div style="margin:7px 5px 0px;vertical-align:top;text-align:center;display:inline-block;line-height:normal;width:120px;"><div style="margin-bottom: 5px;box-shadow:0px 0px 2px 4px #5e98d9;"><a href="/Pirate_Slayer%27s_Tricorn" title="Pirate Slayer's Tricorn"><img alt="Pirate Slayer's Tricorn" src="http://hydra-media.cursecdn.com/dota2.gamepedia.com/thumb/7/79/Pirate_Slayer%27s_Tricorn.png/120px-Pirate_Slayer%27s_Tricorn.png" width="120" height="80" srcset="http://hydra-media.cursecdn.com/dota2.gamepedia.com/thumb/7/79/Pirate_Slayer%27s_Tricorn.png/180px-Pirate_Slayer%27s_Tricorn.png 1.5x, http://hydra-media.cursecdn.com/dota2.gamepedia.com/thumb/7/79/Pirate_Slayer%27s_Tricorn.png/240px-Pirate_Slayer%27s_Tricorn.png 2x"></a></div>
  • What is the full traceback? Why are you using str(equipment) there? Commented Jan 31, 2014 at 19:39
  • What attribute are those colours in? Why not retrieve just the attribute? Can you share a sample HTML snippet? Commented Jan 31, 2014 at 19:41
  • Added to show requested info. I tried coming up with a workaround using regex, but wasn't having any luck. Commented Jan 31, 2014 at 19:51

1 Answer

You are using str(equipment) without a codec; you are encoding the Tag object to ASCII.

Don't use str; get the text once as a unicode value. And use a mapping and a loop instead of so many if statements.
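On the encoding side, the traceback points at the f2.write() line: in Python 2, writing a unicode value to a file opened with plain open() encodes it implicitly with the ASCII codec, which fails on u'\u2019'. One way around that is to open the file with an explicit encoding. The following is only a sketch under assumptions: the helper name write_rows and the file name items.csv are made up for illustration, and io.open is used because it accepts an encoding argument on both Python 2.6+ and Python 3.

```python
import io
import os
import tempfile


def write_rows(path, rows):
    """Write (name, rarity) pairs as 'name, rarity' lines, encoded as UTF-8.

    Hypothetical helper, for illustration only.
    """
    # io.open encodes unicode text on write, so no manual .encode() is needed.
    with io.open(path, 'w', encoding='utf-8') as f2:
        for name, rarity in rows:
            f2.write(u'{}, {}\n'.format(name, rarity))


# u'\u2019' is the right single quotation mark named in the traceback.
rows = [(u'Pirate Slayer\u2019s Tricorn', u'Uncommon')]
path = os.path.join(tempfile.mkdtemp(), 'items.csv')  # assumed file name
write_rows(path, rows)

# Read the file back to confirm the character survived the round trip.
with io.open(path, encoding='utf-8') as f:
    content = f.read()
```

With the file opened this way, the values passed to write() stay unicode throughout, and the encoding happens in exactly one place.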

In this case, the style attribute is all you need to test against:

types = {
    '#b0c3d9': 'Common',
    '#5e98d9': 'Uncommon',
    '#4b69ff': 'Rare',
    '#8847ff': 'Mythical',
    '#b28a33': 'Immortal',
    '#d32ce6': 'Legendary',
    '#eb4b4b': 'Ancient',
    '#ade55c': 'Arcana'
}

for equipment in eqs:
    style = equipment.div.attrs.get('style', '')
    textcontent = equipment.getText().encode('utf8')
    for key in types:
        if key in style:
            # one "name, rarity" line per item; the closing parenthesis was missing
            f2.write('{}, {}\n'.format(textcontent, types[key]))

Most likely, however, those color codes are in an attribute on the equipment tag; look just in the tag value, or use a .find() call to narrow down your searches.
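If only the colour matters, the lookup can be done on the style string alone. Here is a rough sketch, assuming the colour always appears as a six-digit hex code in the box-shadow of the inner div, as in the sample HTML from the question. The helper name rarity_from_style is hypothetical, and the style value would come from something like equipment.div.attrs.get('style', '').

```python
import re

# Colour-to-rarity mapping, copied from the answer's types dictionary.
TYPES = {
    '#b0c3d9': 'Common',
    '#5e98d9': 'Uncommon',
    '#4b69ff': 'Rare',
    '#8847ff': 'Mythical',
    '#b28a33': 'Immortal',
    '#d32ce6': 'Legendary',
    '#eb4b4b': 'Ancient',
    '#ade55c': 'Arcana',
}

# Matches a six-digit hex colour such as '#5e98d9'.
HEX_COLOUR = re.compile(r'#[0-9a-fA-F]{6}')


def rarity_from_style(style):
    """Return the rarity for the first known colour in a style string, or None."""
    for colour in HEX_COLOUR.findall(style):
        rarity = TYPES.get(colour.lower())
        if rarity is not None:
            return rarity
    return None


# Style value of the inner div in the question's sample HTML.
style = 'margin-bottom: 5px;box-shadow:0px 0px 2px 4px #5e98d9;'
print(rarity_from_style(style))
```

Returning None for an unknown or missing colour lets the caller decide whether to skip the item or log it, instead of silently writing nothing.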

7 Comments

That is giving me a 'NoneType' object has no attribute 'get' for this line style = equipment.attr.get('style', ''). Tried to edit your code to add a parenthesis at the end, but didn't work.
I added the first <div style> to the html code. not sure if that helps.
@Timmay: ah, sorry, there was a typo; .attrs, not .attr.
Still not working. I set the html to a variable in the Python Shell and ran it through BeautifulSoup. I then use the exact code provided to try to get the text and rarity, but when it finishes nothing is printed. Not sure if it's because I'm using bs4 or not. Should I update the question with the code?
Yes, plus a bigger input sample.