Hello, I am having trouble replacing all the text in an HTML page. I want to censor certain words with BeautifulSoup, but the content is not replaced, and I get an error when I print the contents (not all of the text from the HTML is printed).

words = ['Shop','Car','Home','Generic','Elements']
page = urllib.urlopen("html1/index.html").read()
soup = BeautifulSoup(page, 'html.parser')
texts = soup.findAll(text=True)
for i in texts:
    if i == words:
        i = '***'
    print i

Does anyone know how to fix it?

Error :

Traceback (most recent call last):
  File "replacing.py", line 28, in <module>
    print i
  File "F:\Python\Python27\lib\encodings\cp852.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 25: character maps to <undefined>
  • Would you mind including a small sample of text that won't print in your question? Commented May 7, 2018 at 17:38

2 Answers


You have two major issues here. The first is an encoding issue: you are trying to print a character (u'\u2019', a right single quotation mark) that your console's code page, cp852, cannot encode. For that you can use the answers found in:

UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function

Or, for a more in depth explanation:

Python, Unicode, and the Windows console (Now that I look at this more it's probably outdated, but still an interesting read).
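One common workaround for the encoding error, sketched here under assumptions rather than taken from the linked answers, is to encode the text for the console with errors="replace", so that characters the code page lacks become '?' instead of raising. The helper name console_safe and the sample string are hypothetical; cp852 is the code page shown in the traceback. The sketch is written for Python 3.

```python
def console_safe(text, encoding="cp852"):
    """Return text with characters the target encoding lacks replaced by '?'.

    console_safe is a hypothetical helper name; cp852 is the console
    code page that appears in the question's traceback.
    """
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)


# Assumed sample text containing U+2019, the character from the traceback.
sample = u"It\u2019s a generic shop"
print(console_safe(sample))  # It?s a generic shop
```

This only makes printing safe. The original string is unchanged, so the character is still intact if the text is later written to a UTF-8 file.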

However, you also have a logic problem with your code.

if i == words:

This line doesn't check whether i is found in words. It compares i for equality with the entire list, which is never true for a single string, so nothing is ever replaced. I would recommend making the following changes:

words = {'Shop','Car','Home','Generic','Elements'}

for i in texts:
    if i in words:
        i = '***'

Converting words to a set allows for average O(1) lookup, and using if i in words checks if i is found in words.
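The difference between the two tests can be seen in a small standalone sketch. It uses a plain list as a stand-in for the strings returned by soup.findAll(text=True) (assumed sample data), and it adds a strip() call, which is an assumption on my part: text nodes often carry surrounding whitespace that would otherwise prevent a match.

```python
words = {'Shop', 'Car', 'Home', 'Generic', 'Elements'}

# Stand-in for the strings returned by soup.findAll(text=True) (assumed sample data).
texts = ['Shop', '\n', ' Car ', 'About us']

# A string never equals a whole list, so the original test is always False.
assert not any(t == list(words) for t in texts)

# Rebinding the loop variable would not change the list, so build a new one.
# strip() (an added assumption) lets nodes with surrounding whitespace match too.
censored = ['***' if t.strip() in words else t for t in texts]
print(censored)  # ['***', '\n', '***', 'About us']
```

Note that assigning i = '***' inside the loop only rebinds the local name. To change the parsed document itself, each matching text node would have to be replaced in the tree, for which BeautifulSoup provides replace_with on the node.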


3 Comments

I found that error in the iteration and fixed it. I have one more question, about strings that contain spaces: what is the correct way to replace a single word inside a longer string? For example, input: "My new car is good", output: "My new *** is good".
You could use split() to split into individual words and then replace using a list comprehension like this: repl.it/repls/PoorCorruptCarriers
Thank you for the fix ;) Have a nice day!
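The split() and list-comprehension approach suggested in the comment above might look roughly like the following. This is a minimal sketch, not the code behind the repl.it link. The function name censor_sentence is hypothetical, and the case-insensitive comparison is an assumption made so that "car" matches the list entry 'Car', as the example in the comment expects.

```python
def censor_sentence(sentence, banned):
    """Replace every whitespace-separated word found in banned with '***'.

    censor_sentence is a hypothetical name; matching ignores case
    (an assumption) so that 'car' matches the entry 'Car'.
    """
    banned_lower = {w.lower() for w in banned}
    return ' '.join(
        '***' if word.lower() in banned_lower else word
        for word in sentence.split()
    )


words = {'Shop', 'Car', 'Home', 'Generic', 'Elements'}
print(censor_sentence("My new car is good", words))  # My new *** is good
```

Two limitations of this sketch: split() collapses runs of whitespace into single spaces, and a word with attached punctuation, such as "car,", will not match.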

It looks like one of the characters you are trying to print cannot be represented in the codec Python uses for console output (cp852 in your traceback). In other words, the string holds the character, but the console's encoding has no byte for it, so the print fails. Converting the HTML text to a Unicode-safe form before printing should solve your problem.

Good question on how to do that:

Convert HTML entities to Unicode and vice versa
