Python replace text

Question

Hello, I'm having trouble replacing all the text in an HTML page. I wanted to build a censor with BeautifulSoup, but it doesn't replace the content, and I get an error when I print the contents (not all of the text from the HTML gets printed).

import urllib
from bs4 import BeautifulSoup

words = ['Shop','Car','Home','Generic','Elements']
page = urllib.urlopen("html1/index.html").read()
soup = BeautifulSoup(page, 'html.parser')
texts = soup.findAll(text=True)
for i in texts:
    if i == words:
        i = '***'
    print i

Does anyone know how to fix it?

Error:

Traceback (most recent call last):
  File "replacing.py", line 28, in <module>
    print i
  File "F:\Python\Python27\lib\encodings\cp852.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 25: character maps to <undefined>

Would you mind including a small sample of the text that won't print in your question? – user3483203, commented May 7, 2018 at 17:38

user3483203 · Accepted Answer · 2018-05-07 17:41:32Z

2

You have two major issues here. The first is an encoding issue: you are trying to print a character (u'\u2019', a right single quotation mark) that your console's encoding (cp852) cannot represent. For that, you can use the answers found in:

UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function

Or, for a more in-depth explanation:

Python, Unicode, and the Windows console (looking at it again, it's probably outdated, but it's still an interesting read).
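A minimal sketch of one common workaround: encode with `errors="replace"` so characters the console can't show become `?` instead of raising. It assumes Python 3 (the question uses Python 2.7), takes cp852 from the traceback, and `safe_for_console` is a hypothetical helper name. Setting the `PYTHONIOENCODING` environment variable (for example to `utf-8`) is another route if your console can display the characters.

```python
def safe_for_console(text, encoding="cp852"):
    """Return text with characters the given encoding can't represent replaced by '?'."""
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError
    return text.encode(encoding, errors="replace").decode(encoding)


sample = "it\u2019s"  # U+2019 RIGHT SINGLE QUOTATION MARK, the character from the traceback

try:
    sample.encode("cp852")  # strict encoding fails, just like the console print did
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)

print(safe_for_console(sample))  # prints: it?s
```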

However, you also have a logic problem with your code.

if i == words:

This line doesn't check whether i is found in words; instead it compares i to the entire list, and a string is never equal to a list, so the condition is always false. I would recommend making the following changes:

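words = {'Shop','Car','Home','Generic','Elements'}

for i in texts:
    if i in words:
        # assigning to i would only rebind the loop variable;
        # replace_with() swaps the text node inside the soup itself
        i.replace_with('***')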

Converting words to a set allows for average O(1) lookup, and using if i in words checks if i is found in words.
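To see the difference without BeautifulSoup, here is a small sketch that uses plain strings as hypothetical stand-ins for the text nodes returned by `soup.findAll(text=True)`. `censor_nodes` is a hypothetical helper, and the `strip()` call is an added assumption: text nodes in real HTML often carry surrounding whitespace that would otherwise prevent an exact match.

```python
# Hypothetical stand-ins for the text nodes that soup.findAll(text=True) returns.
words = {'Shop', 'Car', 'Home', 'Generic', 'Elements'}
texts = ['Home', 'Shop', 'My new car is good', ' Car ']

# Comparing a string with a whole list is always False ...
print('Car' == ['Shop', 'Car', 'Home', 'Generic', 'Elements'])  # False
# ... while `in` tests membership, which is what the censor needs.
print('Car' in words)  # True


def censor_nodes(texts, words):
    """Return a copy of texts with every node that exactly matches a censored word replaced by '***'."""
    # strip() so leading/trailing whitespace in a node doesn't prevent a match
    return ['***' if t.strip() in words else t for t in texts]


print(censor_nodes(texts, words))  # ['***', '***', 'My new car is good', '***']
```

Note that whole-node matching leaves 'My new car is good' untouched; matching words inside a longer string is a separate problem.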

edited May 7, 2018 at 17:41

answered May 7, 2018 at 17:34

user3483203



3 Comments

pazucj Over a year ago

I found that error in the iteration and fixed it. I have one more question: what is the correct way to replace individual words when the string contains several words separated by spaces? For example, input: "My new car is good", output: "My new *** is good"

user3483203 Over a year ago

You could use split() to break the string into individual words and then replace them using a list comprehension, like this: repl.it/repls/PoorCorruptCarriers
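The linked repl isn't reproduced here, so this is only a minimal sketch of the split() plus list comprehension idea, assuming Python 3. `censor_sentence` is a hypothetical name, and the case-insensitive comparison is an assumption added so that "car" in the example matches 'Car' in the word list.

```python
def censor_sentence(sentence, words):
    """Replace each whitespace-separated word found in `words` with '***'."""
    # Lower-case both sides so 'car' in the sentence matches 'Car' in the list (assumption).
    banned = {w.lower() for w in words}
    # split() breaks on runs of whitespace; join rebuilds the sentence with single spaces.
    return ' '.join(['***' if token.lower() in banned else token
                     for token in sentence.split()])


words = ['Shop', 'Car', 'Home', 'Generic', 'Elements']
print(censor_sentence("My new car is good", words))  # My new *** is good
```

One limitation of splitting on whitespace: a word with punctuation attached, such as "car," or "car.", won't match; handling that would need something like a regular expression with word boundaries.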

pazucj Over a year ago

Thank you for the fix ;) Have a nice day!

Phil Dwan · Accepted Answer · 2018-05-07 17:29:45Z

0

It looks like one of the characters you are trying to print can't be represented in the encoding Python uses to write to your console. In other words, you have the data for the character, but the console has no symbol for it, so it can't be printed. A simple conversion of the HTML to Unicode should solve your problem.

Here is a good question on how to do that:

Convert HTML entities to Unicode and vice versa
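For reference, a minimal sketch of the entity-to-Unicode conversion in Python 3, using the standard library's `html.unescape` (the sample string is made up). The result is printed with `ascii()` so the example itself can't hit the console encoding error; converting entities changes how the text is represented, but printing it directly still requires a console encoding that can show the resulting characters.

```python
import html

# Entities like &rsquo; are decoded into the Unicode characters they name.
raw = "It&rsquo;s a &quot;Generic&quot; shop &amp; more"
text = html.unescape(raw)

# ascii() escapes non-ASCII characters, so this prints safely on any console.
print(ascii(text))  # 'It\u2019s a "Generic" shop & more'
```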

answered May 7, 2018 at 17:29

Phil Dwan

