import bs4 as bs
import urllib.request
import re
import os
from colorama import Fore, Back, Style, init

init()

def highlight(word):
    if word in keywords:
        return Fore.RED + str(word) + Fore.RESET
    return str(word)

for newurl in newurls:
    url = urllib.request.urlopen(newurl)
    soup1 = bs.BeautifulSoup(url, 'lxml')
    paragraphs = soup1.find_all('p')
    print(Fore.GREEN + soup1.h2.text + Fore.RESET)
    print('')
    for paragraph in paragraphs:
        if paragraph is not None:
            textpara = paragraph.text.strip().split(' ')
            colored_words = list(map(highlight, textpara))
            print(" ".join(colored_words).encode("utf-8")) #encode("utf-8")

I will have a list of keywords and URLs to go through. After running a few keywords against one URL, I get output like this:

b'\x1b[31mthe desired \x1b[31mmystery corners \x1b[31mthe differential . 
\x1b[31mthe back \x1b[31mpretends to be \x1b[31mthe'

When I remove encode("utf-8"), I get an encoding error instead:

Traceback (most recent call last):
  File "C:\Users\resea\Desktop\Python Projects\Try 3.py", line 52, in <module>
    print(" ".join(colored_words)) #encode("utf-8")
  File "C:\Python34\lib\site-packages\colorama\ansitowin32.py", line 41, in write
    self.__convertor.write(text)
  File "C:\Python34\lib\site-packages\colorama\ansitowin32.py", line 162, in write
    self.write_and_convert(text)
  File "C:\Python34\lib\site-packages\colorama\ansitowin32.py", line 190, in write_and_convert
    self.write_plain_text(text, cursor, len(text))
  File "C:\Python34\lib\site-packages\colorama\ansitowin32.py", line 195, in write_plain_text
    self.wrapped.write(text[start:end])
  File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 23: character maps to <undefined>

Where am I going wrong?

  • Change the encoding, don’t remove it! Commented Jan 15, 2019 at 15:20
  • What other encodings can I use besides encode("utf-8")? Commented Jan 15, 2019 at 16:09
  • A lot exist. See docs.python.org/3/howto/unicode.html Commented Jan 15, 2019 at 16:16

1 Answer

I know what I'm going to suggest is more of a workaround than a "solution", but I've been frustrated, again and again, by all sorts of strange characters that had to be dealt with by "encode this" or "encode that", sometimes successfully and many times not.

Depending on the type of text found at your newurl, the universe of problematic characters is probably limited. So I deal with them on a case-by-case basis. Every time I get one of these errors, I look up the character's name:

import unicodedata
unicodedata.name('\u2019')

In your case, you'll get this:

'RIGHT SINGLE QUOTATION MARK'

The old, pesky, right single quotation mark... So next, as suggested here, I just replace that pesky character with another that looks like it, but does not raise the error; in your case

textpara = paragraph.text.strip().replace("\u2019", "'").split(' ') # or some other replacement character

should work. And you rinse and repeat every time this error pops up. Admittedly, not the most elegant solution, but after a while, all possible strange characters in your newurl are captured and the errors stop.
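The replace-as-you-go routine above can be collected into one translation table, so each newly discovered character becomes a one-line addition. This is only a minimal sketch: the names REPLACEMENTS, clean and describe_non_ascii are made up for illustration, and the choice of look-alike characters is an assumption.

```python
import unicodedata

# Hypothetical replacement table; add an entry each time a new character fails.
REPLACEMENTS = {
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
}
TABLE = str.maketrans(REPLACEMENTS)


def clean(text):
    """Swap known problem characters for ASCII look-alikes."""
    return text.translate(TABLE)


def describe_non_ascii(text):
    """Map every non-ASCII character in text to its Unicode name."""
    return {ch: unicodedata.name(ch, "UNKNOWN") for ch in text if ord(ch) > 127}


sample = "the driver\u2019s \u201cmystery\u201d corners"
print(clean(sample))
# ascii() escapes non-ASCII characters, so this print cannot hit the same error.
print(ascii(describe_non_ascii(sample)))
```

In the question's loop, clean() would be applied to paragraph.text before splitting it into words, and describe_non_ascii() can list what is still left to handle.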


6 Comments

...or just find the right encoding in the first place and all of your problems will be solved.
I tried to, until one day I ran into this little guy which, for some reason, .encode("utf-8") couldn't handle, and I just gave up....
However, another encoding might work there. I don’t know enough about encodings to be fully sure, but try utf-16 or utf-32.
Thanks, will do. BTW, is it possible to apply multiple encodings simultaneously?
No. Well, you could encode multiple times, but that would be pointless and bad. Multiple encodings is not done, ever. BTW, did it work?
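For what it's worth, the last frame of the traceback in the question is in cp850.py, which suggests the failure comes from the console's code page rather than from UTF-8. The b'...' output in the question is what print() shows when it is handed a bytes object. A minimal sketch for checking this, assuming the console really is cp850; the helpers can_encode and console_safe are made up for illustration:

```python
import sys


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def console_safe(text, encoding=None):
    """Round-trip text through an encoding, turning unencodable characters into '?'."""
    # Assumption: fall back to the stdout encoding, then to utf-8.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


quote = "\u2019"
for name in ("utf-8", "utf-16", "utf-32", "cp850"):
    print(name, can_encode(quote, name))

# The result is still a str, so print() does not show a b'...' repr.
print(console_safe("driver\u2019s", "cp850"))
```

The three UTF encodings can all represent the character and only cp850 cannot, so switching from utf-8 to utf-16 or utf-32 would not by itself change what the console can display.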