Replacing a string from an built-in unknown character in python

Question

I'm trying to replace this unrecognized � text below here to a another text. What I did is decode it to UTF-8 and try to replace the unknown text below.

" you�"

varlistlabela = spss.GetVariableLabel(var)
varlistlabela=varlistlabela.decode("cp1252").replace(r'[\u0020-\ud7ff]',"").encode("cp1252")

Regular str.replace doesn't understand character ranges like [\u0020-\ud7ff]. Were you thinking of re.sub? — Kevin
– Kevin, Commented Apr 18, 2019 at 2:43

Austin · Accepted Answer · 2019-04-18 02:47:24Z

1

You need a regex substitution:

re.sub(r'[^\u0020-\ud7ff]', '', s)

where s is the input string.

Code:

import re

s = " you�"
print(re.sub(r'[^\u0020-\ud7ff]', '', s))
#  you

answered Apr 18, 2019 at 2:47

Austin

26.1k4 gold badges28 silver badges52 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Brad Solomon Over a year ago

Just curious, is there a range of code points that represents the "unknown" characters? I see that chr(65533) and chr(65534) are two different things, while chr(65535) seems to be a repeat of the latter

Collectives™ on Stack Overflow

Replacing a string from an built-in unknown character in python

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related