Python - remove unwanted characters from a string

Question

I have a string like the one below, which contains non-ASCII characters and other special characters:

 â€œProjected Set-tled Balan&ceâ€ 456$

How can I remove all the unwanted characters and get a clean string like the one below, which contains only lowercase or uppercase letters, numbers, and spaces?

  Projected Settled Balance 456

I'm trying to achieve this with the regex [a-zA-Z0-9 ]. I'm expecting a way to return the part of the string that matches this regex:

import re

string = ' â€œProjected Set-tled Balan&ceâ€ 456$'
pat = re.compile('^[A-Za-z0-9 ]+')
stripped_string = string.strip().lower()
print(stripped_string)
print(pat.match(stripped_string))  # prints None

But this is not returning anything.

You can check whether each character in the string is alphanumeric with isalnum().
– Ardweaden, Commented Mar 28, 2019 at 19:58
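
A minimal sketch of the isalnum() idea from the comment above, not the commenter's own code. The helper name keep_ascii_alnum is hypothetical. One assumption worth stating: str.isalnum() is also true for non-ASCII letters such as 'â' and 'œ', so this sketch adds an ord() check to stay within ASCII.

```python
def keep_ascii_alnum(text):
    """Keep only ASCII letters, ASCII digits, and spaces (hypothetical helper)."""
    # str.isalnum() alone is also True for non-ASCII letters such as 'â' or 'œ',
    # so the ord() check restricts the filter to the ASCII range.
    return ''.join(c for c in text if c == ' ' or (ord(c) < 128 and c.isalnum()))


sample = ' â€œProjected Set-tled Balan&ceâ€ 456$'
cleaned = keep_ascii_alnum(sample).strip()
print(cleaned)
```

The trailing strip() only removes the leading space that the sample string starts with.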

Kartikeya Sharma · Accepted Answer · 2019-03-28 20:12:20Z

1

This version does not use a regex, since you hadn't asked for one before:

''.join(i for i in 'â€œProjected Set-tled Balan&ceâ€ 456$' if i == ' ' or 46 < ord(i) < 128)
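
One caveat, shown as a hedged sketch rather than as part of the original answer: the code-point range 47 to 127 also covers ASCII punctuation such as '/', ':' and '_', so it works for the sample string but is not a strict letters-and-digits filter. An explicit allow-list is one possible alternative; the names ALLOWED and keep_allowed are hypothetical.

```python
import string

# Explicit allow-list: ASCII letters, ASCII digits, and the space character.
ALLOWED = set(string.ascii_letters + string.digits + ' ')


def keep_allowed(text):
    """Drop every character that is not in ALLOWED (hypothetical helper)."""
    return ''.join(c for c in text if c in ALLOWED)


# The ord() range 47..127 lets punctuation such as '/', ':' and '_' through.
by_range = ''.join(c for c in 'a/b:c_d' if c == ' ' or 46 < ord(c) < 128)
by_allow_list = keep_allowed('a/b:c_d')
print(by_range)       # a/b:c_d
print(by_allow_list)  # abcd
```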

Updated with a regex version:

re.sub(r'[^A-Za-z0-9\s]+','', 'â€œProjected Set-tled Balan&ceâ€ 456$')
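
For context, the pattern in the question found nothing because re.match only tries to match at the start of the string, and after strip() the first character is 'â', which is outside the character class. Removing the unwanted characters with re.sub avoids that problem. Below is a small sketch wrapping the substitution in a function; the names UNWANTED and clean are hypothetical, and collapsing repeated spaces is an extra step that the question does not ask for.

```python
import re

# Anything that is not an ASCII letter, ASCII digit, or space gets removed.
UNWANTED = re.compile(r'[^A-Za-z0-9 ]+')


def clean(text):
    """Strip unwanted characters, then tidy the spacing (hypothetical helper)."""
    kept = UNWANTED.sub('', text)
    # Collapse runs of spaces left behind by removed characters and trim the ends.
    return re.sub(r' +', ' ', kept).strip()


result = clean(' â€œProjected Set-tled Balan&ceâ€ 456$')
print(result)
```

This sketch uses a literal space in the class instead of \s, so tabs and newlines are removed as well; keep \s if those should survive.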

answered Mar 28, 2019 at 20:00



user3667542 · Accepted Answer · 2019-03-28 20:28:30Z

0

aString.encode('ascii', 'ignore')

My bad, that was pretty dumb of me

Do that, but one character at a time, and if you get an error, replace that character with an empty string.
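
A hedged sketch of what the encode approach does in practice, assuming Python 3. With errors='ignore' no exception is raised, so a per-character loop is not needed, but the result is bytes and ASCII punctuation survives, which may be why it does not give the desired output on its own. The helper name ascii_only is hypothetical.

```python
def ascii_only(text):
    """Drop non-ASCII characters via an encode/decode round trip (hypothetical helper)."""
    # encode() returns bytes; errors='ignore' silently skips unencodable
    # characters, so no exception is raised. decode() turns the bytes back into str.
    return text.encode('ascii', 'ignore').decode('ascii')


sample = ' â€œProjected Set-tled Balan&ceâ€ 456$'
step1 = ascii_only(sample)
print(step1)  # ASCII punctuation such as '-', '&' and '$' is still present

# A second pass is still needed to keep only letters, digits, and spaces.
step2 = ''.join(c for c in step1 if c.isalnum() or c == ' ').strip()
print(step2)
```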

This has been asked many times before, but here are these suggestions anyway.

answered Mar 28, 2019 at 20:12


This is not providing the desired output.