-2

I have a string like the one below, which contains non-ASCII characters and other special characters:

 “Projected Set-tled Balan&ce†456$

How can I remove all the unwanted characters and get a clean string like the one below, which contains only lowercase or uppercase letters, numbers and spaces?

  Projected Settled Balance 456

I'm trying to achieve this with the regex [a-zA-Z0-9 ]. I'm looking for a way to return the string that matches this regex:

import re

string = ' “Projected Set-tled Balan&ce†456$'
pat = re.compile('^[A-Za-z0-9 ]+')
stripped_string = string.strip().lower()
print(stripped_string)
print(pat.match(stripped_string))

But this is not returning anything.

2
  • You can check whether each character in the string is alphanumeric with isalnum(). Commented Mar 28, 2019 at 19:58
  • stackoverflow.com/questions/20078816/… Commented Mar 28, 2019 at 20:04
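
A minimal sketch of the isalnum() suggestion from the first comment. One caveat: str.isalnum() is Unicode-aware, so a character such as 'â' counts as alphanumeric, which is why the sketch also checks that the code point is below 128. The function name keep_ascii_alnum is hypothetical, and spaces are kept only because the desired output in the question contains them.

```python
def keep_ascii_alnum(text):
    """Keep only ASCII letters, ASCII digits, and spaces (hypothetical helper)."""
    return ''.join(
        ch for ch in text
        # ord(ch) < 128 rules out non-ASCII letters that isalnum() would accept.
        if ch == ' ' or (ord(ch) < 128 and ch.isalnum())
    )


print(keep_ascii_alnum('“Projected Set-tled Balan&ce†456$'))
```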

2 Answers

1

This does not use regex, since you hadn't asked for it originally:

''.join(i for i in '“Projected Set-tled Balan&ce†456$' if i == ' ' or '0' <= i <= '9' or 'A' <= i <= 'Z' or 'a' <= i <= 'z')

Updated to use regex:

re.sub(r'[^A-Za-z0-9\s]+','', '“Projected Set-tled Balan&ce†456$')
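
To connect this back to the question: match() only tests whether the pattern matches at the start of the string, so it returns None as soon as the first character falls outside the class, whereas re.sub() replaces disallowed characters wherever they occur. A small sketch under a couple of assumptions: clean() is a hypothetical wrapper name, and the class uses a literal space instead of \s so that tabs and newlines are removed as well.

```python
import re

# Anything that is not an ASCII letter, an ASCII digit, or a space.
NOT_ALLOWED = re.compile(r'[^A-Za-z0-9 ]+')


def clean(text):
    """Drop every disallowed character, then trim the ends (hypothetical helper)."""
    return NOT_ALLOWED.sub('', text).strip()


sample = '“Projected Set-tled Balan&ce†456$'

# match() is anchored at position 0; the first character is not allowed, so this is None.
print(re.match(r'[A-Za-z0-9 ]+', sample))
print(clean(sample))
```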

0

aString.encode('ascii', 'ignore')

My bad, that was pretty dumb of me

Do that, but one letter at a time, and if you get an error, replace that character with an empty string.
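
The one-letter-at-a-time idea might look like the sketch below. Two assumptions: the 'ignore' argument is dropped, because with it encode() silently skips characters instead of raising, and the helper name strip_non_ascii is made up for illustration. Note that this only removes non-ASCII characters; ASCII punctuation such as '-', '&' and '$' survives, so on its own it does not produce the output the question asks for.

```python
def strip_non_ascii(text):
    """Drop characters that cannot be encoded as ASCII (hypothetical helper)."""
    kept = []
    for ch in text:
        try:
            ch.encode('ascii')  # raises UnicodeEncodeError for non-ASCII characters
        except UnicodeEncodeError:
            continue  # treat the character as an empty string
        kept.append(ch)
    return ''.join(kept)


print(strip_non_ascii('“Projected Set-tled Balan&ce†456$'))
```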

This has been asked many times; here are two related questions:

How to remove nonAscii characters in python

Replace non-ASCII characters with a single space

1 Comment

This is not providing the desired output.
