21

I have a DataFrame column that contains many strings. I need to remove all non-alphanumeric characters from that column, e.g.:

df['strings'] = ["a#bc1!","a(b$c"]

Expected result after running the code:

print(df['strings'])  # ['abc', 'abc']

I've tried:

df['strings'].replace([',','.','/','"',':',';','!','@','#','$','%',"'","*","(",")","&",],"")

But this didn't work, and I feel there should be a more efficient way to do this using regex. Any help would be greatly appreciated.

0

4 Answers

35

Use str.replace.

df
  strings
0  a#bc1!
1   a(b$c

df.strings.str.replace('[^a-zA-Z]', '', regex=True)
0    abc
1    abc
Name: strings, dtype: object

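To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need the following. Note that \W treats the underscore as a word character, so underscores are kept as well: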

df.strings.str.replace(r'\W', '', regex=True)
0    abc1
1     abc
Name: strings, dtype: object 
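
For context, the attempt in the question most likely had no effect because `Series.replace` compares whole cell values by default, whereas `Series.str.replace` operates on substrings within each cell. Here is a minimal sketch of the difference, assuming the same two-row frame. The names `unchanged` and `cleaned` are illustrative only, and `regex=True` is passed explicitly so the result does not depend on the pandas version:

```python
import pandas as pd

df = pd.DataFrame({"strings": ["a#bc1!", "a(b$c"]})

# Series.replace looks for cells whose entire value equals "#", "!", etc.
# No cell matches, so nothing changes.
unchanged = df["strings"].replace(["#", "!", "(", "$"], "")

# Series.str.replace works on substrings inside each cell.
cleaned = df["strings"].str.replace(r"[^a-zA-Z0-9]", "", regex=True)

print(unchanged.tolist())  # ['a#bc1!', 'a(b$c']
print(cleaned.tolist())    # ['abc1', 'abc']
```

Either call returns a new Series, so the result has to be assigned back (for example to `df["strings"]`) to persist.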

1 Comment

[^0-9a-zA-Z.,-/ ] was what I was after, personally.
13

Since you wrote alphanumeric, you need to add 0-9 in the regex. But maybe you only wanted alphabetic...

import pandas as pd

ded = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

ded.strings.str.replace('[^a-zA-Z0-9]', '', regex=True)

But it's basically what COLDSPEED wrote
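
If spaces or some punctuation should survive as well, the character class can be widened. The sketch below assumes made-up sample data and two hypothetical variable names (`keep_spaces`, `keep_punct`). One thing to watch: inside a class, a hyphen between two characters defines a range, so it is placed last here to make it a literal hyphen.

```python
import pandas as pd

# Hypothetical sample data containing spaces and punctuation
s = pd.Series(["a#b c1!", "v1.2-rc/x, ok!#"])

# Keep letters, digits and spaces
keep_spaces = s.str.replace(r"[^a-zA-Z0-9 ]", "", regex=True)

# Also keep '.', ',', '/', and '-' (hyphen last, so it is literal)
keep_punct = s.str.replace(r"[^0-9a-zA-Z.,/ -]", "", regex=True)

print(keep_spaces.tolist())  # ['ab c1', 'v12rcx ok']
print(keep_punct.tolist())   # ['ab c1', 'v1.2-rc/x, ok']
```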

1 Comment

That is correct. I had to add 0-9 and also spaces, since I wanted to keep those, but coldspeed's answer was first and is the correct method.
1

You can also use Python's re module directly:

import re

regex = re.compile('[^a-zA-Z]')

l = ["a#bc1!","a(b$c"]

print([regex.sub('', i) for i in l])

['abc', 'abc']
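
As a variation on the list comprehension above, here is a small sketch that keeps digits too and compares the compiled pattern with a regex-free alternative built on `str.isalnum`. The names `NON_ALNUM`, `with_regex` and `without_regex` are illustrative. Note that `str.isalnum` also accepts non-ASCII letters and digits, so the two approaches only agree on ASCII input like this.

```python
import re

# Compile once, reuse for every string
NON_ALNUM = re.compile(r"[^a-zA-Z0-9]")

values = ["a#bc1!", "a(b$c"]

# Regex approach: delete every character outside a-z, A-Z, 0-9
with_regex = [NON_ALNUM.sub("", v) for v in values]

# Regex-free approach: keep characters for which isalnum() is true
without_regex = ["".join(ch for ch in v if ch.isalnum()) for v in values]

print(with_regex)     # ['abc1', 'abc']
print(without_regex)  # ['abc1', 'abc']
```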


1

The pandas str.replace() method has changed since the top answer was written in 2017. The regex parameter now defaults to False (as of pandas 2.0), so you must state explicitly that the pattern you are passing is a regular expression. The updated code from the answer above should be:

df
  strings
0  a#bc1!
1   a(b$c

df.strings.str.replace('[^a-zA-Z]', '', regex=True)

# regex=True is required in pandas versions where the regex parameter defaults to False.

# The pattern [^a-zA-Z] matches any single character
# that is not in the ranges a-z or A-Z.

0    abc
1    abc
Name: strings, dtype: object
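
To make the effect of the flag concrete, here is a minimal sketch that passes `regex` explicitly both ways. The third sample string is invented purely to show what a literal match looks like, and the variable names are illustrative.

```python
import pandas as pd

# The third value literally contains the text "[^a-zA-Z]"
s = pd.Series(["a#bc1!", "a(b$c", "x[^a-zA-Z]y"])

# regex=False: the pattern is treated as a plain substring
literal = s.str.replace("[^a-zA-Z]", "", regex=False)

# regex=True: the pattern is a character class matching any non-letter
as_regex = s.str.replace("[^a-zA-Z]", "", regex=True)

print(literal.tolist())   # ['a#bc1!', 'a(b$c', 'xy']
print(as_regex.tolist())  # ['abc', 'abc', 'xazAZy']
```

Passing the flag explicitly keeps the code behaving the same way regardless of which default the installed pandas version uses.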
