2

There are similar answers, but I could not apply them to my own case. I want to get rid of characters that are forbidden in Windows directory names in my pandas dataframe. I tried something like:

df1['item_name'] =  "".join(x for x in df1['item_name'].rstrip() if x.isalnum() or x in [" ", "-", "_"]) if df1['item_name'] else ""

Assume I have a dataframe like this

 item_name
0  st*back
1  yhh?\xx
2  adfg%s
3  ghytt&{23
4  ghh_h

I want to get:

   item_name
0  stback
1  yhhxx
2  adfgs
3  ghytt23
4  ghh_h

How could I achieve this? Note: I scraped the data from the internet earlier and used the following code for the older version:

item_name = "".join(x for x in item_name.text.rstrip() if x.isalnum() or x in [" ", "-", "_"]) if item_name else ""

Now I have new observations for the same items and want to merge them with the older ones, but I forgot to apply the same method when I rescraped.

3
  • df.item_name = df.item_name.apply(lambda x: x.replace("\s|-|_", "")) Commented Apr 17, 2017 at 21:00
  • No, but I want to keep "_" and "-"; I only want to get rid of the characters that are forbidden in Windows directory names. Commented Apr 17, 2017 at 21:02
  • Should have been re.sub anyway. Commented Apr 17, 2017 at 21:08

3 Answers 3

4

You can express the condition as a negated character class and remove the matches with str.replace. Here \w stands for word characters (alphanumerics plus _), \s stands for whitespace, and - is a literal dash. With ^ at the start of the class, [^\w\s-] matches any character that is neither alphanumeric nor one of [" ", "-", "_"]:

df.item_name.str.replace(r"[^\w\s-]", "", regex=True)

#0     stback
#1      yhhxx
#2      adfgs
#3    ghytt23
#4      ghh_h
#Name: item_name, dtype: object
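
To check that the pattern agrees with the filter from the question, here is a minimal sketch that applies the original per-character filter to each cell and compares it with the regex version. The sample frame and the helper name keep_allowed are assumptions for illustration. regex=True is passed explicitly because newer pandas versions no longer treat the pattern as a regex by default.

```python
import pandas as pd

# Sample data mirroring the question's frame (rebuilt here for illustration).
df = pd.DataFrame(
    {"item_name": ["st*back", "yhh?\\xx", "adfg%s", "ghytt&{23", "ghh_h"]}
)


def keep_allowed(name):
    """Hypothetical helper: the question's per-character filter for one string."""
    if not isinstance(name, str):
        return ""
    return "".join(ch for ch in name.rstrip() if ch.isalnum() or ch in " -_")


# Element-wise Python filter, as used when the data was first scraped.
by_filter = df["item_name"].map(keep_allowed)

# Vectorized regex version from this answer.
by_regex = df["item_name"].str.replace(r"[^\w\s-]", "", regex=True)

print(by_filter.tolist())
print(by_regex.tolist())
```

On this sample both approaches give the same result. They can differ on other data: the filter strips trailing whitespace and keeps only plain spaces, while \s also keeps tabs and newlines.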

2 Comments

Sorry, I edited my question. Would this achieve the same result as what I did before?
It should. As stated in the answer, this pattern removes every character that is not in [a-zA-Z0-9], "_", "-", or whitespace.
3

Try

import re
df.item_name.apply(lambda x: re.sub(r'\W+', '', x))

0     stback
1      yhhxx
2      adfgs
3    ghytt23
4      ghh_h
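
One caveat, sketched below with a made-up value that is not in the question's data: \W also matches spaces and hyphens, so this pattern removes them as well. If those should be kept, as in the question's original filter, a negated character class is closer.

```python
import re

# Hypothetical name containing a space and a hyphen.
name = "my item-01*?"

# \W matches every non-word character, including " " and "-".
strip_non_word = re.sub(r"\W+", "", name)

# The negated class keeps word characters, whitespace, and "-".
keep_space_dash = re.sub(r"[^\w\s-]", "", name)

print(strip_non_word)   # myitem01
print(keep_space_dash)  # my item-01
```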


1

If you have a properly escaped list of characters

lst = [r'\\', r'\*', r'\?', '%', '&', r'\{']
df.replace(lst, '', regex=True)

  item_name
0    stback
1     yhhxx
2     adfgs
3   ghytt23
4     ghh_h
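
Escaping each character by hand is easy to get wrong. As a rough sketch, re.escape can build the pattern instead. The set below is the list of characters Windows rejects in file and directory names (< > : " / \ | ? *). The variable names and the shortened sample frame are assumptions for illustration.

```python
import re

import pandas as pd

# Characters Windows does not allow in file or directory names.
forbidden = '<>:"/\\|?*'

# re.escape handles the escaping, so the class matches each character literally.
pattern = "[" + re.escape(forbidden) + "]"

# Shortened sample frame based on the question's data.
df = pd.DataFrame({"item_name": ["st*back", "yhh?\\xx", "adfg%s"]})

cleaned = df["item_name"].str.replace(pattern, "", regex=True)
print(cleaned.tolist())
```

Note that this removes only the strictly forbidden characters, so "%" stays in "adfg%s". To reproduce the expected output in the question, the character set would have to be extended or the allow-list approach used instead.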

