Below is a sample of my DataFrame:

id  name 
01  1
02  23 2    
03  234     
04  23423   
05  24 H AUTOSERVICE    
06  25 SUNGLASS

The aim is to 'clean' the DataFrame by replacing a value with NaN only if the whole value consists of digits.

The expected output would look like this

id  name 
01  NaN
02  24 H AUTOSERVICE    
03  25 SUNGLASS

I was thinking about something like the following, but it would remove all digits, even those in 24 H:

 df['name'] = df['name'].replace(r'[0-9]', '', regex=True)

Thanks to anyone who can help!

1 Answer

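The first step is to use Series.str.contains with a negated character class, [^0-9\s], which matches any character that is neither a digit nor whitespace. Passing the result to Series.where keeps the values that contain such a character and replaces the rest with NaN: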

df['name'] = df['name'].where(df['name'].str.contains(r'[^0-9\s]'))
print(df)
   id              name
0   1               NaN
1   2               NaN
2   3               NaN
3   4               NaN
4   5  24 H AUTOSERVICE
5   6       25 SUNGLASS
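
To see why the negated class works, it can help to look at the boolean mask on its own. The following is a minimal sketch that rebuilds the sample from the question (assuming both columns hold strings) and prints the mask before applying Series.where; the names `keep` and `cleaned` are only illustrative.

```python
import pandas as pd

# Sample data from the question; both columns are assumed to be strings.
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05", "06"],
    "name": ["1", "23 2", "234", "23423", "24 H AUTOSERVICE", "25 SUNGLASS"],
})

# True where the value has at least one character that is
# neither a digit nor whitespace.
keep = df["name"].str.contains(r"[^0-9\s]")
print(keep.tolist())
# [False, False, False, False, True, True]

# where() keeps values where the mask is True and puts NaN elsewhere.
cleaned = df["name"].where(keep)
print(cleaned.isna().tolist())
# [True, True, True, True, False, False]
```

Note that "23 2" is treated as digits only because whitespace is included in the negated class.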

To collapse each run of consecutive NaNs into a single row, keep a row if it is not NaN or if it is the first NaN of its run:

m = df['name'].isna()
df = df[m.ne(m.shift()) | ~m]
print (df)
   id              name
0  01               NaN
4  05  24 H AUTOSERVICE
5  06       25 SUNGLASS
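
The comparison `m.ne(m.shift())` is True wherever the NaN flag differs from the previous row, so it marks the first row of every run. A small sketch of that idea on made-up data with two separate NaN runs follows; `collapse_nan_runs` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def collapse_nan_runs(frame, column):
    """Keep non-NaN rows plus the first row of each run of consecutive NaNs."""
    m = frame[column].isna()
    # True where the NaN flag changes compared with the previous row
    # (the first row always counts as a change).
    starts_run = m.ne(m.shift())
    return frame[starts_run | ~m]


# Illustrative data: two NaN runs separated by a real value.
demo = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "name": [np.nan, np.nan, "A", np.nan, np.nan],
})

result = collapse_nan_runs(demo, "name")
print(result["id"].tolist())
# ['01', '03', '04']
```

Each run keeps its first NaN row, so separate runs are not merged into one.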
1 Comment

Using mask with isnumeric doesn't solve this issue when the value is something like: 123 456?
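
A short sketch of the difference the comment points at, on a made-up Series: `str.isnumeric` returns False for "123 456" because the space is not a numeric character, while a pattern that also allows whitespace catches it. Series.str.fullmatch is used here as one possible alternative to the negated class in the answer.

```python
import pandas as pd

s = pd.Series(["234", "123 456", "24 H AUTOSERVICE"])

# isnumeric() requires every character to be numeric, so the space breaks it.
print(s.str.isnumeric().tolist())
# [True, False, False]

# Match values made up entirely of digits and whitespace.
digits_or_spaces = s.str.fullmatch(r"[0-9\s]+")
print(digits_or_spaces.tolist())
# [True, True, False]

# mask() replaces values with NaN where the condition is True.
masked = s.mask(digits_or_spaces)
print(masked.isna().tolist())
# [True, True, False]
```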
