I have the following dataframe:

df=pd.DataFrame({'ssn':[12345,54321,111,47895,222311],'Name':['john','mike','adam','doug','liz']})

The 'ssn' column is supposed to contain only 5-digit values. I want to replace every value that has fewer or more than 5 digits with a blank.

The desired output is as below:

   Name   ssn
0  john   12345
1  mike   54321
2  adam   
3  doug   47895
4  liz    

I referred to the SO post "replace string if length is less than x". However, using the same solution with the following commands gives me an error:

mask = df['ssn'].str.len() == 5
df['ssn'] = df['ssn'].mask(mask, df['ssn'].str.replace(df['ssn'], ''))
Traceback (most recent call last): 
TypeError: 'Series' objects are mutable, thus they cannot be hashed

I would appreciate any suggestions.

  • df.assign(ssn=df.ssn.where(df.ssn.astype(str).str.len().eq(5), '')). You aren't dealing with strings, so I'm surprised you're getting that error; it should fail as soon as you try to use the string accessor. Commented May 14, 2019 at 21:01
  • It works great! How do I accept your answer? Commented May 14, 2019 at 21:02

2 Answers

You can also do this with Series.apply: df['ssn'] = df['ssn'].apply(lambda a: a if len(str(a)) == 5 else '').
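
For reference, here is the apply approach run against the sample frame from the question, as a minimal sketch. The helper name keep_five_digits is made up for illustration; it does the same thing as the lambda above.

```python
import pandas as pd

df = pd.DataFrame({
    'ssn': [12345, 54321, 111, 47895, 222311],
    'Name': ['john', 'mike', 'adam', 'doug', 'liz'],
})

def keep_five_digits(value):
    """Return value unchanged if its text form has exactly 5 characters, else ''."""
    return value if len(str(value)) == 5 else ''

# apply() calls the helper once per element; the result replaces the column.
df['ssn'] = df['ssn'].apply(keep_five_digits)
print(df)
```

Note that the check counts characters of str(value), so a negative number such as -1234 would also pass; that is fine for the sample data but worth keeping in mind. The resulting column holds a mix of integers and empty strings.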

Your ssn column contains numbers, not strings, which is why it does not work. Try the following:

mask = df['ssn'].astype(str).str.len() != 5
df.loc[mask, 'ssn'] = ''

In [1] : print(df)
Out[1] :    Name    ssn
0  john  12345
1  mike  54321
2  adam       
3  doug  47895
4   liz      
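
One caveat: the assignment above writes an empty string into an integer column, and newer pandas releases may warn about that kind of mixed-type write through .loc. A possible variant, assuming it is acceptable to store ssn as text, is to convert the column first and then blank out the values that are not 5 characters long. This is only a sketch on the question's sample data; the name ssn_text is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ssn': [12345, 54321, 111, 47895, 222311],
    'Name': ['john', 'mike', 'adam', 'doug', 'liz'],
})

# Assumption: keeping ssn as text is fine for whatever happens downstream.
ssn_text = df['ssn'].astype(str)

# where() keeps entries where the condition is True and puts '' everywhere else.
df['ssn'] = ssn_text.where(ssn_text.str.len() == 5, '')
print(df)
```

Because every value is already a string before the replacement, the column ends up with a single consistent type instead of a mix of integers and blanks.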
