2

I have a pandas dataframe as below with 3 columns. I want to compare each column to see if the value matches a particular string, and if yes, replace the value with NaN.

For example, if there are 5 values in column 1 of the data frame:

abcd
abcd
defg
abcd
defg

and the comparison string is defg, the end result for column 1 in the data frame should be:

abcd
abcd
NaN
abcd
NaN
  • @pseudocode425, if any of the solutions worked for you, accept the one that fits best as the answer. Commented Dec 17, 2018 at 17:39

4 Answers

4

Use pandas' built-in replace method: regex=True treats the search value as a regular expression, inplace=True makes the change permanent in the DataFrame, and numpy's np.nan is the replacement value for the matches.

import pandas as pd
import numpy as np

Example DataFrame:

df
   col1
0  abcd
1  abcd
2  defg
3  abcd
4  defg

Result:

df['col1'].replace(['defg'], np.nan, regex=True, inplace=True)
df
   col1
0  abcd
1  abcd
2   NaN
3  abcd
4   NaN
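
On recent pandas versions, calling an inplace method on a selected column (df['col1'].replace(..., inplace=True)) can emit a chained-assignment warning and, under copy-on-write, may leave the parent DataFrame unchanged. The sketch below shows the same replacement with the result assigned back instead. It assumes the col1 sample data from above and drops regex=True, since the question only asks for an exact match:

```python
import numpy as np
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({"col1": ["abcd", "abcd", "defg", "abcd", "defg"]})

# Assign the result back rather than relying on inplace=True
df["col1"] = df["col1"].replace("defg", np.nan)

# Rows 2 and 4 should now be missing
print(df["col1"].isna().tolist())
```

Assigning back behaves the same whether or not copy-on-write is enabled, which is why it is used here.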

2

There are a number of solutions. If you want to practice using lambda functions, you could do:

df['Col1'] = df.Col1.apply(lambda x: np.nan if x == 'defg' else x)

Result:

0  abcd
1  abcd
2   NaN
3  abcd
4   NaN
Seconds:  0.0020899999999999253

Based on some quick testing, processing time is probably a little slower than for the solutions above.
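
To check the timing claim on your own machine, something like the following could be used. It is only a rough sketch: the enlarged Series, the helper names with_apply and with_replace, and the run count are all assumptions made for illustration, and the numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger column so the timing difference is measurable
s = pd.Series(["abcd", "abcd", "defg", "abcd", "defg"] * 20_000, name="Col1")


def with_apply():
    # Element-wise Python lambda, as in the answer above
    return s.apply(lambda x: np.nan if x == "defg" else x)


def with_replace():
    # Built-in replace on the whole Series
    return s.replace("defg", np.nan)


# Time 10 runs of each approach and print the totals
for fn in (with_apply, with_replace):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {seconds:.4f} s for 10 runs")
```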

1

You can use numpy where to set values based on boolean conditions:

import numpy as np
df["col_name"] = np.where(df["col_name"]=="defg", np.nan, df["col_name"])

Obviously replace col_name with whatever your actual column name is.

An alternative is to use pandas .loc to change the values in the DataFrame in place:

df.loc[df["col_name"]=="defg", "col_name"] = np.nan
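
For reference, here are both variants side by side as a self-contained sketch. The sample data is taken from the question, and the placeholder column name col_name is kept; the names matches and via_where are introduced only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_name": ["abcd", "abcd", "defg", "abcd", "defg"]})

# Boolean mask marking the rows that match the comparison string
matches = df["col_name"] == "defg"

# Option 1: np.where builds a new array, keeping the original value
# wherever the mask is False
via_where = pd.Series(np.where(matches, np.nan, df["col_name"]), index=df.index)

# Option 2: .loc assigns NaN only to the matching rows, modifying df itself
df.loc[matches, "col_name"] = np.nan

print(via_where.isna().tolist())
print(df["col_name"].isna().tolist())
```

The np.where form returns a plain NumPy array, so the index has to be reattached if a Series is wanted. The .loc form keeps everything inside pandas.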

1 Comment

I get a TypeError: invalid type comparison with this. Is it comparing a Series to a str?
1

You can use mask, this will replace 'defg' in the entire dataframe with NaN:

df.mask(df == 'defg')

Output:

      0
0  abcd
1  abcd
2   NaN
3  abcd
4   NaN

You can do this for a column also:

df['col1'].mask(df['col1'] == 'defg')

Or use replace, as @pygo suggests in his solution:

df['col1'].replace('defg', np.nan)
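
Since the question mentions three columns, here is a minimal sketch of mask applied to the whole frame and to a subset of columns. The column names col2 and col3 and all of the values are made up for illustration.

```python
import pandas as pd

# Hypothetical three-column frame; column names and values are assumptions
df = pd.DataFrame({
    "col1": ["abcd", "defg", "abcd"],
    "col2": ["defg", "abcd", "defg"],
    "col3": ["defg", "defg", "abcd"],
})

# Whole frame: every cell equal to 'defg' becomes NaN
all_masked = df.mask(df == "defg")

# Only selected columns: col3 is left untouched
cols = ["col1", "col2"]
some_masked = df.copy()
some_masked[cols] = df[cols].mask(df[cols] == "defg")

print(all_masked)
print(some_masked)
```

mask returns a new object rather than modifying df, so the result has to be assigned somewhere to be kept.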

3 Comments

@Scott Boston - how does this work if I only want to do the replace in a particular column and not the entire data set?
@pseudocode425, try the alternative answer I provided; it illustrates this using a column col1 as an example.
However, with Scott's answer, simply try df.col1.mask(df.col1 == 'defg'); that will give you what you are asking for.
