7

I have a DataFrame:

import pandas as pd
import numpy as np
x = {'Value': ['Test', 'XXX123', 'XXX456', 'Test']}
df = pd.DataFrame(x)

I want to replace the values starting with XXX with np.nan using lambda.

I have tried many things with replace, apply and map, and the best I have managed is a boolean result: False, True, True, False.

The line below works, but I would like to know whether there is a better way. I suspect that combining apply or replace with a lambda would be cleaner.

df.Value.loc[df.Value.str.startswith('XXX', na=False)] = np.nan
3
  • Does your DataFrame have just one column? Also, apply isn't the preferred way, by the way. Commented Aug 22, 2019 at 17:32
  • The DataFrame has many columns. Commented Aug 22, 2019 at 17:49
  • And does each column have values starting with XXX that you want to replace with np.nan, or is it just one column? Commented Aug 22, 2019 at 17:51

3 Answers

16

Use the apply method:

In [80]: x = {'Value': ['Test', 'XXX123', 'XXX456', 'Test']}
In [81]: df = pd.DataFrame(x)
In [82]: df.Value.apply(lambda x: np.nan if x.startswith('XXX') else x)
Out[82]:
0    Test
1     NaN
2     NaN
3    Test
Name: Value, dtype: object
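
If the column can also hold missing or non-string values, the lambda above would fail on them, because a float NaN has no startswith method. A minimal sketch of a more defensive variant, assuming a hypothetical sample with one missing entry added (the isinstance guard is my addition, not part of the answer):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's values plus one missing entry.
df = pd.DataFrame({'Value': ['Test', 'XXX123', np.nan, 'XXX456', 'Test']})

# Call startswith only on real strings; everything else passes through unchanged.
result = df['Value'].apply(
    lambda v: np.nan if isinstance(v, str) and v.startswith('XXX') else v
)
print(result)
```

Like the original call, this returns a new Series; assign it back with df['Value'] = result to keep the change.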

Performance comparison of apply, where, and loc: [image not available]


1 Comment

Excellent. This answer helps me understand lambda better for this sort of thing.
5

np.where() performs way better here:

df.Value = np.where(df.Value.str.startswith('XXX'), np.nan, df.Value)

Performance vs apply on larger DataFrames: [image not available]
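
A pandas-native alternative to np.where is Series.mask, which replaces values wherever a boolean condition is True (with NaN by default). A minimal sketch on the question's data; the na=False argument is borrowed from the question's own line, and no timing claim is made for this variant:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Value': ['Test', 'XXX123', 'XXX456', 'Test']})

# Boolean condition; na=False treats missing entries as non-matches.
starts_with_xxx = df['Value'].str.startswith('XXX', na=False)

# mask() swaps in NaN where the condition is True and keeps the rest.
df['Value'] = df['Value'].mask(starts_with_xxx)
print(df)
```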

2 Comments

I like the np.where option you presented. How does the apply lambda test against it?
@McRae check this
1

Use of .loc is not necessary. Write just:

df.Value[df.Value.str.startswith('XXX')] = np.nan

A lambda function would be necessary only if you wanted to compute the replacement value from some expression. In this case, plain np.nan is enough.
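
One caveat worth noting: indexing df.Value first and then assigning into the result is chained assignment, which can trigger a SettingWithCopyWarning in some pandas versions and does not modify df when Copy-on-Write is enabled. A minimal sketch of the single-step form, assuming a hypothetical second column named Other to show that only the selected column is touched:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Value': ['Test', 'XXX123', 'XXX456', 'Test'],
    'Other': [1, 2, 3, 4],  # hypothetical extra column, left untouched
})

# Build the row mask once, then select rows and column in a single .loc call.
mask = df['Value'].str.startswith('XXX', na=False)
df.loc[mask, 'Value'] = np.nan
print(df)
```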

2 Comments

Thanks very much for your answer. It looks like I kind of fell on the right path anyway??
I was actually thinking of applying a lambda function that returns some value to be substituted. In this case the value to substitute is just np.nan, so there is no need to apply any lambda function.
