There are so many questions about replacing some rows or columns or particular values, but I haven't found what I am looking for. Imagine a dataframe like this,

          a         b         c         d
a  0.354511  0.416929  0.704512  0.598345
b  0.948605  0.473364  0.154856  0.637639
c  0.250829  0.130928  0.682998  0.056049
d  0.504516  0.880731  0.216192  0.314724

And now I would like to replace all values based on a condition with something else (no matter in which column or row they are). Let's say I want to replace all values < 0.5 with np.nan. I have tried several things and nothing worked (i.e. nothing happened, the dataframe remained unchanged).

Example code here:

frame = pd.DataFrame(np.random.rand(4,4),index=['a','b','c','d'], columns=['a','b','c','d'])
print frame
for row,col in enumerate(frame):
    frame.replace(frame.ix[row,col]<0.5,np.nan,inplace=True)
print frame

or

for row,col in enumerate(frame):
    if frame.ix[row,col]<=0.5:
        M.ix[row,col]=np.nan
print M

but in the end,

          a         b         c         d
a  0.600701  0.823570  0.159012  0.615898
b  0.234855  0.086080  0.950064  0.982248
c  0.440625  0.960078  0.191975  0.598865
d  0.127866  0.537867  0.434326  0.507635
          a         b         c         d
a  0.600701  0.823570  0.159012  0.615898
b  0.234855  0.086080  0.950064  0.982248
c  0.440625  0.960078  0.191975  0.598865
d  0.127866  0.537867  0.434326  0.507635

- they are identical, no NaNs instead of small values. Where is the problem?

2 Answers

The pandas methods that do this are where and mask.

where keeps the dataframe values where the condition is True and replaces the rest.
The optional second argument is the replacement value (NaN if omitted).

frame.where(frame < .5, -9)

          a         b         c         d
a  0.354511  0.416929 -9.000000 -9.000000
b -9.000000  0.473364  0.154856 -9.000000
c  0.250829  0.130928 -9.000000  0.056049
d -9.000000 -9.000000  0.216192  0.314724

or the sister method:

mask keeps the dataframe values where the condition is False and replaces the rest.
The optional second argument is the replacement value (NaN if omitted).

frame.mask(frame < .5, -9)

          a         b         c         d
a -9.000000 -9.000000  0.704512  0.598345
b  0.948605 -9.000000 -9.000000  0.637639
c -9.000000 -9.000000  0.682998 -9.000000
d  0.504516  0.880731 -9.000000 -9.000000
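
For the case in the question (everything below 0.5 becomes NaN), a minimal sketch along these lines should work. The small frame and its values are made up for illustration, and the names masked and kept are only placeholders. It relies on both methods filling with NaN when no second argument is given.

```python
import numpy as np
import pandas as pd

# Illustrative values, not the frame from the question.
frame = pd.DataFrame(
    [[0.1, 0.6], [0.7, 0.2]],
    index=["a", "b"],
    columns=["a", "b"],
)

# mask replaces entries where the condition is True; the default fill is NaN.
masked = frame.mask(frame < 0.5)

# where is the complement: it keeps entries where the condition is True,
# so the condition is inverted here to get the same result.
kept = frame.where(frame >= 0.5)

print(masked)
```

Neither call modifies frame; both return a new DataFrame, so the result has to be assigned to a name.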

numpy.where
We can use numpy to very similar effect

pd.DataFrame(
    np.where(frame < .5, frame, -9),
    frame.index, frame.columns)

          a         b         c         d
a  0.354511  0.416929 -9.000000 -9.000000
b -9.000000  0.473364  0.154856 -9.000000
c  0.250829  0.130928 -9.000000  0.056049
d -9.000000 -9.000000  0.216192  0.314724
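
As a hedged variation on the numpy approach, the arguments can be swapped so that values below 0.5 become NaN, which is what the question asks for. The frame below uses made-up values, and values and result are illustrative names.

```python
import numpy as np
import pandas as pd

# Illustrative values, not the frame from the question.
frame = pd.DataFrame(
    [[0.1, 0.6], [0.7, 0.2]],
    index=["a", "b"],
    columns=["a", "b"],
)

arr = frame.to_numpy()

# np.where(condition, x, y) takes x where the condition is True and y elsewhere.
values = np.where(arr < 0.5, np.nan, arr)

# np.where returns a plain array, so the labels are reattached by hand.
result = pd.DataFrame(values, index=frame.index, columns=frame.columns)

print(result)
```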



Ah, I see. I figured it out. Maybe not the most elegant solution, but it works. Element-wise operations are probably easier with numpy arrays, so I convert the frame to a numpy array, change the stuff and then turn it back into pandas dataframe. THAT simple:

frame = np.asarray(frame)
frame[frame<0.5] = np.nan
frame = pd.DataFrame(frame,index=['a','b','c','d'], columns=['a','b','c','d'])

This will return the desired output

          a         b         c         d
a  0.791982  0.654760  0.854503  0.552131
b  0.545564       NaN  0.966512       NaN
c  0.595927  0.540071  0.938315       NaN
d       NaN  0.844594       NaN       NaN

Sorry for spamming too early. But I will keep it here in case somebody has the same problem.
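
The same boolean-mask assignment should also work directly on the DataFrame, without the round trip through a numpy array, which keeps the index and column labels intact. A minimal sketch with made-up values; the copy is optional and only there so the original frame stays available for comparison.

```python
import numpy as np
import pandas as pd

# Illustrative values, not the frame from the question.
frame = pd.DataFrame(
    [[0.1, 0.6], [0.7, 0.2]],
    index=["a", "b"],
    columns=["a", "b"],
)

# Work on a copy so the original frame is left untouched.
result = frame.copy()

# A boolean DataFrame of the same shape selects the cells to overwrite.
result[result < 0.5] = np.nan

print(result)
```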
