How to replace a value anywhere in pandas dataframe based on condition?

Question

There are so many questions about replacing some rows or columns or particular values, but I haven't found what I am looking for. Imagine a dataframe like this,

          a         b         c         d
a  0.354511  0.416929  0.704512  0.598345
b  0.948605  0.473364  0.154856  0.637639
c  0.250829  0.130928  0.682998  0.056049
d  0.504516  0.880731  0.216192  0.314724

And now I would like to replace all values based on a condition with something else (no matter in which column or row they are). Let's say I want to replace all values < 0.5 with np.nan. I have tried several things and nothing worked (i.e. nothing happened, the dataframe remained unchanged).

Example code here:

frame = pd.DataFrame(np.random.rand(4,4),index=['a','b','c','d'], columns=['a','b','c','d'])
print frame
for row,col in enumerate(frame):
    frame.replace(frame.ix[row,col]<0.5,np.nan,inplace=True)
print frame

or

for row,col in enumerate(frame):
    if frame.ix[row,col]<=0.5:
        M.ix[row,col]=np.nan
print M

but in the end,

          a         b         c         d
a  0.600701  0.823570  0.159012  0.615898
b  0.234855  0.086080  0.950064  0.982248
c  0.440625  0.960078  0.191975  0.598865
d  0.127866  0.537867  0.434326  0.507635
          a         b         c         d
a  0.600701  0.823570  0.159012  0.615898
b  0.234855  0.086080  0.950064  0.982248
c  0.440625  0.960078  0.191975  0.598865
d  0.127866  0.537867  0.434326  0.507635

- they are identical, no NaNs instead of small values. Where is the problem?

piRSquared · Accepted Answer · 2017-03-21 23:21:46Z

The pandas methods that do this are where and mask

where keeps the dataframe values where the condition is True and replaces the rest
The optional second argument is the replacement value (NaN if omitted)

frame.where(frame < .5, -9)

          a         b         c         d
a  0.354511  0.416929 -9.000000 -9.000000
b -9.000000  0.473364  0.154856 -9.000000
c  0.250829  0.130928 -9.000000  0.056049
d -9.000000 -9.000000  0.216192  0.314724

or the sister method

mask keeps the dataframe values where the condition is False and replaces the rest
The optional second argument is the replacement value (NaN if omitted)

frame.mask(frame < .5, -9)

          a         b         c         d
a -9.000000 -9.000000  0.704512  0.598345
b  0.948605 -9.000000 -9.000000  0.637639
c -9.000000 -9.000000  0.682998 -9.000000
d  0.504516  0.880731 -9.000000 -9.000000
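For the exact case in the question (values below 0.5 become NaN), the second argument can simply be left out. The following is a minimal sketch, assuming the example frame from the question is rebuilt by hand; the variable names masked and kept are only illustrative.

```python
import numpy as np
import pandas as pd

# Same values as the example frame shown in the question.
frame = pd.DataFrame(
    [[0.354511, 0.416929, 0.704512, 0.598345],
     [0.948605, 0.473364, 0.154856, 0.637639],
     [0.250829, 0.130928, 0.682998, 0.056049],
     [0.504516, 0.880731, 0.216192, 0.314724]],
    index=list("abcd"), columns=list("abcd"))

# mask replaces values where the condition is True; with no second
# argument the replacement is NaN.
masked = frame.mask(frame < 0.5)

# where is the mirror image: it keeps values where the condition is True,
# so the condition has to be inverted to get the same result.
kept = frame.where(frame >= 0.5)

print(masked)
```

Both calls return a new dataframe and leave frame untouched, so the result has to be assigned to a name to be kept.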

numpy.where
We can use numpy to very similar effect

pd.DataFrame(
    np.where(frame < .5, frame, -9),
    frame.index, frame.columns)

          a         b         c         d
a  0.354511  0.416929 -9.000000 -9.000000
b -9.000000  0.473364  0.154856 -9.000000
c  0.250829  0.130928 -9.000000  0.056049
d -9.000000 -9.000000  0.216192  0.314724


durbachit · Accepted Answer · 2017-03-21 23:12:24Z

Ah, I see. I figured it out. Maybe not the most elegant solution, but it works. Element-wise operations are probably easier with numpy arrays, so I convert the frame to a numpy array, change the stuff and then turn it back into pandas dataframe. THAT simple:

frame = np.asarray(frame)
frame[frame<0.5] = np.nan
frame = pd.DataFrame(frame,index=['a','b','c','d'], columns=['a','b','c','d'])

This will return the desired output

          a         b         c         d
a  0.791982  0.654760  0.854503  0.552131
b  0.545564       NaN  0.966512       NaN
c  0.595927  0.540071  0.938315       NaN
d       NaN  0.844594       NaN       NaN
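The round trip through numpy may not be needed: a boolean dataframe can be used as a key for assignment directly, which keeps the index and columns in place. This is a minimal sketch under that assumption; the seeded generator and the name original are only there to make the example reproducible and checkable.

```python
import numpy as np
import pandas as pd

# Reproducible random frame with the same labels as in the question.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.random((4, 4)),
                     index=list("abcd"), columns=list("abcd"))
original = frame.copy()  # kept only to compare against afterwards

# Assign NaN to every cell where the condition holds, in place.
frame[frame < 0.5] = np.nan

print(frame)
```

Unlike where and mask, this changes frame itself, so take a copy first if the original values are still needed.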

Sorry for spamming too early. But I will keep it here in case somebody has the same problem.
