Using numpy where with applymap in pandas

Question

I am trying to use numpy where in conjunction with applymap in pandas.

Sample DF:

import numpy as np
import pandas as pd

f = [[1,5],[20,40],[100,21],[15,19],[-46,101]]
test = pd.DataFrame(f,columns=["A","B"])
test

Output:

    A   B
0   1   5
1   20  40
2   100 21
3   15  19
4   -46 101

The condition: if a value is greater than 50 or less than 25, it should be changed to 0; otherwise it should remain as it is.

Code:

test = test.applymap(lambda x:np.where((test[x]>50)| (test[x]<25), 0,test[x]) )
test

Error:

    KeyError                                  Traceback (most recent call last)
~\AppData\Local\Continuum\miniconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: (1, 'occurred at index A')

Any suggestions would be helpful.

ansev · Accepted Answer · 2020-02-10 11:06:48Z


Use DataFrame.mask:

test.mask(test.lt(25)|test.gt(50),0)

or, equivalently, DataFrame.where with the inverted condition:

test.where(test.ge(25) & test.le(50),0)

Output:

   A   B
0  0   0
1  0  40
2  0   0
3  0   0
4  0   0
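mask and where are mirror images: mask replaces values where the condition is True, while where keeps them. A minimal check on the same sample data that both spellings give the same frame (the variable names outside, masked and kept are illustrative):

```python
import pandas as pd

test = pd.DataFrame([[1, 5], [20, 40], [100, 21], [15, 19], [-46, 101]],
                    columns=["A", "B"])

outside = test.lt(25) | test.gt(50)   # True where a value should become 0
masked = test.mask(outside, 0)        # replace where the condition is True
kept = test.where(~outside, 0)        # keep where the condition is True

# ~((x < 25) | (x > 50)) is the same as (x >= 25) & (x <= 50), so all three agree.
assert masked.equals(kept)
assert masked.equals(test.where(test.ge(25) & test.le(50), 0))
print(masked)
```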

Using DataFrame.applymap we could do:

test.applymap(lambda x: 0 if (x>50) or (x<25) else x)

but this can be slow for large DataFrames, because the lambda is called once per cell in Python instead of operating on whole columns at once.
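The function given to applymap receives each cell value, not a column label. That is why test[x] in the question failed: with the first value x = 1, it looked up a column named 1 and raised KeyError: (1, ...). A sketch of the element-wise version follows; it assumes you may be on pandas 2.1 or later, where DataFrame.applymap was renamed to DataFrame.map and the old name is deprecated, so it picks whichever method exists (zero_if_outside is a hypothetical helper name):

```python
import pandas as pd

test = pd.DataFrame([[1, 5], [20, 40], [100, 21], [15, 19], [-46, 101]],
                    columns=["A", "B"])

def zero_if_outside(x):
    # x is a single cell value (1, then 20, ...), not a column label.
    return 0 if x > 50 or x < 25 else x

# pandas >= 2.1 provides DataFrame.map; older versions only have applymap.
elementwise = test.map if hasattr(pd.DataFrame, "map") else test.applymap
result = elementwise(zero_if_outside)
print(result)
```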

Solution with np.where (it returns a plain NumPy array, so the index and columns have to be restored):

import numpy as np
pd.DataFrame(np.where((test<25)|(test>50),0,test),index = test.index,columns = test.columns)
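To make the label-restoring step visible, the sketch below shows that np.where hands back a bare ndarray. It also shows one alternative that is an assumption rather than part of the answer: writing the array back into a copy of the frame with in_place[:] = raw, which keeps the original labels without passing index and columns explicitly.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame([[1, 5], [20, 40], [100, 21], [15, 19], [-46, 101]],
                    columns=["A", "B"])

raw = np.where((test < 25) | (test > 50), 0, test)
print(type(raw))  # <class 'numpy.ndarray'>: row and column labels are gone

# Rebuild a DataFrame with the original labels.
result = pd.DataFrame(raw, index=test.index, columns=test.columns)

# Alternative: overwrite the values of a copy, keeping its labels in place.
in_place = test.copy()
in_place[:] = raw
assert result.equals(in_place)
```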

EDIT: per-column limits of mean ± 5, as asked in the comments:

mean_test = test.mean()
limit = 5
df_filtered = test.mask(test.gt(mean_test.add(limit))|
                        test.lt(mean_test.sub(limit)),0)
print(df_filtered)
    A   B
0   0   0
1  20  40
2   0   0
3  15   0
4   0   0
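Both the fixed limits and the mean ± 5 limits are the same mask call; only the bounds change. A minimal sketch, using a hypothetical helper zero_outside, that accepts either scalars or per-column Series (df.lt(series) aligns the Series index with the DataFrame's columns):

```python
import pandas as pd

def zero_outside(df, lower, upper):
    """Hypothetical helper: set values below `lower` or above `upper` to 0.

    `lower` and `upper` may be scalars or Series indexed by column name.
    """
    return df.mask(df.lt(lower) | df.gt(upper), 0)

test = pd.DataFrame([[1, 5], [20, 40], [100, 21], [15, 19], [-46, 101]],
                    columns=["A", "B"])

fixed = zero_outside(test, 25, 50)                        # same limits for every column
mean, limit = test.mean(), 5
per_col = zero_outside(test, mean - limit, mean + limit)  # mean ± 5 per column
print(per_col)
```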

edited Feb 10, 2020 at 11:06

answered Feb 10, 2020 at 10:43

ansev


Comments

data_person Over a year ago

This is great! I have one more question: what if I want to calculate the upper limit (50 in this case) and the lower limit (25 in this case) for each column, say from its mean? Which would be the better option?

ansev Over a year ago

What would be the expected output in that case?

ansev Over a year ago

Do you want to calculate the average of those over 50 and those under 25 for each column?

data_person Over a year ago

The condition is still the same. But instead of hardcoding the upper limit as 50 and lower limit as 25 for each column, I want to set the upper limit as mean(column) + 5 and lower limit as mean(column) - 5

ansev Over a year ago

You may find it useful to know that you can also calculate the variance and the standard deviation: pandas.pydata.org/pandas-docs/stable/reference/api/… , pandas.pydata.org/pandas-docs/stable/reference/api/… . For example: mean_test.add(test.std().mul(3**(1/2)))
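A sketch of the standard-deviation variant of the band: each column keeps values within mean ± k·std and zeroes the rest. The multiplier k is a hypothetical free parameter (1.0 below is arbitrary), and DataFrame.std uses the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

test = pd.DataFrame([[1, 5], [20, 40], [100, 21], [15, 19], [-46, 101]],
                    columns=["A", "B"])

k = 1.0                      # hypothetical multiplier; choose it for your data
center = test.mean()         # per-column mean
spread = test.std().mul(k)   # per-column sample std (ddof=1) times k
band = test.mask(test.lt(center - spread) | test.gt(center + spread), 0)
print(band)
```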


data_person · Accepted Answer · 2020-04-17 02:52:20Z

import numpy as np
import pandas as pd

sample_df = pd.DataFrame(np.random.randint(1,20,size=(10, 2)), columns=list('BC'))
sample_df["date"]= ["2020-02-01","2020-02-01","2020-02-01","2020-02-01","2020-02-01",
                    "2020-02-02","2020-02-02","2020-02-02","2020-02-02","2020-02-02"]
sample_df["date"] = pd.to_datetime(sample_df["date"])
sample_df.set_index(sample_df["date"],inplace=True)
sample_df["A"]=[10,10,10,10,10,12,1,3,4,2]
del sample_df["date"]
sample_df


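def func(df, n_bins):
    # Bin column A of the group into n_bins quantile-based bins.
    try:
        return pd.qcut(df["A"].values, n_bins, labels=range(0, n_bins))
    except ValueError:
        # qcut raises ValueError when the quantile edges are not unique
        # (e.g. A is constant for a date); fall back to the row mean.
        return pd.qcut(df.mean(axis=1).values, n_bins, labels=range(0, n_bins))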
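The fallback branch exists because on 2020-02-01 column A is always 10, so every quantile is 10 and qcut cannot build distinct bin edges. The sketch below reproduces that error and then bins the row means of C and A for that date, computed from the output below (C = 16, 19, 16, 11, 10 and A = 10). It uses labels=False, which returns integer bin codes instead of a Categorical.

```python
import pandas as pd

# Constant data: all quantiles coincide, so the bin edges are not unique.
try:
    pd.qcut([10, 10, 10, 10, 10], 3, labels=range(0, 3))
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Row means of C and A on 2020-02-01 are distinct enough to be binned.
codes = pd.qcut([13.0, 14.5, 13.0, 10.5, 10.0], 3, labels=False)
print(codes)
```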

sample_df["A"]=sample_df.groupby([sample_df.index.get_level_values(0)])[["C","A"]].apply(lambda df: func(df,3))
sample_df


            B   C   A
date
2020-02-01  1   16  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  5   19  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  2   16  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  12  11  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  15  10  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  19  17  [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  17  7   [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  14  1   [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  19  13  [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  15  13  [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 .
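In the output above, every cell of A holds the whole per-date array, because apply returns one Categorical per date and the assignment aligns it on the repeated date index. If the goal is one bin label per row, a minimal sketch under these assumptions: the data are fixed values taken from the output (instead of np.random), B is left out because the function only uses C and A, bin_group is a hypothetical rewrite of func that returns integer codes via labels=False, and the labels are written back by row position using GroupBy.indices.

```python
import numpy as np
import pandas as pd

def bin_group(g, n_bins):
    """Hypothetical helper: quantile-bin column A, falling back to the row
    mean when A's quantile edges are not unique (qcut raises ValueError)."""
    try:
        return pd.qcut(g["A"].to_numpy(), n_bins, labels=False)
    except ValueError:
        return pd.qcut(g.mean(axis=1).to_numpy(), n_bins, labels=False)

# Fixed values from the output above (assumption: replaces the random data).
dates = pd.to_datetime(["2020-02-01"] * 5 + ["2020-02-02"] * 5)
sample_df = pd.DataFrame(
    {"C": [16, 19, 16, 11, 10, 17, 7, 1, 13, 13],
     "A": [10, 10, 10, 10, 10, 12, 1, 3, 4, 2]},
    index=pd.Index(dates, name="date"),
)

labels = np.empty(len(sample_df), dtype=int)
# GroupBy.indices maps each date to the integer positions of its rows,
# so each group's labels land on exactly those rows.
for _, positions in sample_df.groupby(level=0).indices.items():
    group = sample_df.iloc[positions][["C", "A"]]
    labels[positions] = bin_group(group, 3)

sample_df["A_bin"] = labels
print(sample_df)
```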
