Boolean Dataframe filter for another Dataframe

Question

The following dataframe df1 contains numerical values

   IDs          Value1      Value2        Value     Value4
   AB              1          1             1       5
   BC              2          2             2       3
   BG              1          1             4       1
   RF              2          2             2       7

and this dataframe df2 contains Boolean values:

   Index          0                1             2         3
   1              True           False          True       True
   2              False          False          True       False
   3              False          False          True       False
   4              False          False          False      False

with the same number of columns and rows.

What I need is to subset df1 in the following manner: keep only the columns of df1 whose corresponding column in df2 has at least one True value.

Meaning the following:

   IDs          Value1         Value3     Value4
   AB              1              1       5
   BC              2              2       3
   BG              1              4       1
   RF              2              2       7

I have tried the following code:

df2_true = np.any(df2,axis=1)

However, the line above returns a list which can not be used here:

result = df1[:,df2_true]

Any help would be welcome

BENY · Accepted Answer · 2018-09-20 13:59:03Z

3

I think this will work:

df1.loc[:,df2.any(0).values.tolist()]
Out[741]: 
     Value1  Value  Value4
IDs                       
AB        1      1       5
BC        2      2       3
BG        1      4       1
RF        2      2       7
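
For readers who want to reproduce this, here is a minimal self-contained sketch. It assumes IDs is the index of df1 (as the output above suggests) and copies the column names exactly as they appear in the question's header. `to_numpy()` is used in place of `.values.tolist()`; both strip df2's column labels (0 to 3), which do not match df1's labels, so the mask is applied by position.

```python
import pandas as pd

# Data copied from the question; IDs is assumed to be the index of df1.
df1 = pd.DataFrame(
    {
        "Value1": [1, 2, 1, 2],
        "Value2": [1, 2, 1, 2],
        "Value": [1, 2, 4, 2],
        "Value4": [5, 3, 1, 7],
    },
    index=pd.Index(["AB", "BC", "BG", "RF"], name="IDs"),
)
df2 = pd.DataFrame(
    [
        [True, False, True, True],
        [False, False, True, False],
        [False, False, True, False],
        [False, False, False, False],
    ],
    index=[1, 2, 3, 4],
)

# One boolean per column of df2: True if any row in that column is True.
col_mask = df2.any(axis=0).to_numpy()

# Positional boolean selection of df1's columns.
result = df1.loc[:, col_mask]
print(result)
```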

jezrael · Accepted Answer · 2018-09-20 13:58:14Z

3

Use loc with np.any computed per column (axis=0, which reduces over the rows):

result = df1.loc[:, np.any(df2.values, axis=0)]
print(result)
     Value1  Value  Value4
IDs                       
AB        1      1       5
BC        2      2       3
BG        1      4       1
RF        2      2       7

3 Comments

user37143 Over a year ago

it should be result = df1.loc[:, np.any(df2.values,axis=1)] right?

jezrael Over a year ago

@user37143 - No, it needs axis=0.

jezrael Over a year ago

@user37143 - axis=1 only appears to work here because df2 has the same number of rows and columns. Add one row to df2 and axis=1 no longer works.
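
The axis point in these comments can be checked with a small sketch. The 5x4 frame below is a hypothetical non-square mask (the question's df2 with one extra all-False row), not data from the question. With axis=0 the result has one flag per column, which is the length a column selection needs; with axis=1 it has one flag per row.

```python
import numpy as np
import pandas as pd

# Hypothetical non-square mask: 5 rows, 4 columns.
mask_df = pd.DataFrame(
    [
        [True, False, True, True],
        [False, False, True, False],
        [False, False, True, False],
        [False, False, False, False],
        [False, False, False, False],
    ]
)

# axis=0 reduces over the rows: one flag per column.
per_column = np.any(mask_df.values, axis=0)

# axis=1 reduces over the columns: one flag per row.
per_row = np.any(mask_df.values, axis=1)

print(per_column.shape)  # length equals the number of columns
print(per_row.shape)     # length equals the number of rows
```

Only `per_column` has the right length to select among four columns, so it is the mask to pass as the column indexer.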

mxkrn · Accepted Answer · 2018-09-20 14:03:28Z

1

You're already heading in the right direction. However, since you're interested in masking the columns, you need to apply the any() operation along the other axis (axis=0) and then apply the resulting boolean mask to the columns attribute of the original dataframe. That gives the labels of the columns to keep, which you can then use to index df1:

cols = df1.columns[df2.any(axis=0)]
masked_df = df1[cols]
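
As a rough illustration of this label-based route, the sketch below shows that indexing `df1.columns` with a boolean mask returns column labels rather than a dataframe, so a second step is needed to get the subset. It assumes IDs is the index of df1, and the mask is written out by hand as the per-column result for the question's df2; the names `kept_labels` and `subset` are illustrative only.

```python
import numpy as np
import pandas as pd

# df1 as given in the question, with IDs assumed to be the index.
df1 = pd.DataFrame(
    {
        "Value1": [1, 2, 1, 2],
        "Value2": [1, 2, 1, 2],
        "Value": [1, 2, 4, 2],
        "Value4": [5, 3, 1, 7],
    },
    index=pd.Index(["AB", "BC", "BG", "RF"], name="IDs"),
)

# Per-column "any True" flags for the question's df2, written out by hand.
mask = np.array([True, False, True, True])

# Step 1: masking the columns attribute yields an Index of labels.
kept_labels = df1.columns[mask]

# Step 2: use those labels to select the columns from df1.
subset = df1[kept_labels]
print(subset)
```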
