2

The following dataframe df1 contains numerical values

   IDs          Value1      Value2        Value     Value4
   AB              1          1             1       5
   BC              2          2             2       3
   BG              1          1             4       1
   RF              2          2             2       7

and this dataframe df2 contains Boolean values:

   Index          0                1             2         3
   1              True           False          True       True
   2              False          False          True       False
   3              False          False          True       False
   4              False          False          False      False

with the same number of columns and rows.

What I need is to subset df1 in the following manner: get only the columns that in df2 have at least on True value.

Meaning the following:

   IDs          Value1         Value3     Value4
   AB              1              1       5
   BC              2              2       3
   BG              1              4       1
   RF              2              2       7

I have tried the following code:

df2_true = np.any(df2,axis=1)

However, the line above returns a list which can not be used here:

result = df1[:,df2_true]

Any help would be welcome

3 Answers 3

3

I think it will work

df1.loc[:,df2.any(0).values.tolist()]
Out[741]: 
     Value1  Value  Value4
IDs                       
AB        1      1       5
BC        2      2       3
BG        1      4       1
RF        2      2       7
Sign up to request clarification or add additional context in comments.

Comments

3

Use loc with np.any per index (axis=0):

result = df1.loc[:, np.any(df2.values,axis=0)]
print (result)
     Value1  Value  Value4
IDs                       
AB        1      1       5
BC        2      2       3
BG        1      4       1
RF        2      2       7

3 Comments

it should be result = df1.loc[:, np.any(df2.values,axis=1)] right?
@user37143 - No, need axis=0
@user37143 - it working here because same number of columns and rows, try add one row and axis=1 not working.
1

Your already in the right direction, however since your interested in masking the columns you just need to apply the np.any() operation on the other axis and then apply your boolean mask to the columns attribute of the original dataframe:

masked_df = df1.columns[df2.any(axis=0)]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.