Suppose I have the following dataframe:

          a         b         c         d 
0  0.049531  0.408824  0.975756  0.658347
1  0.981644  0.520834  0.258911  0.639664
2  0.641042  0.534873  0.806442  0.066625
3  0.764057  0.063252  0.256748  0.045850

and I want only the subset of columns whose value in row 0 is greater than 0.5. I can do this:

df2 = df.T
myResult = df2[df2.iloc[:, 0] > 0.5].T

But this feels like a horrible hack. Is there a nicer way to do boolean indexing along columns? Somewhere I can specify an axis argument?
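For reference, a frame like the one above can be reproduced with random values (the seed and the use of `np.random` are assumptions; the original values are not recoverable), which makes the transpose round-trip easy to try out:

```python
import numpy as np
import pandas as pd

# Reproducible 4x4 sample frame; the seed is an assumption.
np.random.seed(0)
df = pd.DataFrame(np.random.rand(4, 4), columns=list("abcd"))

# The transpose round-trip from the question: flip the frame so columns
# become rows, filter rows on the (transposed) row-0 values, flip back.
df2 = df.T
result = df2[df2.iloc[:, 0] > 0.5].T
print(result)  # only the columns whose row-0 value exceeds 0.5
```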

  • I believe you've got the most elegant way out there. Commented Aug 12, 2014 at 20:11

3 Answers

How about this?

df.loc[:, df.iloc[0, :] > 0.5]
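A minimal sketch of how this selection behaves, on a randomly generated frame rather than the exact values above (the generator and seed are assumptions):

```python
import numpy as np
import pandas as pd

# Sample frame; the seed is an assumption for reproducibility.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.random((4, 4)), columns=list("abcd"))

# Column-wise boolean indexing with .loc: keep every row (":"), but only
# those columns whose value in row 0 exceeds 0.5.
result = df.loc[:, df.iloc[0, :] > 0.5]

# Same columns as the transpose hack, with no transposing at all.
assert list(result.columns) == [c for c in df.columns if df.at[0, c] > 0.5]
```

The key point is that `.loc` accepts a boolean Series in the column slot, so the mask `df.iloc[0, :] > 0.5` (indexed by column label) selects columns directly.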


Yes, that is precisely what I was looking for.

Another method that avoids transposing: build a boolean mask of whether the first row has values larger than 0.5, drop the all-NaN columns with a threshold, and finally use the surviving column labels to filter the original df. This is pretty obfuscated, though ;)

In [76]:

df[list(df[df.head(1) > 0.5].dropna(thresh=1, axis=1))]
Out[76]:
              c         d
index                    
0      0.975756  0.658347
1      0.258911  0.639664
2      0.806442  0.066625
3      0.256748  0.045850
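A step-by-step sketch of the same mask-and-drop chain on a reproducible frame (the seed is an assumption, not part of the answer):

```python
import numpy as np
import pandas as pd

# Sample frame; the seed is an assumption.
np.random.seed(1)
df = pd.DataFrame(np.random.rand(4, 4), columns=list("abcd"))

# Mask the frame against its first row: cells in row 0 of qualifying
# columns keep their value, every other cell becomes NaN.
masked = df[df.head(1) > 0.5]

# Drop columns with fewer than one non-NaN cell (thresh=1), i.e. the
# all-NaN columns, then use the surviving labels to filter the original.
kept = list(masked.dropna(thresh=1, axis=1))
result = df[kept]
```

Breaking it into named steps makes clear why it works: the 1-row boolean frame from `df.head(1) > 0.5` is aligned against the full frame, so rows 1 onward are treated as False and NaN-ed out everywhere.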



Another way of looking at your answer:

In [14]: df.T[df.T[0] > 0.5].T
Out[14]: 
          c        d 
0  0.975756  0.658347
1  0.258911  0.639664
2  0.806442  0.066625
3  0.256748  0.045850


Triple transpose might not be as elegant as your answer.
