Selecting columns from a pandas dataframe based on row conditions

Question

I have a pandas dataframe

In [1]: df = DataFrame(np.random.randn(10, 4))

Is there a way to select only the columns whose value in the last row is > 0? The desired output is a new DataFrame containing all rows of those columns.

roman · Accepted Answer · 2015-04-03 15:14:07Z

In [201]: df = pd.DataFrame(np.random.randn(10, 4))

In [202]: df
Out[202]: 
          0         1         2         3
0 -1.380064  0.391358 -0.043390 -1.970113
1 -0.612594 -0.890354 -0.349894 -0.848067
2  1.178626  1.798316  0.691760  0.736255
3 -0.909491  0.429237  0.766065 -0.605075
4 -1.214366  1.907580 -0.583695  0.192488
5 -0.283786 -1.315771  0.046579 -0.777228
6  1.195634 -0.259040 -0.432147  1.196420
7 -2.346814  1.251494  0.261687  0.400886
8  0.845000  0.536683 -2.628224 -0.238449
9  0.246398 -0.548448 -0.295481  0.076117

In [203]: df.iloc[:, (df.iloc[-1] > 0).values]
Out[203]: 
          0         3
0 -1.380064 -1.970113
1 -0.612594 -0.848067
2  1.178626  0.736255
3 -0.909491 -0.605075
4 -1.214366  0.192488
5 -0.283786 -0.777228
6  1.195634  1.196420
7 -2.346814  0.400886
8  0.845000 -0.238449
9  0.246398  0.076117

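This solution uses basic pandas positional indexing with the iloc indexer: df.iloc[-1] > 0 builds a boolean mask over the columns from the last row, and .values converts that mask to a plain NumPy array, since iloc does not accept a boolean Series as a mask.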
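A minimal, self-contained sketch of the same iloc approach, using a small fixed DataFrame (the values and column names are made up for illustration) instead of random data so the result is reproducible. It uses `.to_numpy()`, the newer spelling of `.values`:

```python
import pandas as pd

# Small fixed DataFrame (hypothetical values) so the output is reproducible
df = pd.DataFrame(
    [[1.0, -2.0, 3.0, -4.0],
     [-1.0, 2.0, -3.0, 4.0],
     [0.5, -0.5, 0.25, -0.25]],
    columns=["a", "b", "c", "d"],
)

# Boolean mask over the columns, taken from the last row:
# a=True, b=False, c=True, d=False
mask = df.iloc[-1] > 0

# iloc wants a plain boolean array rather than a Series, so convert the mask
result = df.iloc[:, mask.to_numpy()]
print(result)
```

Only columns a and c remain, with all three rows.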

EdChum · Accepted Answer · 2015-04-03 15:09:41Z

You can use the boolean series generated from the condition to index the columns of interest:

In [30]:

df = pd.DataFrame(np.random.randn(10, 4))
df
Out[30]:
          0         1         2         3
0 -0.667736 -0.744761  0.401677 -1.286372
1  1.098134 -1.327454  1.409357 -0.180265
2 -0.105780  0.446195 -0.562578 -0.746083
3  1.366714 -0.685103  0.982354  1.928026
4  0.091040 -0.689676  0.425042  0.723466
5  0.798305 -1.454922 -0.017695  0.515961
6 -0.786693  1.496968 -0.112125 -1.303714
7 -0.211216 -1.321854 -0.892023 -0.583492
8  1.293255  0.936271  1.873870  0.790086
9 -0.699665 -0.953611  0.139986 -0.200499

In [32]:

df[df.columns[df.iloc[-1]>0]]
Out[32]:
          2
0  0.401677
1  1.409357
2 -0.562578
3  0.982354
4  0.425042
5 -0.017695
6 -0.112125
7 -0.892023
8  1.873870
9  0.139986
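The same selection can also be written with loc, which, unlike iloc, accepts the boolean Series directly and aligns it with the column labels. This is an alternative spelling, not part of the original answer; a sketch on a small fixed DataFrame with hypothetical values:

```python
import pandas as pd

# Hypothetical fixed data; the last row decides which columns survive
df = pd.DataFrame({"x": [1.0, -1.0], "y": [-2.0, 2.0], "z": [3.0, -3.0]})

# Last row is (-1.0, 2.0, -3.0), so only y is True
mask = df.iloc[-1] > 0

# Column-label approach from the answer above
by_labels = df[df.columns[mask]]

# loc aligns the boolean Series on the column labels
by_loc = df.loc[:, mask]

print(by_loc)
```

Both expressions return the same one-column DataFrame; the loc form avoids building the intermediate list of column labels.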

Brent Snyder · Accepted Answer · 2015-04-03 15:10:41Z


Check out pandasql: https://pypi.python.org/pypi/pandasql

This blog post is a great tutorial for using SQL for Pandas DataFrames: http://blog.yhathq.com/posts/pandasql-sql-for-pandas-dataframes.html

This should get you started:

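from pandasql import sqldf

def pysqldf(q):
    # Run the query against the DataFrames defined at module level (e.g. df)
    return sqldf(q, globals())

# Assumes df has a column named "value"; note that WHERE filters rows, not columns
q = """
    SELECT *
    FROM df
    WHERE value > 0
    ORDER BY 1;
"""

df = pysqldf(q)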
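A SQL WHERE clause filters rows, while the question asks for columns, so the query above does not by itself pick columns by their last-row value. One way to phrase the column selection as a row filter is to transpose first, so that each original column becomes a row. This transpose trick is a swapped-in technique in plain pandas, not something the answer or pandasql provides; a sketch on hypothetical fixed data:

```python
import pandas as pd

# Hypothetical fixed data; the last row is (2.0, -2.0, 5.0)
df = pd.DataFrame({"p": [1.0, 2.0], "q": [-1.0, -2.0], "r": [0.0, 5.0]})

# Transpose: each original column becomes a row, each original row a column
t = df.T

# Row filter (the WHERE-style step) on the values from the original last row
kept = t[t.iloc[:, -1] > 0]

# Transpose back so the surviving columns are columns again
result = kept.T
print(result)
```

Transposing a DataFrame with mixed column dtypes produces object columns, so the direct loc/iloc approaches are usually preferable in practice.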


user2782562 · Comment

Brent, I will definitely look into the link you shared. Thanks!
