Keep dataframe columns depending on row value

Question

I start with a pandas dataframe pm. It consists of several columns and rows, where one row, let's call it 'active', contains either the string 'True' or the string 'False'. For instance, it could look like this:

import pandas as pd
pm = pd.DataFrame(data={'peter': [17, 'True'],
                        'susan': [14, 'False'],
                        'tom': [1, 'False'],
                        'jenny': [12, 'True']},
                  index=['some_number', 'active'])

It looks like this:

Out[60]: 
            jenny peter  susan    tom
some_number    12    17     14      1
active       True  True  False  False

What I want is to keep only those columns where the value in the row 'active' is 'True'. The strings should also be cast to bools. For this example, I want the dataframe to look like this:

desired = pd.DataFrame(data={'peter': [17, True],
                             'jenny': [12, True]},
                       index=['some_number', 'active'])

This must be very, very simple, but as I am new to pandas I am currently struggling with it. I thought of two steps:

1) Cast the whole row to bools. But when I try to do so, everything gets set to True:

pm.loc['active',:] = pm.loc['active',:].astype(bool)

The result looks like this:

Out[61]: 
            jenny peter susan   tom
some_number    12    17    14     1
active       True  True  True  True

2) In a second step, keep only those columns where the value in the row 'active' is True. But I already fail at the first step.

A hint in the right direction would be appreciated.
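For context, the all-True result in step 1 is consistent with Python's truthiness rule: any non-empty string, including 'False', converts to True. The sketch below reproduces this on a standalone object-dtype Series (the names active, as_bool and flags are made up for illustration) and shows that comparing against the literal string 'True' yields the intended flags.

```python
import pandas as pd

# Any non-empty string is truthy, so the text 'False' converts to True.
print(bool('False'))
print(bool(''))

# Hypothetical stand-in for the 'active' row, kept as object dtype
# like a row taken from a mixed-type dataframe.
active = pd.Series(['True', 'False', 'False', 'True'],
                   index=['peter', 'susan', 'tom', 'jenny'],
                   dtype=object)

# astype(bool) applies the same truthiness rule to every element.
as_bool = active.astype(bool)

# Comparing against the string 'True' gives the flags that were intended.
flags = active == 'True'

print(as_bool)
print(flags)
```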

EdChum · Accepted Answer · 2015-03-18 15:11:45Z


I'd first replace the string values with their boolean equivalents by calling replace. You can then use label indexing to select that row, generate a boolean series where the value equals True, and use this to select the columns:

In [226]:

pm.replace('True',True, inplace=True)
pm.replace('False',False,inplace=True)
In [228]:

pm[pm.columns[pm.loc['active'] == True]]

Out[228]:
            jenny peter
some_number    12    17
active       True  True

Breaking the above down:

In [229]:

pm.loc['active'] == True
Out[229]:
jenny     True
peter     True
susan    False
tom      False
Name: active, dtype: bool
In [230]:

pm.columns[pm.loc['active'] == True]
Out[230]:
Index(['jenny', 'peter'], dtype='object')

EDIT

As @DSM has pointed out, once the values are real bools you can use the row itself as a boolean mask to select the columns:

In [234]:

pm.loc[:,pm.loc["active"]]
Out[234]:
            jenny peter
some_number    12    17
active       True  True
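A variant of the same mask idea, sketched here under the assumption that pm should not be modified before selecting: build the boolean mask by comparing the row against the string 'True' and pass it to loc. The names mask and kept are made up for illustration, and the 'active' row of kept still holds strings in this version.

```python
import pandas as pd

pm = pd.DataFrame(data={'peter': [17, 'True'],
                        'susan': [14, 'False'],
                        'tom': [1, 'False'],
                        'jenny': [12, 'True']},
                  index=['some_number', 'active'])

# Boolean Series indexed by column label: True where the cell is the string 'True'.
mask = pm.loc['active'] == 'True'

# loc with a boolean mask on the column axis keeps only the flagged columns.
kept = pm.loc[:, mask]

print(kept)
```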

ANOTHER UPDATE

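If you're worried about calling replace on the whole df, you can call replace on just that row. Assign the result back rather than using inplace=True, because pm.loc['active'] can return a copy, in which case an in-place replace would leave pm unchanged:

pm.loc['active'] = pm.loc['active'].replace({'True': True, 'False': False})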
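Putting the row-only conversion and the column selection together, a minimal end-to-end sketch could look like this. It swaps in Series.map with a dict instead of replace (an assumption made here, not part of the answer above) and assigns the converted row back explicitly; the name result is made up for illustration.

```python
import pandas as pd

pm = pd.DataFrame(data={'peter': [17, 'True'],
                        'susan': [14, 'False'],
                        'tom': [1, 'False'],
                        'jenny': [12, 'True']},
                  index=['some_number', 'active'])

# Convert only the 'active' row. map returns a new Series, which is
# assigned back to the row so that pm itself is updated.
pm.loc['active'] = pm.loc['active'].map({'True': True, 'False': False})

# The row comes back as object dtype, so cast it to bool before using it as a mask.
result = pm.loc[:, pm.loc['active'].astype(bool)]

print(result)
```

Note that map turns any value missing from the dict into NaN, so this only suits rows that contain nothing but the two strings.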

edited Mar 18, 2015 at 15:11

answered Mar 18, 2015 at 15:05

EdChum


4 Comments

DSM Over a year ago

Not sure I'd index into pm.columns; wouldn't pm.loc[:,pm.loc["active"]] give the same results (after we make active into bools)?

EdChum Over a year ago

@DSM yes I guess that would be cleaner I was approaching this in a more literal sense with respect to what the OP wanted but yes once the values are bools then that would work, I'll update my answer

Nras Over a year ago

This does achieve my goal. Didn't think of replace. In my case it seems safe to replace on the whole dataframe, is there a way to only replace the strings of one specific row, i.e. the row called 'active'?

EdChum Over a year ago

@Nras you can call replace just on that row by using loc to perform row label selection
