I start with a pandas DataFrame pm. It consists of several columns and rows, where one row, let's call it 'active', contains either the string 'True' or the string 'False'. For instance, it could look like this:

import pandas as pd
pm = pd.DataFrame(data={'peter': [17, 'True'],
                        'susan': [14, 'False'],
                        'tom': [1, 'False'],
                        'jenny': [12, 'True']},
                  index=['some_number', 'active'])

It looks like this:

Out[60]: 
            jenny peter  susan    tom
some_number    12    17     14      1
active       True  True  False  False

What I want is to keep only those columns where the value in the row 'active' is 'True'. The strings should also be cast to bools. For this example, I want the dataframe to look like this:

desired = pd.DataFrame(data={'peter': [17, True],
                             'jenny': [12, True]},
                       index=['some_number', 'active'])

This must be very, very simple, but as I am new to pandas I am currently struggling with it. I thought of two steps:

1) Cast the whole row to bools. But when I try to do so, everything gets set to True:

pm.loc['active',:] = pm.loc['active',:].astype(bool)

But it looks like this:

Out[61]: 
            jenny peter susan   tom
some_number    12    17    14     1
active       True  True  True  True

2) In a second step, keep only those columns where the value in the row 'active' is True. But I already fail at the first step.

A hint in the right direction would be appreciated.

1 Answer

I'd first replace the string values with their boolean equivalents by calling replace. You can then use label indexing to select that row, generate a boolean Series where the value equals True, and use it to select the columns:

In [226]:

pm.replace('True',True, inplace=True)
pm.replace('False',False,inplace=True)
In [228]:

pm[pm.columns[pm.loc['active'] == True]]

Out[228]:
            jenny peter
some_number    12    17
active       True  True

Breaking the above down:

In [229]:

pm.loc['active'] == True
Out[229]:
jenny     True
peter     True
susan    False
tom      False
Name: active, dtype: bool
In [230]:

pm.columns[pm.loc['active'] == True]
Out[230]:
Index(['jenny', 'peter'], dtype='object')
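
As for why astype(bool) in the question set everything to True: casting an object value to bool uses Python truthiness, and any non-empty string, including 'False', is truthy. A minimal sketch of that behaviour (the Series name active and its two entries are made up for illustration):

```python
import pandas as pd

# Illustrative stand-in for the 'active' row: strings stored with object dtype.
active = pd.Series(['True', 'False'], index=['peter', 'susan'], dtype=object)

# astype(bool) applies truthiness element by element, so the non-empty
# string 'False' also becomes True.
as_bool = active.astype(bool)

# Comparing against the literal string 'True' gives the intended mask.
compared = active == 'True'

print(as_bool)
print(compared)
```

This is why the strings have to be translated (with replace, or a comparison) rather than cast.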

EDIT

As @DSM has pointed out, because the values are now real bools, you can use the 'active' row itself as the mask to select the columns:

In [234]:

pm.loc[:,pm.loc["active"]]
Out[234]:
            jenny peter
some_number    12    17
active       True  True
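
If you would rather not modify pm at all, one alternative is to compare the row against the string 'True' and select on that mask. This is only a sketch, assuming the row holds exactly the strings 'True' and 'False'; the names mask and result are illustrative:

```python
import pandas as pd

pm = pd.DataFrame(data={'peter': [17, 'True'],
                        'susan': [14, 'False'],
                        'tom': [1, 'False'],
                        'jenny': [12, 'True']},
                  index=['some_number', 'active'])

# Boolean Series indexed by column name: True where the string is 'True'.
mask = pm.loc['active'] == 'True'

# Keep only the matching columns; copy so that pm stays untouched.
result = pm.loc[:, mask].copy()

# Every remaining column is active by construction, so store a real bool.
result.loc['active'] = True

print(result)
```

The original pm still contains the strings afterwards, which may or may not be what you want.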

ANOTHER UPDATE

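If you're worried about calling replace on the whole df, you can call replace on just that row and assign the result back. Calling replace with inplace=True directly on pm.loc['active'] may act on a temporary copy and leave pm unchanged, so the assignment is the safer form:

pm.loc['active'] = pm.loc['active'].replace({'True': True, 'False': False})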
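
Another way to convert only that row, sketched here with Series.map and a lookup dict instead of replace. The name to_bool is made up for this example, and any value missing from the dict would become NaN:

```python
import pandas as pd

pm = pd.DataFrame(data={'peter': [17, 'True'],
                        'susan': [14, 'False'],
                        'tom': [1, 'False'],
                        'jenny': [12, 'True']},
                  index=['some_number', 'active'])

# Illustrative lookup table; values not listed here would map to NaN.
to_bool = {'True': True, 'False': False}

# Convert only the 'active' row and assign it back so pm itself is updated.
pm.loc['active'] = pm.loc['active'].map(to_bool)

# astype(bool) is safe here because the row now holds real booleans.
kept = pm.loc[:, pm.loc['active'].astype(bool)]

print(kept)
```

The 'some_number' row is never passed through the mapping, so a value there that happened to be the string 'True' would be left alone.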

4 Comments

Not sure I'd index into pm.columns; wouldn't pm.loc[:,pm.loc["active"]] give the same results (after we make active into bools)?
@DSM Yes, I guess that would be cleaner. I was approaching this in a more literal sense with respect to what the OP wanted, but once the values are bools that would work. I'll update my answer.
This does achieve my goal. Didn't think of replace. In my case it seems safe to replace on the whole dataframe, is there a way to only replace the strings of one specific row, i.e. the row called 'active'?
@Nras you can call replace just on that row by using loc to perform row label selection
