I start with a pandas dataframe pm. It consists of several columns and rows, where one row, let's call it 'active', contains either the string 'True' or the string 'False'. For instance, it could look like this:
import pandas as pd
pm = pd.DataFrame(data={'peter': [17, 'True'],
                        'susan': [14, 'False'],
                        'tom': [1, 'False'],
                        'jenny': [12, 'True']},
                  index=['some_number', 'active'])
It looks like this:
Out[60]:
             jenny  peter  susan    tom
some_number     12     17     14      1
active        True   True  False  False
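Because each column mixes an integer with a string, pandas stores every column with the generic `object` dtype, and the entries of the 'active' row are ordinary Python strings rather than booleans. A quick sketch to confirm this (the exact `dtypes` printout may vary slightly between pandas versions):

```python
import pandas as pd

pm = pd.DataFrame(data={'peter': [17, 'True'],
                        'susan': [14, 'False'],
                        'tom': [1, 'False'],
                        'jenny': [12, 'True']},
                  index=['some_number', 'active'])

# Columns that mix an int and a str fall back to the generic object dtype.
print(pm.dtypes)

# The 'active' entries are plain Python strings, not booleans.
print(type(pm.loc['active', 'peter']))  # <class 'str'>
```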
What I want is to keep only those columns where the value in the row 'active' is 'True'. The strings should also be cast to bools. For this example, I want the dataframe to look like this:
desired = pd.DataFrame(data={'peter': [17, True],
                             'jenny': [12, True]},
                       index=['some_number', 'active'])
This must be very, very simple, but as I am new to pandas I am currently struggling with it. I thought of two steps:
1) Cast the whole row to bools. But when I try to do so, everything gets set to True:
pm.loc['active',:] = pm.loc['active',:].astype(bool)
The result looks like this:
Out[61]:
             jenny  peter  susan    tom
some_number     12     17     14      1
active        True   True   True   True
2) In a second step, keep only those columns where the value in the row 'active' is True. But I am already stuck at the first step.
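A minimal sketch of both steps, assuming a reasonably recent pandas. `astype(bool)` applies Python truthiness to each value, and every non-empty string (including 'False') is truthy, which is why the whole row came out True. Comparing against the literal string 'True' yields a boolean mask instead, which can serve both as the converted row and as a column selector. The name `is_active` is just an illustrative variable, not part of any API.

```python
import pandas as pd

pm = pd.DataFrame(data={'peter': [17, 'True'],
                        'susan': [14, 'False'],
                        'tom': [1, 'False'],
                        'jenny': [12, 'True']},
                  index=['some_number', 'active'])

# Any non-empty string is truthy, so astype(bool) maps 'False' to True as well.
print(bool('True'), bool('False'))  # True True

# Step 1: compare with the literal string instead. The result is a boolean
# Series indexed by the column names (True for peter and jenny).
is_active = pm.loc['active'] == 'True'

# Overwrite the string row with real booleans; assignment aligns on column names.
pm.loc['active'] = is_active

# Step 2: a boolean Series used as the column indexer keeps only the True columns.
result = pm.loc[:, is_active]
print(result)
```

An alternative for step 1 is `pm.loc['active'].map({'True': True, 'False': False})`; any value not in the dict becomes NaN, which makes unexpected spellings easy to spot. In older pandas versions that sort dict keys, the column order of `result` may differ from `desired`.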
A hint in the right direction would be appreciated.