The DataFrame below has three groups in the 'group' column: 'A', 'AB', and 'C'. Each of the other columns is assigned to a specific group by its suffix: var1_A relates to group A, and so forth.

import pandas as pd

data = pd.DataFrame({'group':['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
                     'var1_A':['pass', 'fail', 'pass','fail', 'pass']*2,
                     'var2_A':['pass', 'pass', 'pass','fail', 'pass']*2,
                     'var1_AB':['pass', 'pass', 'pass','fail', 'pass']*2,
                     'var2_AB':['pass', 'pass', 'fail','fail', 'pass']*2,
                     'var1_C':['pass', 'pass', 'pass','fail', 'pass']*2,
                     'var2_C': ['fail', 'fail', 'fail','fail', 'pass']*2
                    })

For each row, I want to count the number of times 'pass' occurs. For rows that belong to group A, I only want to count the variables connected to group A, and likewise for the other groups. I want the result in a new column. This almost does the job:

data['new_col'] = data[data['group']=='A'][['var1_A', 'var2_A']].isin(['pass']).sum(1)
data['new_col'] = data[data['group']=='AB'][['var1_AB', 'var2_AB']].isin(['pass']).sum(1)
data['new_col'] = data[data['group']=='C'][['var1_C', 'var2_C']].isin(['pass']).sum(1)

However, each assignment replaces new_col, so only the counts for the last group survive and the other rows end up as NaN. I want the results from all groups in the same column. Perhaps this is possible using groupby and transform, but I got stuck figuring it out.

Target dataframe:

pd.DataFrame({'group':['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
                     'var1_A':['pass', 'fail', 'pass','fail', 'pass']*2,
                     'var2_A':['pass', 'pass', 'pass','fail', 'pass']*2,
                     'var1_AB':['pass', 'pass', 'pass','fail', 'pass']*2,
                     'var2_AB':['pass', 'pass', 'fail','fail', 'pass']*2,
                     'var1_C':['pass', 'pass', 'pass','fail', 'pass']*2,
                     'var2_C': ['fail', 'fail', 'fail','fail', 'pass']*2,
                     'result':[2,2,2,0,2,1,1,2,0,2]
                    })

2 Answers

You can melt, filter and groupby.count:

data['result'] = (data
  .rename(columns=lambda x: x.split('_')[-1]) # get only part after "_"
  .reset_index().melt(['index', 'group'])
  # keep only identical groups and "pass" values
  .loc[lambda d: d['group'].eq(d['variable']) & d['value'].eq('pass')]
  .groupby('index')['value'].count()
  .reindex(data.index, fill_value=0)
)

print(data)

Or another approach using matrices and string comparisons:

df2 = data.set_index('group').eq('pass')
data['result'] = (df2.mul(df2.columns.str.extract(r'_(\w+)', expand=False))
                     .eq(df2.index, axis=0).sum(axis=1)
                     .to_numpy()
                 )

Output:

  group var1_A var2_A var1_AB var2_AB var1_C var2_C  result
0     A   pass   pass    pass    pass   pass   fail       2
1    AB   fail   pass    pass    pass   pass   fail       2
2     A   pass   pass    pass    fail   pass   fail       2
3    AB   fail   fail    fail    fail   fail   fail       0
4    AB   pass   pass    pass    pass   pass   pass       2
5     C   pass   pass    pass    pass   pass   fail       1
6     C   fail   pass    pass    pass   pass   fail       1
7     A   pass   pass    pass    fail   pass   fail       2
8     A   fail   fail    fail    fail   fail   fail       0
9    AB   pass   pass    pass    pass   pass   pass       2
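
For what it's worth, the matrix idea can also be sketched with plain NumPy broadcasting: build one boolean matrix that says which column belongs to each row's group, then AND it with the 'pass' matrix. This is only a sketch, not the code above rewritten. The names value_cols, suffixes, groups, belongs and passed are my own, and it assumes every value column starts with 'var' and is named <stub>_<group>:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
    'var1_A': ['pass', 'fail', 'pass', 'fail', 'pass'] * 2,
    'var2_A': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var1_AB': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var2_AB': ['pass', 'pass', 'fail', 'fail', 'pass'] * 2,
    'var1_C': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var2_C': ['fail', 'fail', 'fail', 'fail', 'pass'] * 2,
})

# Assumption: value columns start with 'var' and end in _<group>.
value_cols = [c for c in data.columns if c.startswith('var')]

# Group label encoded in each column name (text after the last underscore).
suffixes = np.array([c.rsplit('_', 1)[-1] for c in value_cols], dtype=object)
groups = data['group'].to_numpy(dtype=object)

# belongs[i, j] is True when column j belongs to the group of row i.
belongs = groups[:, None] == suffixes[None, :]
# passed[i, j] is True when the cell holds 'pass'.
passed = data[value_cols].eq('pass').to_numpy()

# Count only the cells that both belong to the row's group and passed.
data['result'] = (belongs & passed).sum(axis=1)
print(data)
```

Comparing whole labels (rather than testing substrings) is what keeps 'A' from also matching the '_AB' columns.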

# per row: keep the columns whose suffix matches the row's group, then count "pass"
dd1 = data.apply(lambda ss: data.filter(regex=".+_{}$".format(ss["group"]))
                 .loc[ss.name].eq("pass").sum(), axis=1)
data["result"] = dd1
data

Or with pd.wide_to_long:

dd1=pd.wide_to_long(data.assign(col1=data.index), stubnames=['var1','var2'],
                i=['col1'], j='col2',sep='_',suffix=r'\w+').reset_index()\
    .loc[lambda dd:dd.col2.eq(dd.group)].set_index("col1")
data.assign(result=dd1.var1.map({"pass":1}).add(dd1.var2.map({"pass":1}),fill_value=0).fillna(0))

Output:

  group var1_A var2_A var1_AB var2_AB var1_C var2_C  result
0     A   pass   pass    pass    pass   pass   fail       2
1    AB   fail   pass    pass    pass   pass   fail       2
2     A   pass   pass    pass    fail   pass   fail       2
3    AB   fail   fail    fail    fail   fail   fail       0
4    AB   pass   pass    pass    pass   pass   pass       2
5     C   pass   pass    pass    pass   pass   fail       1
6     C   fail   pass    pass    pass   pass   fail       1
7     A   pass   pass    pass    fail   pass   fail       2
8     A   fail   fail    fail    fail   fail   fail       0
9    AB   pass   pass    pass    pass   pass   pass       2
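
If readability matters more than brevity, a plain loop over the groups, closer to the attempt in the question, is another option. This is only a sketch: it assumes the text after the last underscore in a column name is exactly the group label, and the helper names (mask, cols) are mine:

```python
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
    'var1_A': ['pass', 'fail', 'pass', 'fail', 'pass'] * 2,
    'var2_A': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var1_AB': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var2_AB': ['pass', 'pass', 'fail', 'fail', 'pass'] * 2,
    'var1_C': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var2_C': ['fail', 'fail', 'fail', 'fail', 'pass'] * 2,
})

# Start from 0 so a group without matching columns simply keeps 0.
data['result'] = 0

for g in data['group'].unique():
    mask = data['group'].eq(g)                               # rows of this group
    cols = [c for c in data.columns if c.endswith(f'_{g}')]  # columns of this group
    # Write only into the rows of this group; other rows are left untouched.
    data.loc[mask, 'result'] = data.loc[mask, cols].eq('pass').sum(axis=1)

print(data)
```

Using .loc with the mask on the left-hand side writes into the existing column instead of replacing it, which is what three separate whole-column assignments cannot do.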
