I'm trying to implement a simple voting score in a CSV file using pandas. Basically, if dataframe['C'] == 'Active' and dataframe['Count'] == 0, then dataframe['Combo'] should be 0. If dataframe['C'] == 'Active' and dataframe['Count'] == 1, then dataframe['Combo'] should be 1. If dataframe['C'] == 'Active' and dataframe['Count'] == 2, then dataframe['Combo'] should be 2, and so on.

This is my dataframe:

A        B          C           Count Combo
Ptn1    Lig1        Inactive    0      
Ptn1    Lig1        Inactive    1      
Ptn1    Lig1        Active      2      2
Ptn2    Lig2        Active      0      0
Ptn2    Lig2        Inactive    1       
Ptn3    Lig3        Active      0      0
Ptn3    Lig3        Inactive    1       
Ptn3    Lig3        Inactive    2       
Ptn3    Lig3        Inactive    3      
Ptn3    Lig3        Active      4      3

This is my code so far for clarity:

import pandas as pd

df = pd.read_csv('affinity.csv')
VOTE = 0
df['Combo'] = ''  # start with an empty score on every row
# 'Classification' appears as column 'C' in the sample above
df.loc[(df['Classification'] == 'Active') & (df['Count'] == 0), 'Combo'] = VOTE
df.loc[(df['Classification'] == 'Active') & (df['Count'] == 1), 'Combo'] = VOTE + 1
df.loc[(df['Classification'] == 'Active') & (df['Count'] == 2), 'Combo'] = VOTE + 2
df.loc[(df['Classification'] == 'Active') & (df['Count'] > 3), 'Combo'] = VOTE + 3

My code was able to do this correctly. However, there are two 'Active' values for the pair Ptn3-Lig3: one at dataframe['Count'] = 0 and another at dataframe['Count'] = 4. Is there a way to ignore the second value (i.e. consider only the smallest dataframe['Count'] value) and add the corresponding number to dataframe['Combo']? I know pandas.DataFrame.drop_duplicates() might be a way to select only unique values, but I would really prefer to avoid deleting any rows.

1 Answer

You could do a groupby + apply:

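import numpy as np
import pandas as pd

def foo(x):
    # True for the rows of this group that are 'Active'
    m = x['C'].eq('Active')
    if m.any():
        # Broadcast the first Active Count over every Active row; NaN elsewhere
        return pd.Series(np.where(m, x.loc[m, 'Count'].head(1), np.nan))
    else:
        return pd.Series([np.nan] * len(x))

df['Combo'] = df.groupby(['A', 'B'], group_keys=False).apply(foo).values
print(df)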

      A     B         C  Count Combo
0  Ptn1  Lig1  Inactive      0      
1  Ptn1  Lig1  Inactive      1      
2  Ptn1  Lig1    Active      2     2
3  Ptn2  Lig2    Active      0     0
4  Ptn2  Lig2  Inactive      1      
5  Ptn3  Lig3    Active      0     0
6  Ptn3  Lig3  Inactive      1      
7  Ptn3  Lig3  Inactive      2      
8  Ptn3  Lig3  Inactive      3      
9  Ptn3  Lig3    Active      4     0
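
For readers wondering what `foo` does inside each group, the sketch below runs the same `np.where` call on a hand-built group. The frame `group` and its values are hypothetical, chosen to mirror the Ptn3-Lig3 pair. The one-element result of `.head(1)` is broadcast over every row where the mask is true. When the mask has no true entries, `.head(1)` is empty and cannot be broadcast, which appears to be the source of the `ValueError` quoted in the comments and the reason for the `m.any()` guard.

```python
import numpy as np
import pandas as pd

# Hypothetical five-row group mirroring the Ptn3-Lig3 pair.
group = pd.DataFrame({
    "C": ["Active", "Inactive", "Inactive", "Inactive", "Active"],
    "Count": [0, 1, 2, 3, 4],
})

m = group["C"].eq("Active")
first_active = group.loc[m, "Count"].head(1)  # length-1 Series holding 0

# Shapes (5,), (1,) and () broadcast to (5,): Active rows get 0, others NaN.
filled = np.where(m, first_active, np.nan)
print(filled)

# A mask with no Active rows yields an empty Series of shape (0,),
# which cannot be broadcast against a mask of shape (5,).
no_active = pd.Series(False, index=group.index)
empty = group.loc[no_active, "Count"].head(1)
error_raised = False
try:
    np.where(no_active, empty, np.nan)
except ValueError as exc:
    error_raised = True
    print("broadcast failed:", exc)
```

Note that both Active rows receive the same value here, which matches the `0` shown for row 9 in the output above.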

Another alternative with groupby + merge:

df = df.groupby(['A', 'B', 'C'])[['C', 'Count']]\
       .apply(lambda x: x['Count'].values[0] if x['C'].eq('Active').any() else np.nan)\
       .reset_index(name='Combo').fillna('').merge(df)
print(df) 

      A     B         C Combo  Count
0  Ptn1  Lig1    Active     2      2
1  Ptn1  Lig1  Inactive            0
2  Ptn1  Lig1  Inactive            1
3  Ptn2  Lig2    Active     0      0
4  Ptn2  Lig2  Inactive            1
5  Ptn3  Lig3    Active     0      0
6  Ptn3  Lig3    Active     0      4
7  Ptn3  Lig3  Inactive            1
8  Ptn3  Lig3  Inactive            2
9  Ptn3  Lig3  Inactive            3

Note that this ends up sorting your groups.
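
Both methods above still assign a value to every Active row of a pair. If only the Active row with the smallest `Count` should be scored, as the question asks, one possible approach is to locate that row with `idxmin` and assign to it alone. This is a sketch rather than part of the original answer: the sample frame is rebuilt by hand from the question's table, and the variable names `active` and `rows` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "A": ["Ptn1"] * 3 + ["Ptn2"] * 2 + ["Ptn3"] * 5,
    "B": ["Lig1"] * 3 + ["Lig2"] * 2 + ["Lig3"] * 5,
    "C": ["Inactive", "Inactive", "Active", "Active", "Inactive",
          "Active", "Inactive", "Inactive", "Inactive", "Active"],
    "Count": [0, 1, 2, 0, 1, 0, 1, 2, 3, 4],
})

# Keep only the Active rows, then find, for each (A, B) pair,
# the row label of the smallest Count.
active = df[df["C"].eq("Active")]
rows = active.groupby(["A", "B"])["Count"].idxmin().to_numpy()

# Every row starts as NaN; only the selected rows receive their Count.
df["Combo"] = np.nan
df.loc[rows, "Combo"] = df.loc[rows, "Count"]
print(df)
```

On the sample data, rows 2, 3 and 5 receive 2, 0 and 0, while row 9 (the second Active row for Ptn3-Lig3) stays NaN. No rows are dropped and the original row order is preserved. If the score should be capped at 3 as in the question's code, `.clip(upper=3)` could be applied to the assigned values.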

6 Comments

Thank you. That worked for this sample dataframe, but when I tried to apply it to the real thing it raised an error: return pd.Series(np.where(m, x.loc[m, 'Count'].head(1), '')) ValueError: operands could not be broadcast together with shapes (5,) (0,) (). Could you explain what the function is doing? I'm really new to python and pandas.
@MarcosSantana See edit? I think I might've understood the problem.
Oh. Just saw it. Now the function is running. But I still get two values for Ptn3-Lig3 pairs. If not by that function, is there a way to change that second value to NaN or something else? Thank you again for that function!
@MarcosSantana Made a small change, see if this works?
@MarcosSantana Added a new method.