Pandas apply function to unique values in column

Question

I've been stuck on a Pandas problem and I can't seem to figure it out. I have a dataframe like this:

ref, value, rule, result, new_column
a100, 25, high, fail, nan
a100, 25, high, pass, nan
a100, 25, medium, fail, nan
a100, 25, medium, pass, nan
a101, 15, high, fail, nan
a101, 15, high, pass, nan
a102, 20, high, pass, nan

I want to add a new column to this dataframe with the following pseudocode

For each unique value in ref, if result = fail, then new_column = no for all subsequent rows of the same "ref" value.

This is how the new dataframe should look like.

ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, no
a100, 25, medium, fail, no
a100, 25, medium, pass, no
a101, 15, high, fail, no
a101, 15, high, pass, no
a102, 20, high, pass, yes

What I've managed to do is the following:

ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, yes

This is achieved through the df.loc function. But I need the function to apply to unique values, rather than each row.

Can you add more rows to dataframe, becasue I think it is a bit unclear for me. — jezrael
– jezrael, Commented Aug 23, 2016 at 13:43
@jezrael Updated, please check. new_column = no for all instances where result = fail, for each unique ref value. — Kvothe
– Kvothe, Commented Aug 23, 2016 at 13:52
@Kvothe if your first two rows were swapped, would the new_column be yes, no, no, no or still be all no's? — Jon Clements
– Jon Clements, Commented Aug 23, 2016 at 13:53
@JonClements, I've updated the question. But if the rows were swapped. it would still be no, no, no for that value of "ref". — Kvothe
– Kvothe, Commented Aug 23, 2016 at 13:55

alessiosavi · Accepted Answer · 2021-11-23 16:11:58Z

4

I think you can use transform:

print (df)
    ref  value    rule result  new_column
0  a100     25    high   pass         NaN
1  a100     25    high   fail         NaN
2  a100     25  medium   fail         NaN
3  a100     25  medium   pass         NaN
4  a101     15    high   fail         NaN
5  a101     15    high   pass         NaN
6  a102     20    high   pass         NaN

df['new_column']=df.groupby('ref')['result']
                   .transform(lambda x: 'no' if ((x=='fail').any()) else 'yes')
print (df)
    ref  value    rule result new_column
0  a100     25    high   pass         no
1  a100     25    high   fail         no
2  a100     25  medium   fail         no
3  a100     25  medium   pass         no
4  a101     15    high   fail         no
5  a101     15    high   pass         no
6  a102     20    high   pass        yes

Thank you Jon Clements for another solution with replace:

df['new_column'] = df.groupby('ref')['result']
                     .transform(lambda L: (L == 'fail').any())
                     .replace({True: 'no', False: 'yes'})

print (df)
    ref  value    rule result new_column
0  a100     25    high   pass         no
1  a100     25    high   fail         no
2  a100     25  medium   fail         no
3  a100     25  medium   pass         no
4  a101     15    high   fail         no
5  a101     15    high   pass         no
6  a102     20    high   pass        yes

edited Nov 23, 2021 at 16:11

alessiosavi

3,0952 gold badges21 silver badges43 bronze badges

answered Aug 23, 2016 at 13:30

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Jon Clements Over a year ago

Umm... I had df.groupby('ref')['result'].transform(lambda L: (L == 'fail').any()).replace({True: 'no', False: 'yes'})

Collectives™ on Stack Overflow

Pandas apply function to unique values in column

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related