2

I've been stuck on a Pandas problem and I can't seem to figure it out. I have a dataframe like this:

ref, value, rule, result, new_column
a100, 25, high, fail, nan
a100, 25, high, pass, nan
a100, 25, medium, fail, nan
a100, 25, medium, pass, nan
a101, 15, high, fail, nan
a101, 15, high, pass, nan
a102, 20, high, pass, nan

I want to add a new column to this dataframe with the following pseudocode

For each unique value in ref, if result = fail, then new_column = no for all subsequent rows of the same "ref" value.

This is how the new dataframe should look like.

ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, no
a100, 25, medium, fail, no
a100, 25, medium, pass, no
a101, 15, high, fail, no
a101, 15, high, pass, no
a102, 20, high, pass, yes

What I've managed to do is the following:

ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, yes

This is achieved through the df.loc function. But I need the function to apply to unique values, rather than each row.

8
  • Do you think new_column = no or yes ? Commented Aug 23, 2016 at 13:41
  • 1
    Can you add more rows to dataframe, becasue I think it is a bit unclear for me. Commented Aug 23, 2016 at 13:43
  • @jezrael Updated, please check. new_column = no for all instances where result = fail, for each unique ref value. Commented Aug 23, 2016 at 13:52
  • @Kvothe if your first two rows were swapped, would the new_column be yes, no, no, no or still be all no's? Commented Aug 23, 2016 at 13:53
  • @JonClements, I've updated the question. But if the rows were swapped. it would still be no, no, no for that value of "ref". Commented Aug 23, 2016 at 13:55

1 Answer 1

4

I think you can use transform:

print (df)
    ref  value    rule result  new_column
0  a100     25    high   pass         NaN
1  a100     25    high   fail         NaN
2  a100     25  medium   fail         NaN
3  a100     25  medium   pass         NaN
4  a101     15    high   fail         NaN
5  a101     15    high   pass         NaN
6  a102     20    high   pass         NaN

df['new_column']=df.groupby('ref')['result']
                   .transform(lambda x: 'no' if ((x=='fail').any()) else 'yes')
print (df)
    ref  value    rule result new_column
0  a100     25    high   pass         no
1  a100     25    high   fail         no
2  a100     25  medium   fail         no
3  a100     25  medium   pass         no
4  a101     15    high   fail         no
5  a101     15    high   pass         no
6  a102     20    high   pass        yes

Thank you Jon Clements for another solution with replace:

df['new_column'] = df.groupby('ref')['result']
                     .transform(lambda L: (L == 'fail').any())
                     .replace({True: 'no', False: 'yes'})

print (df)
    ref  value    rule result new_column
0  a100     25    high   pass         no
1  a100     25    high   fail         no
2  a100     25  medium   fail         no
3  a100     25  medium   pass         no
4  a101     15    high   fail         no
5  a101     15    high   pass         no
6  a102     20    high   pass        yes
Sign up to request clarification or add additional context in comments.

1 Comment

Umm... I had df.groupby('ref')['result'].transform(lambda L: (L == 'fail').any()).replace({True: 'no', False: 'yes'})

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.