I've been stuck on a Pandas problem and I can't seem to figure it out. I have a dataframe like this:
ref, value, rule, result, new_column
a100, 25, high, fail, nan
a100, 25, high, pass, nan
a100, 25, medium, fail, nan
a100, 25, medium, pass, nan
a101, 15, high, fail, nan
a101, 15, high, pass, nan
a102, 20, high, pass, nan
I want to add a new column to this dataframe with the following pseudocode
For each unique value in ref, if result = fail, then new_column = no for all subsequent rows of the same "ref" value.
This is how the new dataframe should look like.
ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, no
a100, 25, medium, fail, no
a100, 25, medium, pass, no
a101, 15, high, fail, no
a101, 15, high, pass, no
a102, 20, high, pass, yes
What I've managed to do is the following:
ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, yes
This is achieved through the df.loc function.
But I need the function to apply to unique values, rather than each row.
new_column = nooryes?