Suppose I have a dataframe like the one below:
user email day_diff
tom [email protected] -10
tom [email protected] -2
tom [email protected] 3
bob [email protected] -11
bob [email protected] 1
bob [email protected] 2
alice [email protected] 4
Mary [email protected] -5
For each user, I want to take every email where day_diff is positive, plus the email from the record whose day_diff is negative and closest to 0. Then I want to compare those emails: if any of them are different, a new column should contain 'yes' for all of that user's rows; if they are all the same, it should contain 'no'.
So for tom, I would take the email where day_diff is 3, [email protected], since 3 is his only positive day_diff, and compare it to [email protected], the email at -2 (the negative day_diff closest to 0). Since they are different, the new column would be 'yes' for every one of tom's rows.
For bob, I would take the emails where day_diff is 1 and 2 and compare them to the email at -11, his only negative day_diff. Since the emails at 2 and -11 are different, the new column value would be 'yes'.
If a user has only one row and its day_diff is positive, the new column value is 'yes'. If a user has only rows where day_diff is negative, the new column value is 'no'.
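A minimal, framework-free sketch of the per-user rule described above, written against one user's rows at a time. It assumes the rows arrive as (email, day_diff) pairs, that a day_diff of 0 counts as neither positive nor negative (the question does not say), and that a user with positive rows but no negative row gets 'yes', generalizing the single positive row case. The function name `email_changed` and the example addresses are hypothetical, since the real emails are redacted in the post.

```python
def email_changed(rows):
    """Return 'yes' or 'no' for one user's rows.

    rows: list of (email, day_diff) pairs. day_diff == 0 is ignored
    (assumption: the question only defines positive and negative values).
    """
    positive_emails = {email for email, diff in rows if diff > 0}
    negative_rows = [(diff, email) for email, diff in rows if diff < 0]

    if not negative_rows:
        # Assumption: with no earlier (negative) email to compare against,
        # any positive row counts as a change; this covers the single
        # positive row case from the question.
        return "yes" if positive_emails else "no"

    # The negative day_diff closest to 0 is the largest negative value.
    _, baseline_email = max(negative_rows, key=lambda pair: pair[0])

    # Compare the baseline email with every positive-day email.
    compared = positive_emails | {baseline_email}
    return "yes" if len(compared) > 1 else "no"


# Hypothetical addresses standing in for the redacted ones in the question.
tom = [("tom.old@example.com", -10), ("tom.a@example.com", -2), ("tom.b@example.com", 3)]
bob = [("bob.a@example.com", -11), ("bob.a@example.com", 1), ("bob.b@example.com", 2)]
alice = [("alice@example.com", 4)]
mary = [("mary@example.com", -5)]

for name, rows in [("tom", tom), ("bob", bob), ("alice", alice), ("Mary", mary)]:
    print(name, email_changed(rows))
```

With these inputs it prints 'yes' for tom, bob and alice and 'no' for Mary. Note that older negative rows, such as tom's -10 entry, never take part in the comparison: a user with rows at -9 (old@example.com), -1 and 2 (both x@example.com) gets 'no'.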
Any help would be appreciated. I've been spinning in circles trying to figure this out.
The output would look like this:
user email day_diff email_change
tom [email protected] -10 yes
tom [email protected] -2 yes
tom [email protected] 3 yes
bob [email protected] -11 yes
bob [email protected] 1 yes
bob [email protected] 2 yes
alice [email protected] 4 yes
Mary [email protected] -5 no
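Assuming this is a pandas DataFrame with columns user, email and day_diff and a unique index, one vectorized sketch produces the output above without looping over users in Python: it marks each user's negative row closest to 0 with `idxmax`, counts distinct emails among that row and the positive rows with `nunique`, and maps the per-user result back onto every row. Like the question, it leaves day_diff == 0 undefined (such rows are simply ignored), and it assumes a user with positive rows but no negative row gets 'yes'. The function name `add_email_change` and the example addresses are hypothetical.

```python
import pandas as pd


def add_email_change(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an 'email_change' column ('yes'/'no').

    Assumes columns user, email, day_diff and a unique index.
    """
    is_pos = df["day_diff"] > 0
    is_neg = df["day_diff"] < 0

    # Index label of each user's negative day_diff closest to 0
    # (the largest negative value).
    baseline_idx = df[is_neg].groupby("user")["day_diff"].idxmax()

    # Rows to compare: every positive row plus each user's baseline row.
    selected = df[is_pos | df.index.isin(baseline_idx)]
    n_emails = selected.groupby("user")["email"].nunique()

    # Broadcast the per-user counts and flags back onto every row.
    n = df["user"].map(n_emails).fillna(0)
    has_neg = df["user"].isin(df.loc[is_neg, "user"])
    has_pos = df["user"].isin(df.loc[is_pos, "user"])

    # 'yes' if the compared emails differ, or (assumption) if the user has
    # positive rows but no negative row to compare against.
    changed = (n > 1) | (has_pos & ~has_neg)

    out = df.copy()
    out["email_change"] = changed.map({True: "yes", False: "no"})
    return out


# Hypothetical addresses standing in for the redacted ones in the question.
df = pd.DataFrame({
    "user": ["tom", "tom", "tom", "bob", "bob", "bob", "alice", "Mary"],
    "email": [
        "tom.old@example.com", "tom.a@example.com", "tom.b@example.com",
        "bob.a@example.com", "bob.a@example.com", "bob.b@example.com",
        "alice@example.com", "mary@example.com",
    ],
    "day_diff": [-10, -2, 3, -11, 1, 2, 4, -5],
})

result = add_email_change(df)
print(result)
```

Computing one value per user and attaching it with `Series.map` keeps the new column aligned with the original rows, so every row of a user receives the same 'yes' or 'no'. If the index may contain duplicates, calling `reset_index(drop=True)` first keeps the `idxmax` labels unambiguous.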