I have a dataframe in which I want to determine, row by row, whether ser_no and CTRY_NM are the same as in the previous row or differ. However, I need to account for changes in ser_no: at the rows where ser_no changes, I don't want a False/False pair to return True or a False/True pair to return False.

Consider the following dataframe:

import pandas as pd
df = pd.DataFrame({'ser_no': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                'CTRY_NM': ['a', 'a', 'b', 'e', 'e', 'a', 'b', 'b', 'b', 'd']})
def check(key):
    return df[key] == df[key].shift(1)

match = check('ser_no') == check('CTRY_NM')

This returns:

[image: the resulting boolean Series `match`]

However, at indices 3 and 7 the serial number changes. Since each serial number is a different machine, it doesn't make sense to do a logical comparison at these locations. When ser_no changes, how can I insert NaN instead of doing a logical comparison?

  • You probably want to use groupby() first. Commented Mar 29, 2016 at 13:28
  • @CorleyBrigman can you elaborate on how groupby will help? Commented Mar 29, 2016 at 13:32

1 Answer

Is this what you want?

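import numpy as np

def check(data, key):
    # Compare each row with the previous one; cast to float so NaN can be stored.
    mask = (data[key].shift(1) == data[key]).astype(float)
    # The first row of each group has no predecessor within that group.
    mask.iloc[0] = np.nan
    return mask

# Run the comparison separately within each ser_no group.
df.groupby(by=['ser_no']).apply(lambda x: check(x, 'CTRY_NM'))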

Result:

ser_no   
1       0   NaN
        1     1
        2     0
2       3   NaN
        4     1
        5     0
        6     0
3       7   NaN
        8     1
        9     0
Name: CTRY_NM, dtype: float64
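
If the result needs to line up with the original rows rather than carry a (ser_no, index) MultiIndex, the same idea can probably be written without apply. The following is only a sketch of that variation, not the approach used in the answer above; the names prev_ctry and same_ctry are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ser_no': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                   'CTRY_NM': ['a', 'a', 'b', 'e', 'e', 'a', 'b', 'b', 'b', 'd']})

# Previous CTRY_NM within the same ser_no; NaN on the first row of each group.
prev_ctry = df.groupby('ser_no')['CTRY_NM'].shift(1)

# 1.0 where the country equals the previous one, 0.0 where it differs,
# and NaN where there is no previous row for that serial number.
same_ctry = (df['CTRY_NM'] == prev_ctry).astype(float).where(prev_ctry.notna())

# Hypothetical column name; the Series shares df's index, so it assigns directly.
df['same_ctry'] = same_ctry
print(df)
```

Because the shift happens inside each group, no value from one serial number is ever compared with a value from another, and the first row of each machine is left as NaN.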
1 Comment

Yes that is what I was trying to achieve. Can you add some text with what is occurring so I have a better understanding?
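
As a rough illustration of what is occurring, here is the comparison done by hand for a single group (ser_no == 2). groupby hands each such group to check separately, which is why a row is never compared against a row from a different serial number. The variable names below are only for illustration and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({'ser_no': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                   'CTRY_NM': ['a', 'a', 'b', 'e', 'e', 'a', 'b', 'b', 'b', 'd']})

# Take one group by hand: the rows where ser_no == 2 (index labels 3 to 6).
group = df[df['ser_no'] == 2]

# shift(1) moves every value down one row, so each row sees its predecessor.
# The first row of the group has no predecessor and gets NaN.
previous = group['CTRY_NM'].shift(1)

# Element-wise comparison; the first row compares 'e' with NaN, which is False.
same_as_previous = group['CTRY_NM'] == previous

steps = pd.DataFrame({'CTRY_NM': group['CTRY_NM'],
                      'previous': previous,
                      'same_as_previous': same_as_previous})
print(steps)
```

The first comparison in each group is False only because there is nothing to compare against, so the answer overwrites that position with NaN to mark it as "not applicable" rather than "different".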
