I have an Excel file in which I need to check certain conditions and write to the remarks column when a row satisfies them. I load the necessary columns into a DataFrame, and here is how it looks:

svc_no   i_status   caller_id   f_status   result      remarks
11111    WO         11111       WO         Not Match   Duplicate svc_no 
22222    WO         22222       WO         Match
11111    WO         n/a         SP         Not Match   Duplicate svc_no

The conditions would be:

  • The svc_no is duplicated
  • One of the duplicates has svc_no equal to caller_id
  • The other duplicate has 'n/a' or 'NULL' in caller_id
  • Result is Not Match

I used .loc and wrote it this way:

df.loc[(df['svc_no'] != 'NULL') & (df['svc_no'] == df['caller_id']) & (df['svc_no'].duplicated()) & (df['i_status'] == 'WO') & (df['f_status'] == 'WO') & (df['result'] == 'Not Match'), ['remarks']] = 'Duplicate svc_no'

This code may be right for the row where the first duplicate appears, but it does not apply to the row where the other duplicate appears.

Question: Is there a way to compare two duplicate rows and apply the necessary conditions using .loc, or is there a workaround?

2 Answers

It's not clear what you want as your desired output. But you can find all svc_no covered by your criteria using a sequence of Boolean masks:

import pandas as pd

df = pd.DataFrame({'svc_no': [11111, 22222, 11111],
                   'caller_id': [11111, 22222, 'n/a'],
                   'result': ['Not Match', 'Match', 'Not Match']})

counts = df['svc_no'].value_counts()

# cond1: the svc_no occurs more than once
cond1 = df['svc_no'].isin(counts[counts > 1].index)
# cond2: some row with this svc_no has svc_no == caller_id
cond2 = df['svc_no'].isin(df.loc[df['svc_no'] == df['caller_id'], 'svc_no'])
# cond3: some row with this svc_no has 'n/a' or 'NULL' as caller_id
cond3 = df['svc_no'].isin(df.loc[df['caller_id'].isin(['n/a', 'NULL']), 'svc_no'])
# cond4: some row with this svc_no has result 'Not Match'
cond4 = df['svc_no'].isin(df.loc[df['result'] == 'Not Match', 'svc_no'])

df.loc[cond1 & cond2 & cond3 & cond4, 'remarks'] = 'Duplicate svc_no'

print(df)

  caller_id     result  svc_no           remarks
0     11111  Not Match   11111  Duplicate svc_no
1     22222      Match   22222               NaN
2       n/a  Not Match   11111  Duplicate svc_no
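
For comparison, the same four conditions could also be written with groupby and transform, which makes the "some row in the group" logic explicit. This is only a sketch on the sample data above; the intermediate names (is_dup, matches_caller, missing_caller, not_match, group_has_match, group_has_missing) are made up for illustration, and it assumes the 'Not Match' check should apply to each row individually rather than to the whole group.

```python
import pandas as pd

df = pd.DataFrame({'svc_no': [11111, 22222, 11111],
                   'caller_id': [11111, 22222, 'n/a'],
                   'result': ['Not Match', 'Match', 'Not Match']})

# Row-level flags, one per condition.
is_dup = df['svc_no'].duplicated(keep=False)            # every occurrence of a repeated svc_no
matches_caller = df['svc_no'] == df['caller_id']        # svc_no equals caller_id on this row
missing_caller = df['caller_id'].isin(['n/a', 'NULL'])  # caller_id is a placeholder on this row
not_match = df['result'] == 'Not Match'                 # assumption: checked per row

# Group-level flags: True on every row whose svc_no group contains
# at least one row satisfying the condition.
group_has_match = matches_caller.groupby(df['svc_no']).transform('any')
group_has_missing = missing_caller.groupby(df['svc_no']).transform('any')

mask = is_dup & not_match & group_has_match & group_has_missing
df.loc[mask, 'remarks'] = 'Duplicate svc_no'

print(df)
```

On the sample data this marks rows 0 and 2 and leaves row 1 as NaN, the same rows as the isin version.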

8 Comments

My desired output would be, after I satisfy the conditions, I will input string Duplicate svc_no in the remarks column. Though my problem is that my code applies for the first found duplicate value only.
@RickyAguilar, OK, I've updated, not sure it's exactly what you want, but it produces your desired output.
There was a SyntaxError on df.loc[cond1 & cond2 & cond3 & cond4, 'remarks'] = 'Duplicate svc_no'
@RickyAguilar, Did you copy/paste my code exactly? Works fine for me. Try restarting your session.
@RickyAguilar, I'll try and have a look a little later!

You have to tell duplicated that you want to mark all duplicates - by default it only marks everything but the first occurrence of a value:

df['svc_no'].duplicated(keep=False)

see also the docs
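
A small illustration of the difference, using the svc_no values from the question (the variable names first_kept and all_marked are just for this example):

```python
import pandas as pd

svc_no = pd.Series([11111, 22222, 11111])

# Default keep='first': the first occurrence is NOT flagged, later ones are.
first_kept = svc_no.duplicated()

# keep=False: every occurrence of a repeated value is flagged.
all_marked = svc_no.duplicated(keep=False)

print(first_kept.tolist())  # [False, False, True]
print(all_marked.tolist())  # [True, False, True]
```

With the default, only the second 11111 row is flagged, which is why a condition built on duplicated() alone never reaches the first row of the pair.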
