
I have a DataFrame containing customers' charges and the charges on their contracts. I want to compare the corresponding charges for each customer and flag every one that doesn't match. Here is what the df looks like:

Resident Tcode MoveIn 1xdisc 1xdisc_doc conpark rent rent_doc
Marcus t0011009 3/16/2021 0.0 -500.0 0.0 0 1632
Joshua t0011124 3/20/2021 0.0 0.0 0.0 1642 1642
Yvonne t0010940 3/17/2021 -500.0 -500.0 0.0 1655 1655
Mirabeau t0011005 3/19/2021 -500.0 -500.0 0.0 1931 1990
Keyonna t0011084 3/18/2021 0.0 0.0 0.0 1600 1600
Ariel t0010954 3/22/2021 -300.0 0.0 0.0 1300 1320

I want to add a column, 'Problem', that lists all the problems in each row as a single string. Here is the output I would like:

Resident Tcode MoveIn 1xdisc 1xdisc_doc conpark rent rent_doc Problem
Marcus t0011009 3/16/2021 0.0 -500.0 0.0 0 1632 rent doesn't match. 1xdisc doesn't match
Joshua t0011124 3/20/2021 0.0 0.0 0.0 1642 1642
Yvonne t0010940 3/17/2021 -500.0 -500.0 0.0 1655 1655
Mirabeau t0011005 3/19/2021 -500.0 -500.0 0.0 1931 1990 rent doesn't match.
Keyonna t0011084 3/18/2021 0.0 0.0 0.0 1600 1600
Ariel t0010954 3/22/2021 -300.0 0.0 0.0 1300 1320 rent doesn't match. 1xdisc doesn't match

So far I am trying:

import numpy as np

nonmatch["Problem"] = np.where(nonmatch['rent'] != nonmatch['rent_doc'], "rent doesn't match", nonmatch["Problem"] + "")
# Second condition: this replaces the whole column, discarding the result above
nonmatch["Problem"] = np.where(nonmatch['1xdisc'] != nonmatch['1xdisc_doc'], " 1xdisc doesn't match.", "")
print(nonmatch[['Resident', 'Problem']])

But the second assignment overwrites any problems already recorded in the cell. How do I append a string to the cell's existing contents when a condition is met?
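A minimal sketch of the append approach, assuming `nonmatch` holds the sample rows above: initialize `Problem` to empty strings, then concatenate each `np.where` result onto the existing column instead of reassigning it. The exact message texts and the final `.str.strip()` are illustrative choices, not part of the original code.

```python
import numpy as np
import pandas as pd

# Sample rows from the question (only the columns the checks use)
nonmatch = pd.DataFrame({
    "Resident": ["Marcus", "Joshua", "Yvonne", "Mirabeau", "Keyonna", "Ariel"],
    "1xdisc": [0.0, 0.0, -500.0, -500.0, 0.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, -500.0, 0.0, 0.0],
    "rent": [0, 1642, 1655, 1931, 1600, 1300],
    "rent_doc": [1632, 1642, 1655, 1990, 1600, 1320],
})

# Start with an empty string in every row...
nonmatch["Problem"] = ""
# ...then concatenate each condition's message onto what is already there
nonmatch["Problem"] = nonmatch["Problem"] + np.where(
    nonmatch["rent"] != nonmatch["rent_doc"], "rent doesn't match. ", "")
nonmatch["Problem"] = nonmatch["Problem"] + np.where(
    nonmatch["1xdisc"] != nonmatch["1xdisc_doc"], "1xdisc doesn't match. ", "")

# Drop the trailing space left by the last message
nonmatch["Problem"] = nonmatch["Problem"].str.strip()
print(nonmatch[["Resident", "Problem"]])
```

Each new condition is one more `nonmatch["Problem"] + np.where(...)` line, so earlier messages are never lost.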

I also have a hunch that there must be a cleaner way to do this with apply and a lambda, but I'm not sure how. I have about ten conditions I want to check; this is a minimal example.
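For the apply + lambda idea, one possible sketch, assuming a hypothetical `checks` dict that maps each charge column to its contract column (the other conditions would be added there). Row-wise `apply(axis=1)` runs Python code once per row, so on large frames it is typically slower than vectorized approaches.

```python
import pandas as pd

# Sample rows from the question (only the columns the checks use)
df = pd.DataFrame({
    "Resident": ["Marcus", "Joshua", "Yvonne", "Mirabeau", "Keyonna", "Ariel"],
    "1xdisc": [0.0, 0.0, -500.0, -500.0, 0.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, -500.0, 0.0, 0.0],
    "rent": [0, 1642, 1655, 1931, 1600, 1300],
    "rent_doc": [1632, 1642, 1655, 1990, 1600, 1320],
})

# Hypothetical mapping: charge column -> contract column
checks = {"rent": "rent_doc", "1xdisc": "1xdisc_doc"}

# For each row, join one message for every pair whose values differ
df["Problem"] = df.apply(
    lambda row: " ".join(
        f"{col} doesn't match." for col, doc in checks.items() if row[col] != row[doc]
    ),
    axis=1,
)
print(df[["Resident", "Problem"]])
```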

2 Answers


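You can also try concat plus groupby+agg. As piR says, this may be over-engineered:

import pandas as pd

# Boolean masks: True where the charge differs from the contract value
c1 = df['rent'].ne(df['rent_doc'])
c2 = df['1xdisc'].ne(df['1xdisc_doc'])
choices = ["rent doesn't match", " 1xdisc doesn't match."]

# Stack the masks into one Series indexed by (row label, message)
s = pd.concat((c1, c2), keys=choices).swaplevel()
# Keep the True entries, group them by row label and join each row's messages
out = (df.assign(Problem=
      pd.DataFrame.from_records(s[s].index).groupby(0)[1].agg(" ".join)))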

print(out)

   Resident     Tcode     MoveIn  1xdisc  1xdisc_doc  conpark  rent  rent_doc  \
0    Marcus  t0011009  3/16/2021     0.0      -500.0      0.0     0      1632   
1    Joshua  t0011124  3/20/2021     0.0         0.0      0.0  1642      1642   
2    Yvonne  t0010940  3/17/2021  -500.0      -500.0      0.0  1655      1655   
3  Mirabeau  t0011005  3/19/2021  -500.0      -500.0      0.0  1931      1990   
4   Keyonna  t0011084  3/18/2021     0.0         0.0      0.0  1600      1600   
5     Ariel  t0010954  3/22/2021  -300.0         0.0      0.0  1300      1320   

                                     Problem  
0  rent doesn't match  1xdisc doesn't match.  
1                                        NaN  
2                                        NaN  
3                         rent doesn't match  
4                                        NaN  
5  rent doesn't match  1xdisc doesn't match. 
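To make the concat answer's intermediate steps visible, this sketch reruns them on the sample rows (only the four compared columns). `s[s].index` is the list of (row label, message) pairs for actual mismatches. Rows without any pair are absent from the grouped result, which is why they show NaN in the output; reindexing with `fill_value=""` is one way to blank them instead (an addition, not part of the answer).

```python
import pandas as pd

# Sample rows from the question (only the compared columns)
df = pd.DataFrame({
    "1xdisc": [0.0, 0.0, -500.0, -500.0, 0.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, -500.0, 0.0, 0.0],
    "rent": [0, 1642, 1655, 1931, 1600, 1320 - 20],
    "rent_doc": [1632, 1642, 1655, 1990, 1600, 1320],
})

c1 = df["rent"].ne(df["rent_doc"])
c2 = df["1xdisc"].ne(df["1xdisc_doc"])
choices = ["rent doesn't match", " 1xdisc doesn't match."]

# One boolean Series indexed by (row label, message); s[s] keeps the True entries
s = pd.concat((c1, c2), keys=choices).swaplevel()
pairs = list(s[s].index)
print(pairs)

# Group the messages by row label and join them, as in the answer
grouped = pd.DataFrame.from_records(pairs).groupby(0)[1].agg(" ".join)

# Rows with no mismatch are missing from `grouped`; fill them with ""
problem = grouped.reindex(df.index, fill_value="")
print(problem.tolist())
```

The double space in the joined messages comes from the leading space in the second entry of `choices`.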

My take on this:

def get_match(c):
    # Return a function mapping True -> "<c> doesn't match." and False -> ""
    def match(x):
        return f"{c} doesn't match." if x else ""
    return match

onex = (df['1xdisc'] != df['1xdisc_doc']).map(get_match('1xdisc'))
rent = (df['rent']   != df['rent_doc']  ).map(get_match('rent'))

# Join the non-empty messages of each row with two spaces
df.assign(Problem=['  '.join(filter(bool, tup)) for tup in zip(rent, onex)])

   Resident     Tcode     MoveIn  1xdisc  1xdisc_doc  conpark  rent  rent_doc                                     Problem
0    Marcus  t0011009  3/16/2021     0.0      -500.0      0.0     0      1632  rent doesn't match.  1xdisc doesn't match.
1    Joshua  t0011124  3/20/2021     0.0         0.0      0.0  1642      1642                                            
2    Yvonne  t0010940  3/17/2021  -500.0      -500.0      0.0  1655      1655                                            
3  Mirabeau  t0011005  3/19/2021  -500.0      -500.0      0.0  1931      1990                         rent doesn't match.
4   Keyonna  t0011084  3/18/2021     0.0         0.0      0.0  1600      1600                                            
5     Ariel  t0010954  3/22/2021  -300.0         0.0      0.0  1300      1320  rent doesn't match.  1xdisc doesn't match.
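The last line of this answer relies on `filter(bool, ...)` dropping the empty strings that `match` returns for matching values. A tiny pure-Python illustration with made-up three-row lists:

```python
# Made-up per-row messages for three rows ("" means that check passed)
rent = ["rent doesn't match.", "", "rent doesn't match."]
onex = ["1xdisc doesn't match.", "", ""]

# zip pairs up each row's messages; filter(bool, ...) drops the empty strings,
# so a clean row becomes "" and a single problem gets no stray separator
problems = ["  ".join(filter(bool, tup)) for tup in zip(rent, onex)]
print(problems)
```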

Generalized

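docs = [s for s in [*df] if s.endswith('_doc')]   # contract columns, e.g. 'rent_doc'
refs = [s.rsplit('_', 1)[0] for s in docs]         # matching charge columns, e.g. 'rent'

def col_match(c):
    # One message per cell: "<column> doesn't match" where the mask is True
    return [f"{c.name} doesn't match" if x else "" for x in c]

# .to_numpy() makes the comparison positional, since the column labels differ
problem_df = (df[refs] != df[docs].to_numpy()).apply(col_match)
problem = ['  '.join(filter(bool, tup)) for tup in zip(*map(problem_df.get, refs))]
df.assign(Problem=problem)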
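A variant of the generalized version that skips `col_match` and builds the messages straight from the boolean mask, shown on the sample rows. It keeps the same assumption that every `X_doc` column has a matching `X` column (`conpark` has no `_doc` partner, so it is not checked), and messages appear in column order.

```python
import pandas as pd

# Sample rows from the question (Tcode and MoveIn omitted)
df = pd.DataFrame({
    "Resident": ["Marcus", "Joshua", "Yvonne", "Mirabeau", "Keyonna", "Ariel"],
    "1xdisc": [0.0, 0.0, -500.0, -500.0, 0.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, -500.0, 0.0, 0.0],
    "conpark": [0.0] * 6,
    "rent": [0, 1642, 1655, 1931, 1600, 1300],
    "rent_doc": [1632, 1642, 1655, 1990, 1600, 1320],
})

docs = [s for s in df.columns if s.endswith("_doc")]   # ['1xdisc_doc', 'rent_doc']
refs = [s.rsplit("_", 1)[0] for s in docs]              # ['1xdisc', 'rent']

# Boolean mask, True where a charge differs from its contract value
mismatch = df[refs] != df[docs].to_numpy()

# Walk the mask row by row and name the mismatching columns
df = df.assign(Problem=[
    "  ".join(f"{name} doesn't match" for name, bad in zip(refs, row) if bad)
    for row in mismatch.to_numpy()
])
print(df[["Resident", "Problem"]])
```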
