How do I add multiple strings to a column in a pandas dataframe based on conditions in each row?

Question

I have a dataframe containing customers' charges and the charges on their contracts. I want to compare the corresponding charges for each customer and flag any that don't match. Here is what the df looks like:

Resident	Tcode	MoveIn	1xdisc	1xdisc_doc	rent	rent_doc
Marcus	t0011009	3/16/2021	0.0	-500.0	0	1632
Joshua	t0011124	3/20/2021	0.0	0.0	1642	1642
Yvonne	t0010940	3/17/2021	-500.0	-500.0	1655	1655
Mirabeau	t0011005	3/19/2021	-500.0	-500.0	1931	1990
Keyonna	t0011084	3/18/2021	0.0	0.0	1600	1600
Ariel	t0010954	3/22/2021	-300.0	0.0	1300	1320
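The answers below refer to this table as `df`. Here is a sketch for building it so their code can be run. It assumes the column names shown above and keeps MoveIn as a plain string; `pd.to_datetime` could parse it if dates are needed.

```python
import pandas as pd

# Sample data copied from the table above; MoveIn is left as a string.
df = pd.DataFrame({
    "Resident": ["Marcus", "Joshua", "Yvonne", "Mirabeau", "Keyonna", "Ariel"],
    "Tcode": ["t0011009", "t0011124", "t0010940", "t0011005", "t0011084", "t0010954"],
    "MoveIn": ["3/16/2021", "3/20/2021", "3/17/2021", "3/19/2021", "3/18/2021", "3/22/2021"],
    "1xdisc": [0.0, 0.0, -500.0, -500.0, 0.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, -500.0, 0.0, 0.0],
    "rent": [0, 1642, 1655, 1931, 1600, 1300],
    "rent_doc": [1632, 1642, 1655, 1990, 1600, 1320],
})
```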

I want to add a column, 'Problem', that holds all the problems for each row as a single string. Here is the output I would like:

Resident	Tcode	MoveIn	1xdisc	1xdisc_doc	rent	rent_doc	Problem
Marcus	t0011009	3/16/2021	0.0	-500.0	0	1632	rent doesn't match. 1xdisc doesn't match
Joshua	t0011124	3/20/2021	0.0	0.0	1642	1642
Yvonne	t0010940	3/17/2021	-500.0	-500.0	1655	1655
Mirabeau	t0011005	3/19/2021	-500.0	-500.0	1931	1990	rent doesn't match.
Keyonna	t0011084	3/18/2021	0.0	0.0	1600	1600
Ariel	t0010954	3/22/2021	-300.0	0.0	1300	1320	rent doesn't match. 1xdisc doesn't match

So far I am trying

import numpy as np

nonmatch["Problem"] = ""
nonmatch["Problem"] = np.where(nonmatch['rent'] != nonmatch['rent_doc'], "rent doesn't match.", nonmatch["Problem"])
nonmatch["Problem"] = np.where(nonmatch['1xdisc'] != nonmatch['1xdisc_doc'], " 1xdisc doesn't match.", "")
print(nonmatch[['Resident','Problem']])

but then any problems already in the cell get overwritten by the second assignment. How do I append a string to the contents of the cell when the condition is met?
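One way to append rather than overwrite is to start from an empty Problem column and, in each `np.where`, use the current column as the base of both branches: append the message where the condition is True, otherwise keep the column unchanged. This is a sketch on a reduced copy of the sample data; the `checks` dict and its messages are illustrative assumptions, chosen so that ten conditions stay one line each.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Resident": ["Marcus", "Joshua", "Mirabeau", "Ariel"],
    "1xdisc": [0.0, 0.0, -500.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, 0.0],
    "rent": [0, 1642, 1931, 1300],
    "rent_doc": [1632, 1642, 1990, 1320],
})

# Hypothetical mapping: message -> boolean mask of rows that fail the check.
checks = {
    "rent doesn't match.": df["rent"] != df["rent_doc"],
    "1xdisc doesn't match.": df["1xdisc"] != df["1xdisc_doc"],
}

df["Problem"] = ""
for message, mask in checks.items():
    # Append to the existing text where the mask is True; otherwise keep it as is.
    df["Problem"] = np.where(mask, df["Problem"] + message + " ", df["Problem"])

# Remove the trailing space left after the last appended message.
df["Problem"] = df["Problem"].str.rstrip()
print(df[["Resident", "Problem"]])
```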

I also have a hunch that there must be a cleaner way to do this with apply and a lambda, but I'm not sure how. I have about ten conditions I want to check; this is a minimal example.
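On the apply/lambda idea: a row-wise version could look like the sketch below, where a named function passed to `DataFrame.apply(..., axis=1)` plays the role of the lambda. The name `row_problems` is hypothetical. It is usually slower than column-wise comparisons on large frames because the function runs once per row in Python, but each check reads like an ordinary `if` statement.

```python
import pandas as pd

df = pd.DataFrame({
    "Resident": ["Marcus", "Joshua", "Mirabeau", "Ariel"],
    "1xdisc": [0.0, 0.0, -500.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, 0.0],
    "rent": [0, 1642, 1931, 1300],
    "rent_doc": [1632, 1642, 1990, 1320],
})

def row_problems(row):
    # Hypothetical helper: collect one message per mismatched column pair.
    problems = []
    if row["rent"] != row["rent_doc"]:
        problems.append("rent doesn't match.")
    if row["1xdisc"] != row["1xdisc_doc"]:
        problems.append("1xdisc doesn't match.")
    # Rows with no problems get an empty string.
    return " ".join(problems)

df["Problem"] = df.apply(row_problems, axis=1)
print(df[["Resident", "Problem"]])
```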

anky · Accepted Answer · 2021-03-24 18:59:31Z

You can also try `pd.concat` with groupby + agg. This may be over-engineered, as piR says:

import pandas as pd

c1 = df['rent'].ne(df['rent_doc'])
c2 = df['1xdisc'].ne(df['1xdisc_doc'])
choices = ["rent doesn't match", " 1xdisc doesn't match."]

# Index s by (row label, message), keep the True entries, then join the messages per row
s = pd.concat((c1, c2), keys=choices).swaplevel()
out = (df.assign(Problem=
      pd.DataFrame.from_records(s[s].index).groupby(0)[1].agg(" ".join)))

print(out)

   Resident     Tcode     MoveIn  1xdisc  1xdisc_doc  conpark  rent  rent_doc  \
0    Marcus  t0011009  3/16/2021     0.0      -500.0      0.0     0      1632   
1    Joshua  t0011124  3/20/2021     0.0         0.0      0.0  1642      1642   
2    Yvonne  t0010940  3/17/2021  -500.0      -500.0      0.0  1655      1655   
3  Mirabeau  t0011005  3/19/2021  -500.0      -500.0      0.0  1931      1990   
4   Keyonna  t0011084  3/18/2021     0.0         0.0      0.0  1600      1600   
5     Ariel  t0010954  3/22/2021  -300.0         0.0      0.0  1300      1320   

                                     Problem  
0  rent doesn't match  1xdisc doesn't match.  
1                                        NaN  
2                                        NaN  
3                         rent doesn't match  
4                                        NaN  
5  rent doesn't match  1xdisc doesn't match.
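The same concat/swaplevel idea can be written without `DataFrame.from_records`: after filtering `s` to its True entries, index level 0 holds the row label and level 1 holds the message, so the messages can be grouped by row directly. Rows without problems come out as NaN, as in the output above; `reindex(..., fill_value="")` turns them into empty strings. This is a sketch on a reduced copy of the sample data; the single-space separator and trailing periods are choices made here, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Resident": ["Marcus", "Joshua", "Mirabeau", "Ariel"],
    "1xdisc": [0.0, 0.0, -500.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, 0.0],
    "rent": [0, 1642, 1931, 1300],
    "rent_doc": [1632, 1642, 1990, 1320],
})

c1 = df["rent"].ne(df["rent_doc"])
c2 = df["1xdisc"].ne(df["1xdisc_doc"])
choices = ["rent doesn't match.", "1xdisc doesn't match."]

# Stack both masks into one boolean Series indexed by (row label, message).
s = pd.concat((c1, c2), keys=choices).swaplevel()
hits = s[s]  # keep only the failed checks

# Messages as values, row labels as index; join the messages of each row.
problem = (
    pd.Series(hits.index.get_level_values(1), index=hits.index.get_level_values(0))
    .groupby(level=0)
    .agg(" ".join)
)

# Rows with no failed checks get "" instead of NaN.
out = df.assign(Problem=problem.reindex(df.index, fill_value=""))
print(out[["Resident", "Problem"]])
```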

piRSquared · Accepted Answer · 2021-03-24 19:04:27Z

My take on this:

def get_match(c):
    # Return a function mapping True -> "<c> doesn't match." and False -> ""
    def match(x):
        return f"{c} doesn't match." if x else ''
    return match

onex = (df['1xdisc'] != df['1xdisc_doc']).map(get_match('1xdisc'))
rent = (df['rent']   != df['rent_doc']  ).map(get_match('rent'))

# filter(bool, ...) drops the empty strings before joining
df.assign(Problem=(['  '.join(filter(bool, tup)) for tup in zip(rent, onex)]))

   Resident     Tcode     MoveIn  1xdisc  1xdisc_doc  conpark  rent  rent_doc                                     Problem
0    Marcus  t0011009  3/16/2021     0.0      -500.0      0.0     0      1632  rent doesn't match.  1xdisc doesn't match.
1    Joshua  t0011124  3/20/2021     0.0         0.0      0.0  1642      1642                                            
2    Yvonne  t0010940  3/17/2021  -500.0      -500.0      0.0  1655      1655                                            
3  Mirabeau  t0011005  3/19/2021  -500.0      -500.0      0.0  1931      1990                         rent doesn't match.
4   Keyonna  t0011084  3/18/2021     0.0         0.0      0.0  1600      1600                                            
5     Ariel  t0010954  3/22/2021  -300.0         0.0      0.0  1300      1320  rent doesn't match.  1xdisc doesn't match.

Generalized

# Every column ending in '_doc' and the column it documents
docs = [s for s in [*df] if s.endswith('_doc')]
refs = [s.rsplit('_', 1)[0] for s in docs]

def col_match(c):
    return [f"{c.name} doesn't match" if x else "" for x in c]

# .to_numpy() compares positionally, since the column labels of the two frames differ
problem_df = (df[refs] != df[docs].to_numpy()).apply(col_match)
problem = ['  '.join(filter(bool, tup)) for tup in zip(*map(problem_df.get, refs))]
df.assign(Problem=problem)
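The generalized version can also be wrapped in a reusable function that pairs every `<name>_doc` column with `<name>` and builds the messages with one `np.where` per pair. This is a sketch: `flag_mismatches`, `suffix` and `sep` are hypothetical names, and it assumes every `_doc` column has a matching base column. Messages appear in the order of the `_doc` columns in the frame.

```python
import numpy as np
import pandas as pd

def flag_mismatches(df, suffix="_doc", sep=" "):
    # Hypothetical helper: one message column per (<name>, <name><suffix>) pair.
    docs = [c for c in df.columns if c.endswith(suffix)]
    messages = pd.DataFrame(index=df.index)
    for doc in docs:
        ref = doc[: -len(suffix)]
        messages[ref] = np.where(df[ref] != df[doc], f"{ref} doesn't match.", "")
    # Join the non-empty messages of each row.
    return [sep.join(filter(None, row)) for row in messages.itertuples(index=False)]

df = pd.DataFrame({
    "Resident": ["Marcus", "Joshua", "Mirabeau", "Ariel"],
    "1xdisc": [0.0, 0.0, -500.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, 0.0],
    "rent": [0, 1642, 1931, 1300],
    "rent_doc": [1632, 1642, 1990, 1320],
})
df = df.assign(Problem=flag_mismatches(df))
print(df[["Resident", "Problem"]])
```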
