Python. Pandas. Merge

Question

I have written code that merges File B into File A on the column 'Code'. Some of the values coming from File B, however, are generic ('Color'), so I need a second merge with File C. Instead of creating a new column, I would like to reuse the column created by the first merge: wherever the first merge returned the value 'Color', and only for those rows, merge with File C to get the proper value.

I went as far as merging A with B:

import pandas as pd

File_A = pd.read_excel(r'.../My Files/Python/Supplier cat testing/File A.xlsx')
File_B = pd.read_excel(r'.../My Files/Python/Supplier cat testing/File B.xlsx')
File_C = pd.read_excel(r'.../My Files/Python/Supplier cat testing/File C.xlsx')
results = pd.merge(File_A, File_B[['Code','Color']], on='Code')

results.to_excel('Output_File.xlsx', index=False)

Would anyone have an idea where I should even start, please?

...which is to say -- you're expected to have already started (and thus encountered a narrow, specific problem) before asking here.
– Charles Duffy, Commented Dec 10, 2020 at 22:29
@Charles Duffy Point taken. I am pretty new to this page. I have added the code.
– Adrian, Commented Dec 11, 2020 at 6:45

frankr6591 · Accepted Answer · 2020-12-10 22:42:51Z

Try:

# After the merge, Color_x is the value from dfB and Color_y the value from dfC
dfOut = dfB.copy()
dfOut['Color'] = dfB.merge(dfC, on='Code').apply(lambda r: r.Color_x if r.Color_y == 'Color' else r.Color_y, axis=1)
print(dfOut)

Output

   Code   Color
0     0   Green
1     1  Yellow
2     2  Orange
3     3     Red
4     4   Black
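
For the three-file case described in the question, a rough sketch of one alternative is to left-merge File C under a temporary column name and overwrite 'Color' only where the first merge returned the generic value. The small DataFrames below are made-up stand-ins for the Excel files, and 'Color_C' is a hypothetical helper column name; the sketch also assumes each 'Code' appears at most once in File B and File C.

```python
import pandas as pd

# Made-up stand-ins for File A, File B and File C from the question.
file_a = pd.DataFrame({"Code": [0, 1, 2, 3]})
file_b = pd.DataFrame({"Code": [0, 1, 2, 3],
                       "Color": ["Green", "Color", "Orange", "Color"]})
file_c = pd.DataFrame({"Code": [1, 3], "Color": ["Yellow", "Red"]})

# First merge: bring File B's 'Color' into File A, keeping every row of A.
results = file_a.merge(file_b[["Code", "Color"]], on="Code", how="left")

# Second merge: attach File C's value under a temporary column name
# ('Color_C' is a hypothetical helper name, dropped again below).
lookup = file_c[["Code", "Color"]].rename(columns={"Color": "Color_C"})
results = results.merge(lookup, on="Code", how="left")

# Overwrite only the rows where the first merge returned the generic
# 'Color' and File C actually has a value for that Code.
use_c = results["Color"].eq("Color") & results["Color_C"].notna()
results["Color"] = results["Color"].mask(use_c, results["Color_C"])
results = results.drop(columns="Color_C")

print(results)
```

Using how="left" keeps rows of File A that have no match, whereas the default inner merge would drop them. The result could then be written out with results.to_excel(...) as in the question.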


10 Comments

Adrian Over a year ago

Thank you very much, it worked. So the code creates the output as a copy of dfB, merges dfB with dfC on 'Code', and then uses a lambda with a conditional to fill the 'Color' column: whenever the value is 'Color', it takes the proper color from the other file? That is, use r.Color_x if r.Color_y == 'Color', otherwise use r.Color_y. What are r.Color_x and r.Color_y? I can't quite grasp how that works. Thank you.

frankr6591 Over a year ago

The 'r' refers to row. After the merge, the 'Color' from dfB and dfC become 'Color_x' and 'Color_y' respectively (there is a way to force the name, e.g. Color_B and Color_C). So r.Color_x refers to the Color value from dfB. And r.Color_y refers to dfC. Now you just do the logic that if one value is 'Color' then use the other.
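
To see the renaming for yourself, a tiny sketch like the following may help. The two frames are made-up sample data; the only point is which column names come out of the merge when both inputs have a 'Color' column.

```python
import pandas as pd

# Made-up sample data: both frames share the key 'Code' and a 'Color' column.
dfB = pd.DataFrame({"Code": [0, 1], "Color": ["Green", "Color"]})
dfC = pd.DataFrame({"Code": [0, 1], "Color": ["Lime", "Yellow"]})

# The overlapping non-key column gets the default suffixes '_x' (left) and '_y' (right).
merged = dfB.merge(dfC, on="Code")
print(merged.columns.tolist())

# Each row passed to apply(..., axis=1) exposes both values as attributes.
row = merged.iloc[1]
print(row.Color_x, row.Color_y)
```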

frankr6591 Over a year ago

Look up the `merge` function - pandas.pydata.org/pandas-docs/stable/reference/api/…

frankr6591 Over a year ago

In the merge() call, add suffixes=['_b', '_c']; then you can use r.Color_b and r.Color_c.
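
A minimal sketch of the suffixes idea, on made-up data. It assumes, as in the question, that dfB is the frame holding the generic 'Color' placeholder, so the lambda falls back to dfC's value only for those rows.

```python
import pandas as pd

# Made-up sample data; dfB contains the generic placeholder 'Color'.
dfB = pd.DataFrame({"Code": [0, 1, 2], "Color": ["Green", "Color", "Orange"]})
dfC = pd.DataFrame({"Code": [0, 1, 2], "Color": ["Lime", "Yellow", "Amber"]})

# Explicit suffixes make it obvious which frame each column came from.
merged = dfB.merge(dfC, on="Code", suffixes=("_b", "_c"))

# Keep dfB's value unless it is the placeholder, in which case use dfC's.
merged["Color"] = merged.apply(
    lambda r: r.Color_c if r.Color_b == "Color" else r.Color_b, axis=1
)
print(merged[["Code", "Color"]])
```

Note that the suffix is appended to the original column name unchanged, so the capital 'C' in 'Color' is preserved.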

frankr6591 Over a year ago

use "apply(lambda r: r['Color Name_x'] if r['Color Name_y'] == 'Color' else r['Color Name_y'], axis=1)"
