I have written code that merges File B into File A on the column 'Code'. Some of the values from File B, however, are generic ('Color'), so I need a second merge with File C. Instead of creating a new column, I would like to reuse the column created by the first merge: only for the rows where the first merge returned the value 'Color', merge with File C to get the proper value.

I went as far as merging A with B:

import pandas as pd

File_A = pd.read_excel(r'.../My Files/Python/Supplier cat testing/File A.xlsx')
File_B = pd.read_excel(r'.../My Files/Python/Supplier cat testing/File B.xlsx')
File_C = pd.read_excel(r'.../My Files/Python/Supplier cat testing/File C.xlsx')
results = pd.merge(File_A, File_B[['Code','Color']], on='Code')

results.to_excel('Output_File.xlsx', index=False)

Would anyone have any idea where I should even start, please?

  • Can you post your code? Commented Dec 10, 2020 at 22:27
  • ...which is to say -- you're expected to have already started (and thus encountered a narrow, specific problem) before asking here. Commented Dec 10, 2020 at 22:29
  • @TomSelleck Code added. Commented Dec 11, 2020 at 6:44
  • @Charles Duffy Point taken. I am pretty new to this page. I have added the code. Commented Dec 11, 2020 at 6:45

1 Answer

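Try:

# Start from a copy of dfB so the original frame is left untouched
dfOut = dfB.copy()
# After the merge, 'Color' from dfB is 'Color_x' and 'Color' from dfC is 'Color_y'
dfOut['Color'] = dfB.merge(dfC, on='Code').apply(lambda r: r.Color_x if r.Color_y == 'Color' else r.Color_y, axis=1)
print(dfOut)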

Output

   Code   Color
0     0   Green
1     1  Yellow
2     2  Orange
3     3     Red
4     4   Black
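
For reference, here is a self-contained sketch of the same idea on made-up data. The frames dfB and dfC below are hypothetical stand-ins, not the asker's files. It assumes, as described in the question, that dfB holds the generic placeholder 'Color' and dfC holds the specific value, and that each Code appears at most once in dfC. It uses a left merge so every row of dfB is kept, and a vectorized mask in place of apply:

```python
import pandas as pd

# Hypothetical sample data; the real frames would come from the Excel files.
dfB = pd.DataFrame({"Code": [0, 1, 2, 3, 4],
                    "Color": ["Green", "Color", "Orange", "Color", "Black"]})
dfC = pd.DataFrame({"Code": [1, 3],
                    "Color": ["Yellow", "Red"]})

# A left merge keeps every row of dfB, in dfB's order.
# Rows with no match in dfC get NaN in Color_c.
merged = dfB.merge(dfC, on="Code", how="left", suffixes=("_b", "_c"))

# Only where dfB has the generic placeholder, take the value from dfC.
is_generic = merged["Color_b"] == "Color"
merged["Color"] = merged["Color_b"].mask(is_generic, merged["Color_c"])

dfOut = merged[["Code", "Color"]]
print(dfOut)
```

Note that a placeholder row with no match in dfC would end up as NaN here; whether that is preferable to keeping 'Color' depends on the data.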
10 Comments

Thank you very much. It worked. So the code creates the output frame as a copy of dfB, merges dfB with dfC on 'Code', and then replaces the values in the 'Color' column using a lambda with a conditional: use r.Color_x if r.Color_y == 'Color', otherwise use r.Color_y. Is that right? What are r.Color_x and r.Color_y? I can't quite grasp how that works. Thank you.
The 'r' refers to a row. After the merge, the 'Color' columns from dfB and dfC become 'Color_x' and 'Color_y' respectively (there is a way to force the names, e.g. Color_B and Color_C). So r.Color_x refers to the Color value from dfB, and r.Color_y refers to the one from dfC. Now you just apply the logic: if one value is 'Color', use the other.
In the merge() call, add suffixes=['_b', '_c']; then you can use r.Color_b and r.Color_c.
Use "apply(lambda r: r['Color Name_x'] if r['Color Name_y'] == 'Color' else r['Color Name_y'], axis=1)"
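
To make the suffix behaviour from these comments concrete, here is a small sketch on hypothetical frames (the data is invented for illustration). It shows the default '_x'/'_y' names, the effect of passing suffixes, and why bracket access is needed when a column name contains a space:

```python
import pandas as pd

# Hypothetical frames that share a non-key column name containing a space.
dfB = pd.DataFrame({"Code": [1, 2], "Color Name": ["Color", "Blue"]})
dfC = pd.DataFrame({"Code": [1, 2], "Color Name": ["Red", "Blue"]})

# Default: overlapping non-key columns get '_x' (left) and '_y' (right).
default = dfB.merge(dfC, on="Code")
print(list(default.columns))

# Explicit suffixes make it clearer which frame each column came from.
named = dfB.merge(dfC, on="Code", suffixes=("_b", "_c"))
print(list(named.columns))

# Attribute access (r.Color_b) cannot work for names with spaces,
# so index the row with brackets instead.
first = named.iloc[0]
print(first["Color Name_b"], first["Color Name_c"])
```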