1

I have multiple data frames with different columns names that I would like to merge, but they have the same column name for one specific column, which is called user

df1

     user       books     groceries
0    alex        2           10
1    andrew      10          15
2    kelvin      15          20
3    mary        5           25

df2

     user        saving    animal
0    mary           2      1
1    alex           3      0
2    andrew         4      0
3    kelvin         5      2

df3

     user        place    home
0    alex           3      1
1    mary           3      5
2    andrew         4      5
3    kelvin         5      2

If there are only 2 data frames, I get my desired output with

pd.merge(df1[['user', 'books']], df2[['user', 'animal']], on='user')

Desired Output


      user       books     animal
0    alex        2           0
1    andrew      10          0
2    kelvin      15          2
3    mary        5           1

If there are 3 or more data frames, I get errors when trying to use

pd.merge(df1[['user', 'books']], df2[['user', 'animal']], df3[['user', 'place']], on='user')

Error Message

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any suggestion on how I can achieve the desired output with 3 or more dataframes?

2
  • 2
    Please provide the error message Commented Jul 9, 2022 at 5:05
  • Error message just added. Thanks for comment @CrazeXD Commented Jul 9, 2022 at 5:09

3 Answers 3

4

You can use functools:

import functools
dfs = [df1[['user', 'books']], df2[['user', 'animal']], df3[['user', 'place']]]

df_final = functools.reduce(lambda left, right: pd.merge(left,right,on='user'), dfs)
print(df_final)

Print out:

     user  books  animal  place
0    alex      2       0      3
1  andrew     10       0      4
2  kelvin     15       2      5
3    mary      5       1      3

This has the advantage that you could easily expand on many more data frames if you wanted too.

Sign up to request clarification or add additional context in comments.

Comments

4

you can use

df4 = pd.merge(pd.merge(df1[['user', 'books']], df2[['user', 'animal']],on='user'),df3[['user', 'place']],on='user')

Comments

2

Try merging 2 at first then merge the result with the third one. As per my understanding from documentation they are expecting two dataframes to be merged at a time.

pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Have you seen https://pandas.pydata.org/docs/reference/api/pandas.merge.html

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.