Hi - I want to merge two pandas DataFrames, but I don't want to bring ALL of the columns from both DataFrames into my new DataFrame. In the picture below, if I join df1 and df2 on 'acct' and want to bring back all the columns from df1 and ONLY 'entity' from df2, how would I write that? I don't want to have to drop any columns afterwards, so a normal merge isn't what I'm looking for. Can anyone help? Thanks!
1 Answer
When you perform the merge, you can pass in just the subset of df2's columns that you need: the join key plus the columns you want to bring over. Selecting columns with a list creates a new DataFrame object, so the underlying df1 and df2 remain unchanged. An example would look like this:
df_result = df1.merge(df2[['acct', 'entity']], on='acct')
This will let you do your partial merge without modifying either original dataframe.
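Here is a small, self-contained sketch of the same idea. The sample data and the 'balance' and 'region' columns are made up for illustration; only 'acct' and 'entity' come from the question. Note that merge does an inner join by default, so how='left' is used here on the assumption that you want to keep every row of df1 even when df2 has no match.

```python
import pandas as pd

# Hypothetical sample data; 'balance' and 'region' are invented columns.
df1 = pd.DataFrame({"acct": [1, 2, 3], "balance": [100.0, 250.0, 75.0]})
df2 = pd.DataFrame({
    "acct": [1, 2, 4],
    "entity": ["A", "B", "D"],
    "region": ["east", "west", "north"],
})

# Select only the join key and the wanted column from df2 before merging.
# how="left" keeps all rows of df1; unmatched rows get NaN in 'entity'.
df_result = df1.merge(df2[["acct", "entity"]], on="acct", how="left")

print(df_result)
```

The result has the columns acct, balance and entity; 'region' never enters the merge, so there is nothing to drop afterwards.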
2 Comments
Jim
Thank you for the response! One additional question: what if the joining columns are named differently in the above example? Say df1 had a column 'acct' and df2 had 'account', but I still only wanted to bring back certain columns like in my original post. Thanks again!
Mayank Porwal
If the columns that you want to join on have different names, you can do this:
df_result = df1.merge(df2[['account', 'entity']], left_on='acct', right_on='account')
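One thing to be aware of: with left_on/right_on, the result contains both key columns ('acct' and 'account'). If you would rather end up with a single key column and not drop anything afterwards, one option is to rename the key on the selected slice of df2 before merging. A minimal sketch, again with made-up sample data ('balance' and 'region' are invented columns):

```python
import pandas as pd

# Hypothetical sample data; 'balance' and 'region' are invented columns.
df1 = pd.DataFrame({"acct": [1, 2, 3], "balance": [100.0, 250.0, 75.0]})
df2 = pd.DataFrame({
    "account": [1, 2, 4],
    "entity": ["A", "B", "D"],
    "region": ["east", "west", "north"],
})

# left_on/right_on: both 'acct' and 'account' appear in the result.
both_keys = df1.merge(
    df2[["account", "entity"]], left_on="acct", right_on="account"
)

# Rename the key on the selected slice, then merge with on=.
# rename returns a new DataFrame, so df2 itself is not modified.
single_key = df1.merge(
    df2[["account", "entity"]].rename(columns={"account": "acct"}),
    on="acct",
)

print(both_keys.columns.tolist())
print(single_key.columns.tolist())
```

Both versions leave df1 and df2 untouched; the second simply avoids carrying the duplicate key column into the result.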