How to merge two pandas data frames?

Question

I have two pandas data frames (see below). I want to merge them on id (DataFrame1) and localid (DataFrame2). The code below is not working: it creates additional rows in dfmerged, because DataFrame2 may contain the same localid more than once (e.g., D3). How can I merge these two data frames and set the value of the 'color' column to NaN if the localid does not exist in the first data frame (DataFrame1)?

dfmerged = pd.merge(df1, df2, left_on='id', right_on='localid')

You'll first need to de-duplicate the ids in df2 by combining the colours into a single list. Second, you need to pass how='outer' if you want all ids in the final merged df; by default it's 'inner', so only ids that are present in both will be merged.
– EdChum, Commented Oct 6, 2016 at 9:15
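
To make the comment above concrete, here is a minimal sketch of how the how argument changes the result while df2 still contains a repeated key. The sample values are assumed (the question's data was posted as a picture), so the row counts only illustrate this particular data.

```python
import pandas as pd

# Assumed sample data; the original question showed its frames as an image.
df1 = pd.DataFrame({'id': ['D1', 'D2', 'D3', 'D4', 'D5', 'D6'],
                    'Field1': [12, 15, 11, 7, 55, 8.8]})
df2 = pd.DataFrame({'localid': ['D1', 'D2', 'D3', 'D3', 'D9'],
                    'color': [['b'], ['a'], ['a', 'b'], ['s', 'd'], ['a']]})

# Row count of the merge for each join type, with D3 still duplicated in df2.
row_counts = {
    how: len(pd.merge(df1, df2, left_on='id', right_on='localid', how=how))
    for how in ('inner', 'left', 'outer')
}
print(row_counts)  # {'inner': 4, 'left': 7, 'outer': 8}
```

D3 shows up twice in every variant because it matches two rows of df2. 'inner' keeps only the ids present in both frames, 'left' keeps every id of df1, and 'outer' additionally keeps D9.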

jezrael · Accepted Answer · 2016-10-06 09:25:52Z

I think you need to group df2 by localid and sum the color lists (summing lists concatenates them), then merge with how='left' and drop the localid column:

import pandas as pd

df1 = pd.DataFrame({'id':['D1','D2','D3','D4','D5','D6'],
                   'Field1':[12,15,11,7,55,8.8]})

print (df1)
   Field1  id
0    12.0  D1
1    15.0  D2
2    11.0  D3
3     7.0  D4
4    55.0  D5
5     8.8  D6

df2 = pd.DataFrame({'localid':['D1','D2','D3','D3','D9'],
                   'color':[['b'],['a'],['a','b'],['s','d'], ['a']]})

print (df2)
    color localid
0     [b]      D1
1     [a]      D2
2  [a, b]      D3
3  [s, d]      D3
4     [a]      D9

df2 = df2.groupby('localid', as_index=False)['color'].sum()
print (df2)
  localid         color
0      D1           [b]
1      D2           [a]
2      D3  [a, b, s, d]
3      D9           [a]


# Wrap the chained call in parentheses so .drop() can start on its own line
dfmerged = (pd.merge(df1,
                     df2,
                     left_on='id',
                     right_on='localid',
                     how='left')
              .drop('localid', axis=1))

print (dfmerged)
   Field1  id         color
0    12.0  D1           [b]
1    15.0  D2           [a]
2    11.0  D3  [a, b, s, d]
3     7.0  D4           NaN
4    55.0  D5           NaN
5     8.8  D6           NaN
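
If summing lists feels too implicit, one possible variation is to flatten the lists explicitly and let merge check that the keys are unique. This is only a sketch on the same sample data: itertools.chain and the validate argument of pd.merge (available in reasonably recent pandas versions) are additions for illustration, not part of the answer above.

```python
from itertools import chain

import pandas as pd

df1 = pd.DataFrame({'id': ['D1', 'D2', 'D3', 'D4', 'D5', 'D6'],
                    'Field1': [12, 15, 11, 7, 55, 8.8]})
df2 = pd.DataFrame({'localid': ['D1', 'D2', 'D3', 'D3', 'D9'],
                    'color': [['b'], ['a'], ['a', 'b'], ['s', 'd'], ['a']]})

# Flatten all colour lists that belong to one localid into a single list.
colors = (df2.groupby('localid')['color']
             .agg(lambda lists: list(chain.from_iterable(lists)))
             .reset_index())

# validate='one_to_one' makes merge raise a MergeError if either key
# column still contains duplicates, so extra rows cannot slip in silently.
dfmerged = (pd.merge(df1, colors, left_on='id', right_on='localid',
                     how='left', validate='one_to_one')
              .drop(columns='localid'))
print(dfmerged)
```

The result should have the same six rows as above: one combined list for D3 and NaN in color for D4, D5 and D6.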

edited Oct 6, 2016 at 9:25

answered Oct 6, 2016 at 9:19


4 Comments

kitchenprinzessin Over a year ago

the D3 color values should be [a,b,s,d].

kitchenprinzessin Over a year ago

I am just about to add the groupby statement, you are quick! thanks :)

jezrael Over a year ago

Thank you for accepting! And a small piece of advice: check How to make good reproducible pandas examples, and don't use pictures, because then it is impossible to copy the data. ;)

kitchenprinzessin Over a year ago

just started with python programming, thanks for the examples ^O^

berna1111 · Answer · 2016-10-06 09:36:04Z

You should probably simplify df2 so that it has no repeated keys, and then tell pd.merge to use the union of keys from both frames (with how='outer'):

import pandas as pd
df1 = pd.DataFrame({    'id':['D1','D2','D3','D4','D5','D6'],
                    'Field1':[  12,  15,  11,   7,  55, 8.8]})
df2 = pd.DataFrame({'localid':['D1','D2','D3','D3','D9'],
                      'color':[['blue','grey'],
                               ['yellow'],
                               ['black','red','green'],
                               ['white'],
                               ['blue']]})
dfmerged = pd.merge(df1, df2, left_on='id', right_on='localid')
dfmerged2 = pd.merge(df1, df2, left_on='id', right_on='localid', how='outer')

Which results in:

>>> dfmerged2
   Field1   id                color localid
0    12.0   D1         [blue, grey]      D1
1    15.0   D2             [yellow]      D2
2    11.0   D3  [black, red, green]      D3
3    11.0   D3              [white]      D3
4     7.0   D4                  NaN     NaN
5    55.0   D5                  NaN     NaN
6     8.8   D6                  NaN     NaN
7     NaN  NaN               [blue]      D9
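
The output above still has two D3 rows because df2 was merged as-is. A sketch of the "simplify df2 first" step, followed by the same outer merge, could look like the following. The helper name flatten and the use of indicator=True (which adds a _merge column saying which frame each row came from) are assumptions added for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['D1', 'D2', 'D3', 'D4', 'D5', 'D6'],
                    'Field1': [12, 15, 11, 7, 55, 8.8]})
df2 = pd.DataFrame({'localid': ['D1', 'D2', 'D3', 'D3', 'D9'],
                    'color': [['blue', 'grey'],
                              ['yellow'],
                              ['black', 'red', 'green'],
                              ['white'],
                              ['blue']]})


def flatten(lists):
    """Hypothetical helper: concatenate several colour lists into one list."""
    return [color for colors in lists for color in colors]


# One row per localid, with all of its colours collected in a single list.
df2_unique = df2.groupby('localid')['color'].agg(flatten).reset_index()

# Outer merge keeps the union of keys; _merge records each row's origin.
dfmerged3 = pd.merge(df1, df2_unique, left_on='id', right_on='localid',
                     how='outer', indicator=True)
print(dfmerged3)
```

Rows marked left_only are ids without any colour (D4 to D6), and right_only marks D9, which exists only in df2. Switching to how='left' would drop D9 if it is not wanted.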
