
I have two dataframes, df1 and df2:

pd.DataFrame(data={'col1': ['a', 'b', 'a', 'a', 'b', 'h'], 'col2': ['c', 'c', 'd', 'd', 'c', 'i'], 'col3': [1, 2, 3, 4, 5, 1]})
   col1 col2  col3  
0    a    c     1   
1    b    c     2   
2    a    d     3   
3    a    d     4   
4    b    c     5   
5    h    i     1
pd.DataFrame(data={'col1': ['a', 'b', 'a', 'f'], 'col2': ['c', 'c', 'd', 'k'], 'col3': [12, 23, 45, 78]})
    col1 col2  col3 
0    a    c     12  
1    b    c     23  
2    a    d     45
3    f    k     78  

I'd like to add a new column to the first one by looking up each row's (col1, col2) pair in the second one and taking the matching col3 value. The expected result is:

pd.DataFrame(data={'col1': ['a', 'b', 'a', 'a', 'b', 'h'], 'col2': ['c', 'c', 'd', 'd', 'c', 'i'], 'col3': [1, 2, 3, 4, 5, 1], 'col4': [12, 23, 45, 45, 23, float('nan')]})
    col1 col2  col3  col4
0    a    c     1    12
1    b    c     2    23
2    a    d     3    45
3    a    d     4    45
4    b    c     5    23
5    h    i     1    NaN

How can I do that?

Thanks for your attention :)

Edit: it has been suggested that the answer can be found in the question "Adding A Specific Column from a Pandas Dataframe to Another Pandas Dataframe", but it is not the same question.

Here, not only is there no single ID column, since the ID is split across col1 and col2, but above all, although it is unique in the second dataframe, it is not unique in the first one. This is why I think that neither a merge nor a join can be the answer.

Edit 2: In addition, some (col1, col2) pairs of df1 may not be present in df2, in which case NaN is expected in col4, and some (col1, col2) pairs of df2 may not be needed in df1. To illustrate these cases, I added some rows to both df1 and df2 to show the worst-case scenario.

  • df1.merge(df2.rename(columns={'col3': 'col4'})) Commented Sep 7, 2018 at 15:19
  • It would not work, as col4 is not simply col3 of df2: col4 has to be built from col3 of df2 according to the values of col1 and col2. Commented Sep 7, 2018 at 22:14

1 Answer

You could also use map, like this:

In [130]: cols = ['col1', 'col2']

In [131]: df1['col4'] = df1.set_index(cols).index.map(df2.set_index(cols)['col3'])

In [132]: df1
Out[132]:
  col1 col2  col3  col4
0    a    c     1    12
1    b    c     2    23
2    a    d     3    45
3    a    d     4    45
4    b    c     5    23
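A left merge may be a more portable way to get the same column, including the NaN case from Edit 2. The following is only a sketch, assuming the (col1, col2) pairs are unique in df2 as the question states; the name `result` and the use of `validate` are illustrative choices, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(data={'col1': ['a', 'b', 'a', 'a', 'b', 'h'],
                         'col2': ['c', 'c', 'd', 'd', 'c', 'i'],
                         'col3': [1, 2, 3, 4, 5, 1]})
df2 = pd.DataFrame(data={'col1': ['a', 'b', 'a', 'f'],
                         'col2': ['c', 'c', 'd', 'k'],
                         'col3': [12, 23, 45, 78]})

cols = ['col1', 'col2']

# Rename df2's col3 so it arrives in df1 as col4, then left-merge on the key pair.
# how='left' keeps every row of df1 (unmatched pairs get NaN) and drops
# pairs that exist only in df2. validate='many_to_one' raises if a key pair
# is duplicated in df2, which would otherwise multiply rows.
result = df1.merge(df2.rename(columns={'col3': 'col4'}),
                   on=cols, how='left', validate='many_to_one')

print(result)
```

Because NaN is a float, col4 comes back as a float column (12.0, 23.0, ...) rather than integers.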

3 Comments

TypeError: 'Series' object is not callable
I get the same error as @pyd
Try using cols from @zero's answer, then pd.merge(df1.set_index(cols), df2.set_index(cols), left_index=True, right_index=True).reset_index(), then change the column names.
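The TypeError reported in the comments may come from a pandas version in which Index.map only accepted a callable; that is an assumption, since the commenters do not state their versions. A plain dict lookup avoids Index.map altogether. This is a sketch only, and `lookup` is a hypothetical helper name introduced here.

```python
import pandas as pd

df1 = pd.DataFrame(data={'col1': ['a', 'b', 'a', 'a', 'b', 'h'],
                         'col2': ['c', 'c', 'd', 'd', 'c', 'i'],
                         'col3': [1, 2, 3, 4, 5, 1]})
df2 = pd.DataFrame(data={'col1': ['a', 'b', 'a', 'f'],
                         'col2': ['c', 'c', 'd', 'k'],
                         'col3': [12, 23, 45, 78]})

cols = ['col1', 'col2']

# Build a dict keyed by (col1, col2) tuples, e.g. {('a', 'c'): 12, ...}.
# This assumes the key pairs are unique in df2.
lookup = df2.set_index(cols)['col3'].to_dict()

# Look up each row's key pair; pairs missing from df2 become NaN.
df1['col4'] = [lookup.get(key, float('nan'))
               for key in zip(df1['col1'], df1['col2'])]

print(df1)
```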
