fill a new column in a pandas dataframe from the value of another dataframe [duplicate]

Question

I have two dataframes, df1 and df2:

df1 = pd.DataFrame(data={'col1': ['a', 'b', 'a', 'a', 'b', 'h'], 'col2': ['c', 'c', 'd', 'd', 'c', 'i'], 'col3': [1, 2, 3, 4, 5, 1]})
   col1 col2  col3  
0    a    c     1   
1    b    c     2   
2    a    d     3   
3    a    d     4   
4    b    c     5   
5    h    i     1
df2 = pd.DataFrame(data={'col1': ['a', 'b', 'a', 'f'], 'col2': ['c', 'c', 'd', 'k'], 'col3': [12, 23, 45, 78]})
    col1 col2  col3 
0    a    c     12  
1    b    c     23  
2    a    d     45
3    f    k     78

and I'd like to add a new column, col4, to the first dataframe: for each row, look up its (col1, col2) pair in the second dataframe and take the matching col3 value. The result should be:

pd.DataFrame(data={'col1': ['a', 'b', 'a', 'a', 'b', 'h'], 'col2': ['c', 'c', 'd', 'd', 'c', 'i'], 'col3': [1, 2, 3, 4, 5, 1], 'col4': [12, 23, 45, 45, 23, np.nan]})
    col1 col2  col3  col4
0    a    c     1    12
1    b    c     2    23
2    a    d     3    45
3    a    d     4    45
4    b    c     5    23
5    h    i     1    NaN

How can I do that?

Thanks for your attention :)

Edit: it has been suggested that the answer is in this question, Adding A Specific Column from a Pandas Dataframe to Another Pandas Dataframe, but it is not the same question.

Here, not only is there no single ID column, since the key is split across col1 and col2, but above all the key, although unique in the second dataframe, is not unique in the first one. This is why I think that neither a merge nor a join can be the answer.

Edit 2: In addition, some (col1, col2) pairs of df1 may not be present in df2, in which case NaN is expected in col4, and some pairs of df2 may not be needed in df1. To illustrate these cases, I added some rows to both df1 and df2 to show the worst-case scenario.

It would not work, as col4 is not at all the col3 of df2: col4 has to be built from col3 of df2 according to the values of col1 and col2.
– Lyxthe Lyxos, Commented Sep 7, 2018 at 22:14

Zero · Accepted Answer · 2018-09-07 15:22:01Z


You could also use map, like this:

In [130]: cols = ['col1', 'col2']

In [131]: df1['col4'] = df1.set_index(cols).index.map(df2.set_index(cols)['col3'])

In [132]: df1
Out[132]:
  col1 col2  col3  col4
0    a    c     1    12
1    b    c     2    23
2    a    d     3    45
3    a    d     4    45
4    b    c     5    23
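A self-contained sketch of the same map approach, using the Edit 2 data from the question so that the unmatched ('h', 'i') pair shows up as NaN. It assumes pandas 0.23 or later, where Index.map accepts a Series; older versions only accept a function there, which is a likely cause of the TypeError reported in the comments. The name lookup is illustrative.

```python
import numpy as np
import pandas as pd

# Data from the question, including the rows added in Edit 2.
df1 = pd.DataFrame({'col1': ['a', 'b', 'a', 'a', 'b', 'h'],
                    'col2': ['c', 'c', 'd', 'd', 'c', 'i'],
                    'col3': [1, 2, 3, 4, 5, 1]})
df2 = pd.DataFrame({'col1': ['a', 'b', 'a', 'f'],
                    'col2': ['c', 'c', 'd', 'k'],
                    'col3': [12, 23, 45, 78]})

cols = ['col1', 'col2']
# Series of df2's col3 indexed by its (col1, col2) pairs; these pairs must be unique.
lookup = df2.set_index(cols)['col3']
# Map every (col1, col2) pair of df1 through that Series.
# Repeated pairs in df1 are fine; pairs absent from df2 become NaN.
df1['col4'] = df1.set_index(cols).index.map(lookup)

print(df1)
```

Because of the NaN, col4 comes out as float64 rather than int.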

answered Sep 7, 2018 at 15:22


3 Comments

Pyd Over a year ago

TypeError: 'Series' object is not callable

Lyxthe Lyxos Over a year ago

I get the same error as @pyd

Pyd Over a year ago

Try using cols from @Zero's answer, then pd.merge(df1.set_index(cols), df2.set_index(cols), left_index=True, right_index=True).reset_index(), and then rename the columns.
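Two sketches that avoid passing a Series to Index.map, for pandas versions that raise the TypeError above. The first is a left merge in the spirit of the merge comment: how='left' keeps every row of df1 (unmatched pairs get NaN instead of being dropped, as they would be with the default inner join), and validate='many_to_one' checks that each (col1, col2) pair appears at most once in df2 while allowing repeats in df1, which is exactly the situation described in the question. The second is a plain dict lookup with no Index.map call at all. The names merged and lookup are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'a', 'a', 'b', 'h'],
                    'col2': ['c', 'c', 'd', 'd', 'c', 'i'],
                    'col3': [1, 2, 3, 4, 5, 1]})
df2 = pd.DataFrame({'col1': ['a', 'b', 'a', 'f'],
                    'col2': ['c', 'c', 'd', 'k'],
                    'col3': [12, 23, 45, 78]})
cols = ['col1', 'col2']

# Option 1: left merge on the key columns. df2's col3 is renamed to col4 first
# so it does not clash with df1's own col3. The left join keeps df1's row order.
merged = df1.merge(df2.rename(columns={'col3': 'col4'}), on=cols,
                   how='left', validate='many_to_one')

# Option 2: dict keyed by (col1, col2) tuples, e.g. {('a', 'c'): 12, ...}.
lookup = df2.set_index(cols)['col3'].to_dict()
# Look up each pair of df1; missing pairs fall back to NaN.
df1['col4'] = [lookup.get(key, np.nan) for key in zip(df1['col1'], df1['col2'])]
```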
