4

I have two dataframes of different sizes: the first has more rows, while the second has more columns.

I'm having trouble adding values from one data frame to the other where both have the same value in a shared column, which in this case is id.

Here is some dummy data and how I tried to solve it:

import pandas as pd

df1 = pd.DataFrame([(1,2,3),(3,4,5),(5,6,7),(7,8,9),(100,10,12),(100,10,12),(100,10,12)], columns=['id','value','c'])
df2 = pd.DataFrame([(1,200,3,4,6),(3,400,3,4,6),(5,600,3,4,6),(5,620,3,4,6)], columns=['id','value','x','y','z'])

So if the id in df1 and df2 is the same, the value column should be increased by the value in "whatToAdd" (in this dummy data, the value column of df1).

data

df1: 
         id    value    c     
         1     2        3     
         3     4        5     
         5     6        7
         7     8        9     
         100   10       12    
         100   10       12
         100   10       12
df2: 
         id    value    x     y    z
         1     200      3     4    6
         3     400      3     4    6
         5     600      3     4    6
         5     620      3     4    6 

expected:

  Out: 
     id    value    x     y    z
     1     202      3     4    6
     3     404      3     4    6
     5     606      3     4    6
     5     626      3     4    6 

tried:

for each in df1.a:
    if(df2.loc[df2['a'] == each]):
        df2['a']+=df['a']

This raises the error "The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().", which confuses me because I tried:

df2.loc[df2['a']==1]

outside the loop and it works.
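
On the error itself: the selection works outside the loop because it only builds a DataFrame, while `if` asks for a single True/False value, which a DataFrame refuses to provide. The sketch below reproduces this on the dummy data. It assumes the column is `id` rather than `a`, and the names `matches`, `raised`, `has_rows` and `any_match` are only for illustration.

```python
import pandas as pd

df2 = pd.DataFrame(
    [(1, 200, 3, 4, 6), (3, 400, 3, 4, 6), (5, 600, 3, 4, 6), (5, 620, 3, 4, 6)],
    columns=['id', 'value', 'x', 'y', 'z'],
)

# Selecting rows returns a DataFrame (one row here), not a boolean
matches = df2.loc[df2['id'] == 1]

try:
    if matches:  # asks for the truth value of a whole DataFrame
        pass
    raised = False
except ValueError:
    raised = True

# Explicit alternatives that reduce the selection to one boolean
has_rows = not matches.empty
any_match = bool((df2['id'] == 1).any())
```

Either `has_rows` or `any_match` could go in the `if`, although the answers avoid the loop altogether.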

2
  • What about column c in your example? Commented May 29, 2018 at 3:45
  • @Sören I'm not using it at all; it's just there as a dummy so that the example is similar to my real data environment :) Commented May 29, 2018 at 3:53

4 Answers

2

Once you set both data frames to have the same index:

df1 = df1.set_index("id")
df2 = df2.set_index("id")

You can do one very simple operation:

mask = df1.index.isin(df2.index)
df2["value"] += df1.loc[mask, "value"]

Output:

    value   x   y   z
id              
1   202     3   4   6
3   404     3   4   6
5   606     3   4   6
5   626     3   4   6

You can always call df2.reset_index() to get back to the original layout.
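
For reference, here are the same steps as one self-contained script on the dummy data from the question. It is only a sketch of the approach above; the names `a`, `b` and `result` are hypothetical and are used so that the original df1 and df2 stay untouched.

```python
import pandas as pd

df1 = pd.DataFrame(
    [(1, 2, 3), (3, 4, 5), (5, 6, 7), (7, 8, 9),
     (100, 10, 12), (100, 10, 12), (100, 10, 12)],
    columns=['id', 'value', 'c'],
)
df2 = pd.DataFrame(
    [(1, 200, 3, 4, 6), (3, 400, 3, 4, 6), (5, 600, 3, 4, 6), (5, 620, 3, 4, 6)],
    columns=['id', 'value', 'x', 'y', 'z'],
)

# Index both frames by id so the addition aligns on id, not on row position
a = df1.set_index('id')
b = df2.set_index('id')

# Keep only the df1 rows whose id also occurs in df2 (drops 7 and 100)
mask = a.index.isin(b.index)

# Index-aligned addition: each df2 row picks up the df1 value with the same id
b['value'] += a.loc[mask, 'value']

# Turn id back into an ordinary column
result = b.reset_index()
```

This relies on the matching ids being unique in df1, which holds for the dummy data (only id 100 repeats, and it is filtered out by the mask).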

1

You can use set_index with add, then follow with reindex:

df1.set_index('id').add(df2.set_index('id'),fill_value=0).dropna(axis=0).reset_index().reindex(columns=df2.columns)
Out[193]: 
   id  value    x    y    z
0   1  202.0  3.0  4.0  6.0
1   3  404.0  3.0  4.0  6.0
2   5  606.0  3.0  4.0  6.0
3   5  626.0  3.0  4.0  6.0

1

Here is code I came up with. It uses a dict to look up the value for each id in df1. Map can then be used to look up the value for each id in df2, creating a series that is then added to df2['value'] to produce the desired result.

df1_lookup = dict(df1.set_index('id')['value'].items())
df2['value'] += df2['id'].map(lambda x: df1_lookup.get(x, 0))
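
A possible variant of the same lookup idea, shown on the dummy data: it swaps the dict for a Series passed straight to `map`, so it is not the exact code above. It assumes that repeated ids in df1 (such as 100) carry the same value, so keeping the first row per id is safe; `lookup`, `to_add` and `result` are hypothetical names.

```python
import pandas as pd

df1 = pd.DataFrame(
    [(1, 2, 3), (3, 4, 5), (5, 6, 7), (7, 8, 9),
     (100, 10, 12), (100, 10, 12), (100, 10, 12)],
    columns=['id', 'value', 'c'],
)
df2 = pd.DataFrame(
    [(1, 200, 3, 4, 6), (3, 400, 3, 4, 6), (5, 600, 3, 4, 6), (5, 620, 3, 4, 6)],
    columns=['id', 'value', 'x', 'y', 'z'],
)

# One value per id: mapping through a Series needs a unique index,
# so keep only the first df1 row for each id (assumption stated above)
lookup = df1.drop_duplicates('id').set_index('id')['value']

# ids missing from df1 would map to NaN; treat those as "add nothing"
to_add = df2['id'].map(lookup).fillna(0).astype(int)

# Build a new frame instead of modifying df2 in place
result = df2.assign(value=df2['value'] + to_add)
```

With this data `result['value']` comes out as 202, 404, 606, 626, matching the expected output in the question.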

1

Here is a one-liner.

df2.loc[:, 'value'] += [df1.set_index('id').loc[i, 'value'] for i in df2.id]
print(df2)
>>>
   id  value  x  y  z
0   1    202  3  4  6
1   3    404  3  4  6
2   5    606  3  4  6
3   5    626  3  4  6
