I'm sorry if this has been asked before but I wasn't sure how to word this question into a search.

I have 2 data frames, each with a year column and a value column. I want to update the first data frame by matching on year and, for each matching year, keeping whichever value is larger. Suppose the data frames look like this:

>>> import pandas as pd
>>> x = [1999, 2000, 2001]
>>> y = [0, 0, 0]
>>> df1 = pd.DataFrame({'year': x, 'value': y})
>>> df1

   year   value
0  1999   0
1  2000   0
2  2001   0

>>> x2 = [1999, 2003, 2004]
>>> y2 = [5, 0, 0]
>>> df2 = pd.DataFrame({'year': x2, 'value': y2})
>>> df2

   year   value
0  1999   5
1  2003   0
2  2004   0

I want the updated data frame (df1) to look like this. Is there a simple way to do this?

   year   value
0  1999   5
1  2000   0
2  2001   0
  • df1.value = df2.value Commented Jan 11, 2019 at 7:58

2 Answers

Using merge and map:

df = df1.merge(df2, on=['year'], how='outer')
df['max'] = df.filter(like='value').max(axis=1)
df1['value'] = df1['year'].map(df.set_index('year')['max'])

print(df1)
   year  value
0  1999    5.0
1  2000    0.0
2  2001    0.0
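
The values come back as floats because the outer merge introduces NaN for years that exist in only one frame. A minimal sketch of a variant that skips the merge and keeps integer values is shown here; it assumes each year appears at most once in df2 and that the values are whole numbers, and the name `candidate` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'year': [1999, 2000, 2001], 'value': [0, 0, 0]})
df2 = pd.DataFrame({'year': [1999, 2003, 2004], 'value': [5, 0, 0]})

# Look up df2's value for each year in df1; years absent from df2 become NaN.
# Assumption: 'year' is unique in df2, otherwise the lookup is ambiguous.
candidate = df1['year'].map(df2.set_index('year')['value'])

# Where df2 has no matching year, fall back to df1's own value.
candidate = candidate.fillna(df1['value'])

# Keep the larger of the two values row by row, then restore the integer dtype.
df1['value'] = pd.concat([df1['value'], candidate], axis=1).max(axis=1).astype('int64')

print(df1)
```

Years that exist only in df2 (2003 and 2004 here) are ignored, which matches the desired output in the question.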

EDIT: To find out which rows were changed, use:

# copy the `value` column into a temporary `temp` column
df1['temp'] = df1['value']
# now run the merge-and-map code above to update the `value` column
# flag the rows whose `value` differs from the saved `temp` copy
df1['Changed_Values'] = df1['temp'].ne(df1['value'])
# finally, drop the temporary column
df1.drop('temp', axis=1, inplace=True)
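
Put together with the sample frames from the question, the update and the change check might look like the following sketch. It keeps the original values in a separate Series (the name `original` is just for illustration) rather than a temporary column, which is a small deviation from the steps above.

```python
import pandas as pd

df1 = pd.DataFrame({'year': [1999, 2000, 2001], 'value': [0, 0, 0]})
df2 = pd.DataFrame({'year': [1999, 2003, 2004], 'value': [5, 0, 0]})

# Remember the values before the update.
original = df1['value'].copy()

# Merge-and-map update: the outer merge yields value_x / value_y columns,
# and the row-wise max ignores NaN from years present in only one frame.
df = df1.merge(df2, on='year', how='outer')
df['max'] = df.filter(like='value').max(axis=1)
df1['value'] = df1['year'].map(df.set_index('year')['max'])

# True where the updated value differs from the original one.
df1['Changed_Values'] = original.ne(df1['value'])

print(df1)
```

Only the 1999 row should be flagged, since it is the only year where df2 holds a larger value.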

2 Comments

Hi, I have another question that maybe you can answer. Is there a way to check which values were updated? In other words, I want the same data frame returned, but with a 'True' wherever a value was updated.
@Jack Yes, by temporarily creating a new column for value and after changing the values, check which ones are different.

Why not just do:

if df1.value.sum()<df2.value.sum():
    df1.value = df2.value

Or:

if df1['value'].sum()<df2['value'].sum():
    df1['value'] = df2['value']

Now:

print(df1)

Is:

   year  value
0  1999      5
1  2000      0
2  2001      0

2 Comments

The issue with this is that I want it to only update if the df2['value'] is larger than the df1['value']
Hi, thanks for the responses. I genuinely appreciate your help but I think I see another issue with your updated response. With regards to your 2 suggestions, I'm not looking to update based on the sum of column but rather the value at each row of the column.
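
To illustrate the point raised in these comments: assigning one column to another aligns rows on the index label, not on year, so it only appears to work when matching years happen to sit at the same position. The sketch below uses a hypothetical reordering of df2 (not from the question) to show the difference, next to a per-row update keyed on year; it assumes years are unique in df2.

```python
import pandas as pd

df1 = pd.DataFrame({'year': [1999, 2000, 2001], 'value': [0, 0, 0]})
# Hypothetical reordering of df2: same rows, but 1999 now sits at index 1.
df2 = pd.DataFrame({'year': [2003, 1999, 2004], 'value': [0, 5, 0]})

# Whole-column assignment aligns on the index, so the 5 lands on year 2000.
by_position = df1.copy()
by_position['value'] = df2['value']

# Per-row update keyed on year: look up df2's value for each year in df1.
by_year = df1.copy()
matched = by_year['year'].map(df2.set_index('year')['value'])
# Comparisons against NaN are False, so unmatched years are left alone.
larger = matched > by_year['value']
by_year.loc[larger, 'value'] = matched[larger].astype('int64')

print(by_position)
print(by_year)
```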
