I have two DataFrames whose rows share a corresponding index, and I want to merge them. Every row has an update time. For rows with the same index, the row with the higher update time wins: all fields should be taken from the ‘newer’ row, except for fields where only the ‘older’ row has a value. Example:
import pandas as pd

df1 = pd.DataFrame({'Hugo': {'age': 21, 'weight': 75},
                    'Niklas': {'age': 46, 'weight': 65},
                    'Ronald': {'age': 76, 'weight': 85, 'height': 176}}).T
df1.index.names = ['name']
df1['update_time'] = 1
df2 = pd.DataFrame({'Hugo': {'age': 22, 'weight': 77},
                    'Bertram': {'age': 45, 'weight': 65, 'height': 190},
                    'Donald': {'age': 75, 'weight': 85},
                    'Ronald': {'age': 77, 'weight': 84}}).T
df2.index.names = ['name']
df2['update_time'] = 2
df1:
+--------+-------+----------+----------+---------------+
| name | age | height | weight | update_time |
|--------+-------+----------+----------+---------------|
| Hugo | 21 | nan | 75 | 1 |
| Niklas | 46 | nan | 65 | 1 |
| Ronald | 76 | 176 | 85 | 1 |
+--------+-------+----------+----------+---------------+
df2:
+---------+-------+----------+----------+---------------+
| name | age | height | weight | update_time |
|---------+-------+----------+----------+---------------|
| Bertram | 45 | 190 | 65 | 2 |
| Donald | 75 | nan | 85 | 2 |
| Hugo | 22 | nan | 77 | 2 |
| Ronald | 77 | nan | 84 | 2 |
+---------+-------+----------+----------+---------------+
Result should look like this:
+---------+-------+----------+----------+---------------+
| name | age | height | weight | update_time |
|---------+-------+----------+----------+---------------|
| Niklas | 46 | nan | 65 | 1 |
| Bertram | 45 | 190 | 65 | 2 |
| Donald | 75 | nan | 85 | 2 |
| Hugo | 22 | nan | 77 | 2 |
| Ronald | 77 | 176 | 84 | 2 |
+---------+-------+----------+----------+---------------+
How can I do this? The problem is keeping Ronald's height field. If I first run df.update on df1, the original timestamp is gone and I can no longer find the older duplicates. If I use df.append, I end up with both rows for each duplicated name and can't merge their fields.
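
One direction that might work, as a minimal sketch rather than a definitive solution: stack both frames with pd.concat, sort by update_time so the newest row comes last within each name, then take GroupBy.last(), which returns the last non-null value per column. The variable name merged is just illustrative. Note that this sketch assumes a newer NaN should never overwrite an older value, and that the result comes back sorted by name rather than in the order shown above.

```python
import pandas as pd

df1 = pd.DataFrame({'Hugo': {'age': 21, 'weight': 75},
                    'Niklas': {'age': 46, 'weight': 65},
                    'Ronald': {'age': 76, 'weight': 85, 'height': 176}}).T
df1.index.names = ['name']
df1['update_time'] = 1

df2 = pd.DataFrame({'Hugo': {'age': 22, 'weight': 77},
                    'Bertram': {'age': 45, 'weight': 65, 'height': 190},
                    'Donald': {'age': 75, 'weight': 85},
                    'Ronald': {'age': 77, 'weight': 84}}).T
df2.index.names = ['name']
df2['update_time'] = 2

merged = (
    pd.concat([df1, df2])          # stack all rows; duplicated names appear twice
      .sort_values('update_time')  # older rows first, newer rows last
      .groupby(level=0)            # one group per name (the index)
      .last()                      # last non-null value in each column
)
print(merged)
```

Because last() works column by column, Ronald keeps height 176 from the older row while age, weight and update_time come from the newer one. A newer row therefore cannot reset a field to NaN with this approach, which matches the rule described in the question but may not suit every dataset.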