1

I have two DataFrames whose rows share a corresponding index, and I want to merge them. Every row has an update time. For rows with the same index, the row with the higher update time wins: all fields should be taken from the 'newer' row, except that a field which has a value only in the 'older' row should keep that value. Example:

import pandas as pd

df1 = pd.DataFrame({'Hugo' : {'age' : 21, 'weight' : 75},
                   'Niklas': {'age' : 46, 'weight' : 65},
                   'Ronald' : {'age' : 76, 'weight' : 85, 'height' : 176}}).T
df1.index.names = ['name']
df1['update_time'] = 1

df2 = pd.DataFrame({'Hugo' : {'age' : 22, 'weight' : 77},
                   'Bertram': {'age' : 45, 'weight' : 65, 'height' : 190},
                   'Donald' : {'age' : 75, 'weight' : 85},
                   'Ronald' : {'age' : 77, 'weight' : 84}}).T
df2.index.names = ['name']
df2['update_time'] = 2


df1:
+--------+-------+----------+----------+---------------+
| name   |   age |   height |   weight |   update_time |
|--------+-------+----------+----------+---------------|
| Hugo   |    21 |      nan |       75 |             1 |
| Niklas |    46 |      nan |       65 |             1 |
| Ronald |    76 |      176 |       85 |             1 |
+--------+-------+----------+----------+---------------+
df2:
+---------+-------+----------+----------+---------------+
| name    |   age |   height |   weight |   update_time |
|---------+-------+----------+----------+---------------|
| Bertram |    45 |      190 |       65 |             2 |
| Donald  |    75 |      nan |       85 |             2 |
| Hugo    |    22 |      nan |       77 |             2 |
| Ronald  |    77 |      nan |       84 |             2 |
+---------+-------+----------+----------+---------------+

Result should look like this:

+---------+-------+----------+----------+---------------+
| name    |   age |   height |   weight |   update_time |
|---------+-------+----------+----------+---------------|
| Niklas  |    46 |      nan |       65 |             1 |
| Bertram |    45 |      190 |       65 |             2 |
| Donald  |    75 |      nan |       85 |             2 |
| Hugo    |    22 |      nan |       77 |             2 |
| Ronald  |    77 |      176 |       84 |             2 |
+---------+-------+----------+----------+---------------+

How can I do this? The problem is keeping Ronald's height. If I first do a df.update on df1, the old timestamp is gone and I can no longer find the older duplicates. If I use df.append, I can't merge the fields.

2
  • First, your input does not match your output. Second, you should search before asking. Commented Apr 9, 2018 at 18:08
  • When I use combine_first (which I didn't know about, and I searched a lot before asking), I get exactly the output above from the input above. Commented Apr 9, 2018 at 18:54

2 Answers

6

Use combine_first:

df2.combine_first(df1)

output:

          age  height  weight  update_time
name                                      
Bertram  45.0   190.0    65.0          2.0
Donald   75.0     NaN    85.0          2.0
Hugo     22.0     NaN    77.0          2.0
Niklas   46.0     NaN    65.0          1.0
Ronald   77.0   176.0    84.0          2.0
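
To check this against the question's requirements, here is a minimal, self-contained sketch that rebuilds the two frames from the question and verifies the fields that matter. It assumes df2 is the newer frame for every shared row, as it is in the example data; the name `merged` is only illustrative.

```python
import pandas as pd

# Rebuild the frames exactly as in the question.
df1 = pd.DataFrame({'Hugo': {'age': 21, 'weight': 75},
                    'Niklas': {'age': 46, 'weight': 65},
                    'Ronald': {'age': 76, 'weight': 85, 'height': 176}}).T
df1.index.names = ['name']
df1['update_time'] = 1

df2 = pd.DataFrame({'Hugo': {'age': 22, 'weight': 77},
                    'Bertram': {'age': 45, 'weight': 65, 'height': 190},
                    'Donald': {'age': 75, 'weight': 85},
                    'Ronald': {'age': 77, 'weight': 84}}).T
df2.index.names = ['name']
df2['update_time'] = 2

# Values come from df2 wherever df2 has one; gaps are filled from df1.
merged = df2.combine_first(df1)

# Ronald keeps the height that only df1 knew, but takes age from df2.
print(merged.loc['Ronald'])
```

Note that combine_first decides per cell, not per row, and always prefers the calling frame. It does not look at update_time itself, so the newer frame has to be the one the method is called on.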

1 Comment

This is what I have been looking for for some time. My solution was far clunkier and would surely have caused trouble.
0

I am adding my answer here for completeness, as another working solution that I have mostly used:

rdpz = lambda df: df.groupby(df.index).last()
df3 = rdpz( pd.concat([df1, df2]) )

Here, the function rdpz groups the concatenated dataframes by index and takes the last entry in each group (first can be substituted for last if the list order is reversed).
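
The reason this keeps Ronald's height is that groupby's last() by default returns the last non-null value of each column, not the last row as a whole. The sketch below contrasts it with dropping duplicate index entries, which keeps the entire last row including its gaps. The frame `stacked` and the result names are made up for illustration.

```python
import pandas as pd

nan = float('nan')

# Two stacked rows for the same person: the older one has a height, the newer one does not.
stacked = pd.DataFrame(
    {'age': [76.0, 77.0], 'height': [176.0, nan]},
    index=pd.Index(['Ronald', 'Ronald'], name='name'),
)

# Per column: last non-null value in each group.
per_column = stacked.groupby(level=0).last()

# Per row: keep only the last occurrence of each index label, NaN included.
whole_row = stacked[~stacked.index.duplicated(keep='last')]

print(per_column)
print(whole_row)
```

With the first variant the older height fills the gap in the newer row; with the second it is lost.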

Admittedly, this is not nearly as clean as Scott Boston's answer, but it is helpful if you want to extend it to more than two dataframes or have trickier MultiIndexes. In one line, you can use:

df3 = pd.concat([df1, df2]).pipe(lambda df: df.groupby(df.index).last())
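
Both variants rely on the newer frame coming last in the list. If update times vary row by row, one possible extension is to sort the concatenated rows by update_time before grouping, so that the newest row per index ends up last regardless of list order. This is only a sketch: the helper `merge_newest` and the sample frames `a` and `b` are hypothetical and not part of the question.

```python
import pandas as pd

nan = float('nan')

def merge_newest(frames, time_col='update_time'):
    """Stack frames, order rows by update time, take the last non-null value per column."""
    combined = pd.concat(frames)
    # A stable sort keeps the original order among rows with equal update times.
    combined = combined.sort_values(time_col, kind='mergesort')
    return combined.groupby(level=0).last()

idx = pd.Index(['Hugo', 'Ronald'], name='name')
# In frame a, Hugo's row is the newest overall and Ronald's is the oldest.
a = pd.DataFrame({'age': [21.0, 76.0], 'height': [nan, 176.0],
                  'update_time': [3, 1]}, index=idx)
b = pd.DataFrame({'age': [22.0, 77.0], 'height': [180.0, nan],
                  'update_time': [2, 2]}, index=idx)

# List order no longer matters, because rows are sorted by update_time first.
result = merge_newest([b, a])
print(result)
```

Here Hugo's age comes from `a` (update_time 3) while his height falls back to `b`, and Ronald's age comes from `b` while his height falls back to `a`.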
