Pandas merge adding column

Question

I have two dataframes, df1 and df2. df1 contains the columns subject_id and time, and df2 contains the columns subject_id and final_time. For every subject_id in df1, I want to add a column with final_time from df2, but only for the subject_ids contained in df1. I have tried df1.merge(df2, how='left'), but I still get all of the subject_ids from df2, which is much longer and contains many duplicates of subject_id.

Example of what I am looking for:

df1

    subject_id   time
 0          15  12:00
 1          20  12:05
 2          21  12:10
 3          25  12:00

df2

    subject_id  final_time
 0          15       12:30
 1          15       12:30
 2          15       12:30
 3          20       12:45
 4          20       12:45
 5          21       12:50
 6          25        1:00
 7          25        1:00
 8          25        1:00

What I am looking for:

    subject_id   time  final_time
 0          15  12:00       12:30
 1          20  12:05       12:45
 2          21  12:10       12:50
 3          25  12:00        1:00

Can you show example code, and what the intended output would be?
– SeeDerekEngineer, Commented Jul 31, 2017 at 16:07

dugup · Accepted Answer · 2017-07-31 16:03:57Z

6

You should use

df1.merge(df2, on='subject_id')

The default for how is 'inner', which keeps only the rows whose subject_id appears in both dataframes. on tells merge to match only on the column you are interested in.
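To make the behaviour concrete, here is a minimal sketch using the data from the question (the variable names inner and left are only for illustration). It suggests that the extra rows come from the duplicate subject_id values in df2 rather than from the choice of how: each df1 row is paired with every matching df2 row.

```python
import pandas as pd

# Data copied from the question.
df1 = pd.DataFrame({
    "subject_id": [15, 20, 21, 25],
    "time": ["12:00", "12:05", "12:10", "12:00"],
})
df2 = pd.DataFrame({
    "subject_id": [15, 15, 15, 20, 20, 21, 25, 25, 25],
    "final_time": ["12:30"] * 3 + ["12:45"] * 2 + ["12:50"] + ["1:00"] * 3,
})

inner = df1.merge(df2, on="subject_id")              # how='inner' is the default
left = df1.merge(df2, on="subject_id", how="left")

# Every df1 row is repeated once per matching df2 row,
# so both results have 9 rows instead of 4.
print(len(df1), len(inner), len(left))  # 4 9 9
```

Only subject_ids from df1 appear in either result; it is the row count that grows.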


3 Comments

GNMO11 Over a year ago

Just edited my question. df2 contains duplicates of subject_id, which I think is causing this to give the same results as I was getting with df1.merge(df2, how='left').

dugup Over a year ago

If you only want the unique rows from df2 joined to the matching rows of df1, try dropping the duplicates from df2 first: df1.merge(df2.drop_duplicates())

GNMO11 Over a year ago

.drop_duplicates() looks to be what I was looking for, thanks
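The drop_duplicates() suggestion from the comments can be sketched on the question's data as follows. This is an illustrative sketch, not the exact code the asker ran; on="subject_id" and how="left" are spelled out here as an assumption so that every df1 row is kept even if df2 has no match for it.

```python
import pandas as pd

# Data copied from the question.
df1 = pd.DataFrame({
    "subject_id": [15, 20, 21, 25],
    "time": ["12:00", "12:05", "12:10", "12:00"],
})
df2 = pd.DataFrame({
    "subject_id": [15, 15, 15, 20, 20, 21, 25, 25, 25],
    "final_time": ["12:30"] * 3 + ["12:45"] * 2 + ["12:50"] + ["1:00"] * 3,
})

# Remove fully identical rows from df2, then attach final_time to df1.
result = df1.merge(df2.drop_duplicates(), on="subject_id", how="left")

print(result.shape)  # (4, 3): one row per df1 row, with the new final_time column
```

This works here because the repeated rows in df2 are exact duplicates, so each subject_id is left with a single final_time.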

piRSquared · Accepted Answer · 2017-07-31 16:14:41Z

0

Works for me. Nothing appears in the result that isn't in df1, although a subject_id that occurs more than once in df2 produces more than one row:

import pandas as pd

df1 = pd.DataFrame(dict(subject_id=[1, 2, 3], time=[9, 8, 7]))
df2 = pd.DataFrame(dict(subject_id=[2, 2, 4], final_time=[6, 5, 4]))

df1.merge(df2, how='left')

   subject_id  time  final_time
0           1     9         NaN
1           2     8         6.0
2           2     8         5.0
3           3     7         NaN
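In this example the two df2 rows for subject_id 2 carry different final_time values, so a plain drop_duplicates() would not remove either of them. One possible way to get a single row per subject_id is to deduplicate on the key column only. The sketch below assumes that keeping the first occurrence is acceptable, which the thread does not state.

```python
import pandas as pd

df1 = pd.DataFrame(dict(subject_id=[1, 2, 3], time=[9, 8, 7]))
df2 = pd.DataFrame(dict(subject_id=[2, 2, 4], final_time=[6, 5, 4]))

# The rows (2, 6) and (2, 5) differ, so nothing is dropped here.
print(len(df2.drop_duplicates()))  # 3

# Assumption: keep the first final_time seen for each subject_id.
df2_unique = df2.drop_duplicates(subset="subject_id", keep="first")

result = df1.merge(df2_unique, on="subject_id", how="left")
print(len(result))  # 3: one row per df1 row; unmatched ids get NaN
```

If a different row should win (for example the latest final_time), df2 would need to be sorted or aggregated before deduplicating.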
