
I have two dataframes, df1 and df2. df1 contains the columns subject_id and time, and df2 contains the columns subject_id and final_time. For every subject_id in df1, I want to add a column with the final_time from df2, but only for the subject_ids contained in df1. I have tried df1.merge(df2, how='left'), but the result still has far more rows than df1, because df2 is much longer and contains many duplicates of subject_id.

Example of what I am looking for:

df1

    subject_id  time
 0          15  12:00
 1          20  12:05
 2          21  12:10
 3          25  12:00

df2

    subject_id  final_time
 0          15       12:30
 1          15       12:30
 2          15       12:30
 3          20       12:45
 4          20       12:45
 5          21       12:50
 6          25        1:00
 7          25        1:00
 8          25        1:00

What I am looking for

    subject_id  time  final_time
 0          15  12:00      12:30
 1          20  12:05      12:45
 2          21  12:10      12:50
 3          25  12:00       1:00
  • Can you show example code, and what the intended output would be? Commented Jul 31, 2017 at 16:07

2 Answers


You should use

df1.merge(df2, on='subject_id')

The default for how is 'inner', which keeps only the rows whose subject_id appears in both dataframes. on tells merge which column to match on, so only the column you are interested in is used as the key.
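To make the difference between how='inner' and how='left' concrete, here is a minimal sketch. The small frames and the id 99 are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Illustrative frames (assumed values): id 99 exists only in df1, id 21 only in df2.
df1 = pd.DataFrame({"subject_id": [15, 20, 99],
                    "time": ["12:00", "12:05", "12:10"]})
df2 = pd.DataFrame({"subject_id": [15, 20, 21],
                    "final_time": ["12:30", "12:45", "12:50"]})

# Inner join (the default): only ids present in both frames survive.
inner = df1.merge(df2, on="subject_id")

# Left join: every row of df1 is kept; ids missing from df2 get NaN.
left = df1.merge(df2, on="subject_id", how="left")

print(inner)
print(left)
```

In both cases no subject_id that is absent from df1 appears in the result. Extra rows can only come from duplicated keys in df2.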


3 Comments

Just edited my question: df2 contains duplicates of subject_id, which I think is why this gives the same result I was getting with df1.merge(df2, how='left').
If you only want the unique rows from df2 joined to the matching rows in df1, try dropping the duplicates from df2 first: df1.merge(df2.drop_duplicates())
.drop_duplicates() looks to be what I was looking for, thanks
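For reference, the drop_duplicates() suggestion can be sketched on the data from the question. The frames below are rebuilt by hand from the tables in the question, and the variable names dup, df2_unique and result are only for illustration.

```python
import pandas as pd

# Data transcribed from the question's example tables.
df1 = pd.DataFrame({"subject_id": [15, 20, 21, 25],
                    "time": ["12:00", "12:05", "12:10", "12:00"]})
df2 = pd.DataFrame({
    "subject_id": [15, 15, 15, 20, 20, 21, 25, 25, 25],
    "final_time": ["12:30"] * 3 + ["12:45"] * 2 + ["12:50"] + ["1:00"] * 3,
})

# Merging directly repeats each df1 row once per matching df2 row.
dup = df1.merge(df2, on="subject_id", how="left")

# Dropping fully identical rows from df2 first gives one match per subject.
df2_unique = df2.drop_duplicates()
result = df1.merge(df2_unique, on="subject_id", how="left")

print(len(dup), len(result))
print(result)
```

Note that drop_duplicates() with no arguments only removes rows that are identical in every column. If a subject_id could appear with different final_time values, something like df2.drop_duplicates(subset='subject_id') would be needed, and you would have to decide which row to keep.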

Works for me. Nothing in the result that isn't in df1:

import pandas as pd

df1 = pd.DataFrame(dict(subject_id=[1, 2, 3], time=[9, 8, 7]))
df2 = pd.DataFrame(dict(subject_id=[2, 2, 4], final_time=[6, 5, 4]))

# Left join on the shared column; subject_id 4 (only in df2) is dropped
df1.merge(df2, how='left')

   subject_id  time  final_time
0           1     9         NaN
1           2     8         6.0
2           2     8         5.0
3           3     7         NaN
