
Suppose I have the following 2 DataFrames:

  1. df1, whose index is ['NameID', 'Date']. For example, df1 can be a panel dataset of historical salaries of employees in a company.

  2. df2, whose index is ['NameID']. For example, df2 can be a dataset of employees' birthday and SSN.

What is the most efficient way to join df1 and df2 on the 'NameID' index level on a one-to-many (1:m) basis? As far as I can tell, DataFrame.join() doesn't allow a 1:m join. I know I can first call reset_index() on both df1 and df2 and then use DataFrame.merge() to join them on columns, but that seems inefficient.

Code:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'NameID': ['A', 'B', 'C'] * 3,
                    'Date': ['20180801'] * 3 + ['20180802'] * 3 + ['20180803'] * 3,
                    'Salary': np.random.rand(9)
                   })
df1 = df1.set_index(['NameID', 'Date'])
df1

NameID  Date    Salary
A   20180801    0.831064
B   20180801    0.419464
C   20180801    0.239779
A   20180802    0.500048
B   20180802    0.317452
C   20180802    0.188051
A   20180803    0.076196
B   20180803    0.060435
C   20180803    0.297118

df2 = pd.DataFrame({'NameID':['A','B','C'],                   
                    'SSN':[999,888,777]
                   })
df2 = df2.set_index(['NameID'])
df2

NameID  SSN
A       999
B       888
C       777

The result I want to get is:

NameID  Date        Salary      SSN
A       20180801    0.831064    999
A       20180802    0.500048    999
A       20180803    0.076196    999
B       20180801    0.419464    888
B       20180802    0.317452    888
B       20180803    0.060435    888
C       20180801    0.239779    777
C       20180802    0.188051    777
C       20180803    0.297118    777
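
For reference, the reset_index() round trip described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the salary values are copied from the sample output rather than drawn at random, and the names left, right and result are made up for the sketch. The validate='many_to_one' argument asks pandas to check that each NameID appears at most once in df2.

```python
import pandas as pd

# Sample frames from the question, with fixed salaries for reproducibility.
df1 = pd.DataFrame({
    'NameID': ['A', 'B', 'C'] * 3,
    'Date': ['20180801'] * 3 + ['20180802'] * 3 + ['20180803'] * 3,
    'Salary': [0.831064, 0.419464, 0.239779,
               0.500048, 0.317452, 0.188051,
               0.076196, 0.060435, 0.297118],
}).set_index(['NameID', 'Date'])

df2 = pd.DataFrame({'NameID': ['A', 'B', 'C'],
                    'SSN': [999, 888, 777]}).set_index('NameID')

# Move the index levels back to columns, merge on the shared column,
# then restore the original two-level index.
left = df1.reset_index()    # hypothetical helper name
right = df2.reset_index()   # hypothetical helper name
result = (
    left.merge(right, on='NameID', how='left', validate='many_to_one')
        .set_index(['NameID', 'Date'])
        .sort_index()
)
print(result)
```

This produces the desired layout, at the cost of rebuilding the index twice.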
  • It would be more helpful if you created a minimal reproducible example. Commented Aug 10, 2018 at 15:22
  • Did you try merging on the index? For example: df3 = pd.merge(df1, df2, left_index=True, right_index=True) Commented Aug 10, 2018 at 15:38
  • Thanks warwick. I think I am all set now with your answer... I can't believe it is that simple. I was totally misled by the top answer in this post and thought merge cannot be used to merge on indices... stackoverflow.com/questions/36538780/…. Please post your answer and I will select it. Commented Aug 10, 2018 at 15:42
  • No worries. Glad that it helped! Commented Aug 10, 2018 at 15:45

3 Answers

2

You may want to merge.

df = pd.merge(df1, df2, on='NameID', how='left')

0

See Michael B's answer, but in addition, you might also want to sort to get your requested output:

pd.merge(df1, df2, on='NameID', how='left').sort_values('SSN', ascending=False)


0

Answering on behalf of warwick12

df3 = pd.merge(df1, df2, left_index=True, right_index=True)
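
A self-contained sketch of the index-based merge, for anyone who wants to run it end to end. The assumptions here: salaries are fixed to the values shown in the question so the output is reproducible, and sort_index() is added only to reproduce the row order of the requested output. In recent pandas versions, df1.join(df2) appears to align on the shared 'NameID' level in the same way, so it is shown alongside for comparison.

```python
import pandas as pd

# Sample frames from the question, with fixed salaries for reproducibility.
df1 = pd.DataFrame({
    'NameID': ['A', 'B', 'C'] * 3,
    'Date': ['20180801'] * 3 + ['20180802'] * 3 + ['20180803'] * 3,
    'Salary': [0.831064, 0.419464, 0.239779,
               0.500048, 0.317452, 0.188051,
               0.076196, 0.060435, 0.297118],
}).set_index(['NameID', 'Date'])

df2 = pd.DataFrame({'NameID': ['A', 'B', 'C'],
                    'SSN': [999, 888, 777]}).set_index('NameID')

# Merge directly on the indexes; pandas matches df2's 'NameID' index
# against the level of the same name in df1's MultiIndex.
merged = pd.merge(df1, df2, left_index=True, right_index=True).sort_index()

# Same idea via DataFrame.join (a left join on the indexes by default).
joined = df1.join(df2).sort_index()

print(merged)
```

Neither call needs reset_index(), and the ('NameID', 'Date') index is kept in the result.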
