
I have two dataframes that I want to merge / join on multiple index columns and I am unsure how to do it.

My first dataframe looks as follows:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Type': np.repeat(['A', 'B', 'C'], 6),
                    'Date': np.repeat(pd.date_range(start='2014-01-01', end='2014-01-30', freq='3D')[:-1], 2),
                    'var1': np.random.randn(18),
                    'var2': np.random.randn(18)})
df1
Out[1]:

    Type    Date    var1    var2
0   A   2014-01-01  1.158068    -0.010431
1   A   2014-01-01  0.052350    0.779531
2   A   2014-01-04  0.321079    -2.103932
3   A   2014-01-04  -0.796583   0.528920
4   A   2014-01-07  -0.868004   -0.170354
5   A   2014-01-07  -1.090472   -0.116811
6   B   2014-01-10  -0.659798   3.390008
7   B   2014-01-10  -1.054428   0.016943
8   B   2014-01-13  0.853301    0.292624
9   B   2014-01-13  0.814400    -1.213494
10  B   2014-01-16  0.307140    1.866017
11  B   2014-01-16  0.652235    0.431757
12  C   2014-01-19  1.030088    0.646554
13  C   2014-01-19  0.163201    0.974505
14  C   2014-01-22  0.517568    -0.197154
15  C   2014-01-22  -1.252889   0.335804
16  C   2014-01-25  -1.670459   1.335267
17  C   2014-01-25  -0.504036   -1.966568

The dates in the Date column here are not unique; they can occur multiple times, even within the same Type.

My second dataframe is like a "lookup table" used to fill more variables into df1. Dates are unique within each Type here (but every date occurs once for each Type). Here it is:

dates = df1.Date.unique()

df2 = pd.DataFrame({'Type': np.repeat(['A', 'B', 'C'], len(dates)),
                    'Date': np.tile(dates, 3),
                    'var3': np.random.randn(len(dates)*3),
                    'var4': np.random.randn(len(dates)*3)})
df2
Out[2]:
    Type    Date    var3    var4
0   A   2014-01-01  -1.174070   -0.385655
1   A   2014-01-04  0.297007    0.993622
2   A   2014-01-07  0.022896    0.273704
3   A   2014-01-10  -1.392180   -1.100206
4   A   2014-01-13  -0.681359   2.784101
5   A   2014-01-16  0.228933    0.699884
6   A   2014-01-19  -1.048384   0.649192
7   A   2014-01-22  -1.873971   2.479376
8   A   2014-01-25  0.765804    0.579625
9   B   2014-01-01  -0.274890   0.000616
10  B   2014-01-04  -0.962112   -0.165101
11  B   2014-01-07  1.547124    -0.988169
12  B   2014-01-10  0.140017    2.038517
13  B   2014-01-13  0.436132    -0.727510
14  B   2014-01-16  0.490690    -0.461119
15  B   2014-01-19  0.660947    0.962961
16  B   2014-01-22  -1.153902   0.436929
17  B   2014-01-25  -0.171737   -2.704890
18  C   2014-01-01  0.153577    0.261392
19  C   2014-01-04  0.367761    0.899686
20  C   2014-01-07  -1.766651   0.286800
21  C   2014-01-10  -0.536036   0.322947
22  C   2014-01-13  -1.517577   3.400519
23  C   2014-01-16  0.271890    0.460464
24  C   2014-01-19  1.590970    0.380616
25  C   2014-01-22  0.693004    -0.647738
26  C   2014-01-25  0.797017    0.642877

So what I want to do now is:

  1. Take df1, for each row:
  2. Look up the corresponding ['Type', 'Date']-Combination in df2
  3. Fill in the respective var3 and var4 values in df1.

I normally do these kinds of operations with a df1.join(df2, on=['Type', 'Date']). But for some reason it throws

ValueError: len(left_on) must equal the number of levels in the index of "right"

I'm not sure what the problem is here, or how to do this in an easier fashion.

1 Answer

You may be looking for merge:

df2.merge(df1, on = ['Type', 'Date'], how = 'right')

It's more convenient to use merge when you are not joining on the index; check out this response for more details.
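As a minimal sketch of the lookup described in the question, assuming small made-up stand-ins for df1 and df2 (the values below are illustrative, not the question's data): calling merge from df1 with how='left' keeps every row of df1 and puts its columns first. The validate='many_to_one' argument is an optional safeguard that raises an error if a (Type, Date) pair appears more than once in df2.

```python
import pandas as pd

# Made-up stand-ins for df1 and df2 (illustrative values only).
df1 = pd.DataFrame({
    "Type": ["A", "A", "B", "B"],
    "Date": pd.to_datetime(["2014-01-01", "2014-01-01", "2014-01-10", "2014-01-13"]),
    "var1": [1.0, 2.0, 3.0, 4.0],
})
df2 = pd.DataFrame({
    "Type": ["A", "A", "B", "B"],
    "Date": pd.to_datetime(["2014-01-01", "2014-01-04", "2014-01-10", "2014-01-13"]),
    "var3": [10.0, 20.0, 30.0, 40.0],
})

# Left merge: keep every row of df1 and look up var3 by (Type, Date).
# validate="many_to_one" raises if df2 contains duplicate key pairs.
merged = df1.merge(df2, on=["Type", "Date"], how="left", validate="many_to_one")

# Sanity checks: no rows gained or lost, and every lookup found a match.
assert len(merged) == len(df1)
assert merged["var3"].notna().all()
print(merged)
```

If some (Type, Date) pairs in df1 might be missing from df2, the second assertion would fail; in that case passing indicator=True to merge adds a _merge column showing which rows matched.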


2 Comments

ah okay, yeah, that works. I've also got it to work when I .set_index(['Type', 'Date']) for each of the DataFrames before joining. But I guess merge is much more convenient then. I still have to check if everything merged the way I want it, but it looks good so far. Thanks. I imagined it to be more complicated :D
Glad I've been of help
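
For reference, the set_index route mentioned in the first comment can be sketched as follows, again with made-up stand-in data rather than the question's values. The original ValueError arises because join(on=...) matches the listed columns of the calling frame against the index of the other frame, and df2 as constructed only has a default single-level index. Only df2 needs the two-level index here; df1 can keep Type and Date as ordinary columns.

```python
import pandas as pd

# Made-up stand-ins for df1 and df2 (illustrative values only).
df1 = pd.DataFrame({
    "Type": ["A", "A", "B", "B"],
    "Date": pd.to_datetime(["2014-01-01", "2014-01-01", "2014-01-10", "2014-01-13"]),
    "var1": [1.0, 2.0, 3.0, 4.0],
})
df2 = pd.DataFrame({
    "Type": ["A", "A", "B", "B"],
    "Date": pd.to_datetime(["2014-01-01", "2014-01-04", "2014-01-10", "2014-01-13"]),
    "var3": [10.0, 20.0, 30.0, 40.0],
})

# Give the lookup table a two-level index so that the number of
# index levels equals the number of columns passed to `on`.
lookup = df2.set_index(["Type", "Date"])

# join matches df1's Type/Date columns against lookup's index levels.
joined = df1.join(lookup, on=["Type", "Date"])
print(joined)
```

The result keeps df1's original row index and column order, with var3 appended on the right.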
