
How can I merge two dataframes, df1 and df2, to get a df3 that contains the rows of df1 and df2 that have the same index (and the same values in the columns)?

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A2', 'A3', 'A7'],
                    'B': ['B0', 'B2', 'B3', 'B7'],
                    'C': ['C0', 'C2', 'C3', 'C7'],
                    'D': ['D0', 'D2', 'D3', 'D7']},
                   index=[0, 2, 3, 7])

Test 1:

df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A7'],
                    'B': ['B0', 'B1', 'B2', 'B7'],
                    'C': ['C0', 'C1', 'C2', 'C7'],
                    'D': ['D0', 'D1', 'D2', 'D7']},
                     index=[0, 1, 2, 7])

Test 2:

df2 = pd.DataFrame({'A': ['A1'],
                    'B': ['B1'],
                    'C': ['C1'],
                    'D': ['D1']},
                     index=[1])

Expected output for test 1:

Out[13]: 
    A   B   C   D
0  A0  B0  C0  D0
2  A2  B2  C2  D2
7  A7  B7  C7  D7

Expected output for test 2:

Empty DataFrame
Columns: [A, B, C, D]
Index: []

2 Answers

Just merge:

In[111]:
df1.merge(df2)

Out[111]: 
    A   B   C   D
0  A0  B0  C0  D0

By default, merge joins on all the columns the two frames share and performs an inner join, so only rows where all the values agree are kept.
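To make those defaults explicit, here is a minimal sketch on the test-1 frames from the question: calling merge with no arguments should give the same result as an inner join spelled out on every shared column. The names `implicit`, `explicit` and `common_cols` are only illustrative.

```python
import pandas as pd

# Test-1 frames from the question
df1 = pd.DataFrame({'A': ['A0', 'A2', 'A3', 'A7'],
                    'B': ['B0', 'B2', 'B3', 'B7'],
                    'C': ['C0', 'C2', 'C3', 'C7'],
                    'D': ['D0', 'D2', 'D3', 'D7']},
                   index=[0, 2, 3, 7])
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A7'],
                    'B': ['B0', 'B1', 'B2', 'B7'],
                    'C': ['C0', 'C1', 'C2', 'C7'],
                    'D': ['D0', 'D1', 'D2', 'D7']},
                   index=[0, 1, 2, 7])

# No `on` or `how` given: join on all shared columns with how='inner'
implicit = df1.merge(df2)

# The same join with the defaults written out
common_cols = [c for c in df1.columns if c in df2.columns]
explicit = df1.merge(df2, how='inner', on=common_cols)

print(implicit.equals(explicit))
print(implicit)
```

Only rows whose values agree in every shared column survive, and the result gets a fresh 0..n-1 index, so the original labels 0, 2 and 7 are not carried over.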

Looking at the index-matching requirement, I'd filter the df before the merge:

In[131]:
filtered = df1.reindex(df2.index).dropna()
filtered

Out[131]: 
    A   B   C   D
1  A1  B1  C1  D1

And then merge:

In[132]:
filtered.merge(df2)
Out[132]: 
    A   B   C   D
0  A0  B0  C0  D0

If the indices don't match at all, say the first row of df2 is 1 instead of 2:

In[133]:
filtered = df1.reindex(df2.index).dropna()
filtered
Out[133]: 
    A   B   C   D
1  A1  B1  C1  D1

then merge returns an empty df, because the values in that row don't agree:

In[134]:
filtered.merge(df2)

Out[134]: 
Empty DataFrame
Columns: [A, B, C, D]
Index: []
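Note that passing `.loc` labels that df1 does not have raises KeyError (the comments below report exactly this for test 2). A minimal sketch of a lookup that cannot fail, assuming a reasonably recent pandas, intersects the two indexes first; `loc_raised`, `common` and `result` are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A2', 'A3', 'A7'],
                    'B': ['B0', 'B2', 'B3', 'B7'],
                    'C': ['C0', 'C2', 'C3', 'C7'],
                    'D': ['D0', 'D2', 'D3', 'D7']},
                   index=[0, 2, 3, 7])
# Test-2 frame from the question: its only label, 1, is not in df1
df2 = pd.DataFrame({'A': ['A1'], 'B': ['B1'], 'C': ['C1'], 'D': ['D1']},
                   index=[1])

# Looking up labels that df1 does not have raises KeyError
try:
    df1.loc[df2.index]
    loc_raised = False
except KeyError:
    loc_raised = True

# Intersecting the indexes keeps only labels present in both frames,
# so the lookup cannot fail; here there are none, so the result is empty
common = df1.index.intersection(df2.index)
filtered = df1.loc[common]
result = filtered.merge(df2)

print(loc_raised)
print(result)
```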

UPDATE

On your new dataset, merge resets the index, which is its default behaviour:

In[152]:
filtered.merge(df2)

Out[152]: 
    A   B   C   D
0  A0  B0  C0  D0
1  A2  B2  C2  D2
2  A7  B7  C7  D7
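Because merge ignores and then discards the index, another option (a different technique from the mask used in this answer) is to turn the index into an ordinary column on both sides, so the inner merge has to match the index label as well as columns A to D, and then move it back. A minimal sketch, assuming both indexes are unnamed and no column is already called `'index'`, so that `reset_index()` names the new column `'index'`; `rows_in_both` is a hypothetical helper:

```python
import pandas as pd

def rows_in_both(df1, df2):
    """Rows present in both frames with the same index label and values."""
    # Assumes unnamed indexes: reset_index() calls the new column 'index'
    left = df1.reset_index()
    right = df2.reset_index()
    # Inner merge on every shared column, i.e. 'index' plus A..D
    both = left.merge(right)
    # Restore the matched labels as the index and drop the helper name
    out = both.set_index('index')
    out.index.name = None
    return out

df1 = pd.DataFrame({'A': ['A0', 'A2', 'A3', 'A7'],
                    'B': ['B0', 'B2', 'B3', 'B7'],
                    'C': ['C0', 'C2', 'C3', 'C7'],
                    'D': ['D0', 'D2', 'D3', 'D7']},
                   index=[0, 2, 3, 7])
df2_test1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A7'],
                          'B': ['B0', 'B1', 'B2', 'B7'],
                          'C': ['C0', 'C1', 'C2', 'C7'],
                          'D': ['D0', 'D1', 'D2', 'D7']},
                         index=[0, 1, 2, 7])
df2_test2 = pd.DataFrame({'A': ['A1'], 'B': ['B1'], 'C': ['C1'], 'D': ['D1']},
                         index=[1])

res1 = rows_in_both(df1, df2_test1)
res2 = rows_in_both(df1, df2_test2)
print(res1)
print(res2)
```

Since no `.loc` lookup is involved, test 2 simply yields an empty frame with columns A to D rather than an error.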

So to retain the indices, we can make a boolean mask with the equality operator and then call dropna: cells where the values don't agree become NaN, so dropna removes every row that contains a mismatch. This should handle all cases:

In[153]:
filtered[filtered == df2.loc[filtered.index]].dropna()

Out[153]: 
    A   B   C   D
0  A0  B0  C0  D0
2  A2  B2  C2  D2
7  A7  B7  C7  D7
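A variant sketch of the same equality-mask idea, not part of the original answer, folds the index filtering in and reduces the cell-wise comparison with `.all(axis=1)` instead of relying on NaN and dropna. It assumes df2 has all of df1's columns; `matching_rows` and `df2_changed` are hypothetical names:

```python
import pandas as pd

def matching_rows(df1, df2):
    """Rows of df1 whose label exists in df2 with identical values."""
    # Only labels present in both indexes, so .loc cannot raise KeyError
    common = df1.index.intersection(df2.index)
    a = df1.loc[common]
    # Assumes df2 has all of df1's columns; select them in df1's order
    b = df2.loc[common, a.columns]
    # True where every column of the row agrees between the two frames
    mask = (a == b).all(axis=1)
    return a.loc[mask]

df1 = pd.DataFrame({'A': ['A0', 'A2', 'A3', 'A7'],
                    'B': ['B0', 'B2', 'B3', 'B7'],
                    'C': ['C0', 'C2', 'C3', 'C7'],
                    'D': ['D0', 'D2', 'D3', 'D7']},
                   index=[0, 2, 3, 7])
df2_test1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A7'],
                          'B': ['B0', 'B1', 'B2', 'B7'],
                          'C': ['C0', 'C1', 'C2', 'C7'],
                          'D': ['D0', 'D1', 'D2', 'D7']},
                         index=[0, 1, 2, 7])
df2_test2 = pd.DataFrame({'A': ['A1'], 'B': ['B1'], 'C': ['C1'], 'D': ['D1']},
                         index=[1])
# Same labels as test 1, but row 2 now differs in column A
df2_changed = df2_test1.copy()
df2_changed.loc[2, 'A'] = 'X2'

out1 = matching_rows(df1, df2_test1)
out2 = matching_rows(df1, df2_test2)
out3 = matching_rows(df1, df2_changed)
print(out1)
print(out2)
print(out3)
```

Because it intersects the indexes itself, it does not depend on an earlier filtering step. As with the dropna version, NaN never compares equal to NaN, so rows containing missing values are treated as mismatches.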

13 Comments

I've edited the question with a different test case... I'm not sure the solution you gave works for this case.
It gives the right rows but doesn't preserve the indices, it seems.
@cᴏʟᴅsᴘᴇᴇᴅ indices are ignored when merging on columns. Semantically I think merge is the correct approach; I'll update to show how to get this to work as posted.
@EdChum the problem with this solution is that if there are no common indices, the merge gives a warning... any idea how to solve it? I added the test case to the question.
Actually, it gives an error: KeyError: "None of [Int64Index([1], dtype='int64')] are in the [index]"

If you are sure that the values are the same, you can do:

df1.loc[df1.index.to_series().isin(df2.index)]

There's no need to do a merge.
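`Index.isin` already returns a boolean array, so the `to_series()` step is optional. The sketch below uses the test-1 frames plus a hypothetical `df2_changed` whose row 7 has different values, to show that this filter compares index labels only, so it is safe only when the values are known to agree:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A2', 'A3', 'A7'],
                    'B': ['B0', 'B2', 'B3', 'B7'],
                    'C': ['C0', 'C2', 'C3', 'C7'],
                    'D': ['D0', 'D2', 'D3', 'D7']},
                   index=[0, 2, 3, 7])
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A7'],
                    'B': ['B0', 'B1', 'B2', 'B7'],
                    'C': ['C0', 'C1', 'C2', 'C7'],
                    'D': ['D0', 'D1', 'D2', 'D7']},
                   index=[0, 1, 2, 7])
# Same labels as df2, but the values in row 7 no longer match df1
df2_changed = df2.copy()
df2_changed.loc[7, 'A'] = 'X7'

# Index.isin builds the boolean mask directly from the labels
by_index = df1[df1.index.isin(df2.index)]
# Only labels are compared, so row 7 is kept even though its values differ
by_index_changed = df1[df1.index.isin(df2_changed.index)]

print(by_index)
print(by_index_changed)
```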

2 Comments

Semantically, this only matches on the index, not on the columns or their values, which is not what the OP's question is about.
Well, that's exactly what he asked. He doesn't want to merge data from two dataframes; he just wants to filter based on the index.
