I'm trying to join two DataFrames by index. They can have columns in common, and I only want to take a value from the second DataFrame when the corresponding value in the first is NaN or doesn't exist. Using the pandas example, I've got:

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                    index=[0, 1, 2, 3])

which prints as

    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3

and

df4 = pd.DataFrame({'B': ['B2p', 'B3p', 'B6p', 'B7p'],
                    'D': ['D2p', 'D3p', 'D6p', 'D7p'],
                    'F': ['F2p', 'F3p', 'F6p', 'F7p']},
                    index=[2, 3, 6, 7])

which prints as

    B    D    F
2  B2p  D2p  F2p
3  B3p  D3p  F3p
6  B6p  D6p  F6p
7  B7p  D7p  F7p

and the desired result is:

     A    B    C    D    F
0   A0   B0   C0   D0  NaN
1   A1   B1   C1   D1  NaN
2   A2   B2   C2   D2  F2p
3   A3   B3   C3   D3  F3p
6  NaN  B6p  NaN  D6p  F6p
7  NaN  B7p  NaN  D7p  F7p

1 Answer

This is a good use case for combine_first. The row and column indices of the resulting DataFrame are the union of the two, and values from the calling DataFrame take priority: a value from the other DataFrame is used only where the caller has a NaN or lacks that row or column label altogether (a missing label is treated the same as a NaN):

df1.combine_first(df4)

    A    B    C    D    F
0   A0   B0   C0   D0  NaN
1   A1   B1   C1   D1  NaN
2   A2   B2   C2   D2  F2p
3   A3   B3   C3   D3  F3p
6  NaN  B6p  NaN  D6p  F6p
7  NaN  B7p  NaN  D7p  F7p
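
To see the NaN case as well as the missing-label case, here is a minimal self-contained sketch. It reuses the frames from the question, with one change that is my own assumption for illustration: df1's 'B' value at index 2 is set to NaN so that there is something for df4 to fill in.

```python
import numpy as np
import pandas as pd

# Same frames as in the question, except that df1.loc[2, 'B'] is NaN.
# That NaN is an assumption added here to show the fill-in behaviour.
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', np.nan, 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df4 = pd.DataFrame({'B': ['B2p', 'B3p', 'B6p', 'B7p'],
                    'D': ['D2p', 'D3p', 'D6p', 'D7p'],
                    'F': ['F2p', 'F3p', 'F6p', 'F7p']},
                   index=[2, 3, 6, 7])

# df1 is the caller, so its non-NaN values win; df4 only fills the gaps.
result = df1.combine_first(df4)

print(result.loc[2, 'B'])  # B2p: NaN in df1, so taken from df4
print(result.loc[3, 'B'])  # B3: present in df1, so df1's value is kept
print(result.loc[6, 'B'])  # B6p: row 6 is missing from df1, so taken from df4
```

Note that the order matters: df4.combine_first(df1) would give df4's values priority instead.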

1 Comment

interesting behaviour of .combine_first
