Pandas merge on multiple columns ignoring NaN

Asked 6 years, 4 months ago

Modified 6 years, 4 months ago

Viewed 489 times

I am trying to do the same as this answer, but with the difference that I'd like to ignore NaN in some cases. For instance:

#df1
     c1    c2    c3
0    a     b     1
1    a     c     2
2    a     nan   1
3    b     nan   3
4    c     d     1
5    d     e     3

#df2
     c1    c2    c4
0    a     nan   1
1    a     c     2
2    a     x     1
3    b     nan   3
4    z     y     2

#merged output based on [c1, c2], dropping instances 
#with `NaN` unless both dataframes have `NaN`.

     c1    c2    c3    c4
0    a     b     1     1   #c1,c2 from df1 because df2 has a nan in c2
1    a     c     2     2   #in both
2    a     x     1     1   #c1,c2 from df2 because df1 has a nan in c2
3    b     nan   3     3   #c1,c2 as found in both
4    c     d     1     nan #from df1
5    d     e     3     nan #from df1
6    z     y     nan   2   #from df2

NaNs may come from either c1 or c2, but for this example I kept it simpler.

I'm not sure what the cleanest way to do this is. I was thinking of merging on [c1, c2] and then looping over the rows with NaN, but that is not very direct. Is there a better way to do it?
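For reference, here is a minimal sketch of what a plain outer merge on [c1, c2] gives for the two frames above. It assumes c2 holds real NaN values rather than the string "nan". pandas matches NaN keys with each other in a merge, so the two (a, nan) rows get paired, which is not what the expected output asks for. The `_merge` column from `indicator=True` is only there to show where each row came from.

```python
import numpy as np
import pandas as pd

# Example frames from the question (NaN assumed to be a real missing value)
df1 = pd.DataFrame({
    "c1": ["a", "a", "a", "b", "c", "d"],
    "c2": ["b", "c", np.nan, np.nan, "d", "e"],
    "c3": [1, 2, 1, 3, 1, 3],
})
df2 = pd.DataFrame({
    "c1": ["a", "a", "a", "b", "z"],
    "c2": [np.nan, "c", "x", np.nan, "y"],
    "c4": [1, 2, 1, 3, 2],
})

# Plain outer merge; NaN keys on both sides are treated as equal
merged = df1.merge(df2, on=["c1", "c2"], how="outer", indicator=True)

print(merged)
print(merged["_merge"].value_counts())
```

This yields eight rows instead of the seven wanted: (a, b) stays left-only and (a, x) stays right-only, so some extra pairing step is still needed after (or instead of) the merge.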

Edit - clarifying conditions
1. No duplicates are found anywhere.
2. No combination is performed between two rows if they both have values. c1 may not be combined with c2, so order must be respected.
3. For the cases where one of the two dataframes has a NaN in either c1 or c2, find the rows in the other dataframe that don't have a full match on both c1 and c2, and use those. For instance:

(a,c) has a match in both so it is no longer discussed.
(a,b) is only in df1. No b is found in df2.c2. The only row in df2 with a known key and a NaN is row 0, so it is combined with this one. Note that order must be respected; this is why (a,b) from df1 cannot be combined with any other row of df2 that also contains a b.
(a,x) is only in df2. No x is found in df1.c2. The only row in df1 with one of the known keys and a NaN is the row with index 2.
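The conditions above can be read in more than one way, so the following is only a rough sketch of one greedy interpretation, not a vetted solution. It assumes a fixed priority: exact matches on fully known keys first, then a fully known row paired with a NaN row whose known keys agree, then NaN rows whose keys are identical on both sides. `nan_aware_merge`, `RULES`, and the small helper functions are hypothetical names, not pandas APIs, and ties are simply broken by row order.

```python
import numpy as np
import pandas as pd

KEYS = ["c1", "c2"]

df1 = pd.DataFrame({
    "c1": ["a", "a", "a", "b", "c", "d"],
    "c2": ["b", "c", np.nan, np.nan, "d", "e"],
    "c3": [1, 2, 1, 3, 1, 3],
})
df2 = pd.DataFrame({
    "c1": ["a", "a", "a", "b", "z"],
    "c2": [np.nan, "c", "x", np.nan, "y"],
    "c4": [1, 2, 1, 3, 2],
})


def is_full(row):
    """True if none of the key columns is NaN."""
    return not any(pd.isna(row[k]) for k in KEYS)


def same_keys(a, b):
    """Keys are identical, with NaN treated as equal to NaN."""
    return all((pd.isna(a[k]) and pd.isna(b[k])) or a[k] == b[k] for k in KEYS)


def compatible(a, b):
    """Keys agree wherever both are known (NaN acts as a wildcard)."""
    return all(pd.isna(a[k]) or pd.isna(b[k]) or a[k] == b[k] for k in KEYS)


# Pairing rules, tried in this order (the priority is an assumption)
RULES = [
    # 1. both rows fully known and equal
    lambda l, r: is_full(l) and is_full(r) and same_keys(l, r),
    # 2. exactly one row fully known, the other has NaN but agrees elsewhere
    lambda l, r: is_full(l) != is_full(r) and compatible(l, r),
    # 3. both rows have NaN and their keys are identical
    lambda l, r: not is_full(l) and not is_full(r) and same_keys(l, r),
]


def nan_aware_merge(left, right):
    lrows, rrows = left.to_dict("records"), right.to_dict("records")
    l_free, r_free = set(range(len(lrows))), set(range(len(rrows)))
    out = []
    for rule in RULES:
        for i in sorted(l_free):
            for j in sorted(r_free):
                if rule(lrows[i], rrows[j]):
                    row = {**lrows[i], **rrows[j]}
                    for k in KEYS:  # keep whichever key value is known
                        row[k] = rrows[j][k] if pd.isna(lrows[i][k]) else lrows[i][k]
                    out.append(row)
                    l_free.discard(i)
                    r_free.discard(j)
                    break
    # Rows that were never paired are kept as they are
    out += [lrows[i] for i in sorted(l_free)]
    out += [rrows[j] for j in sorted(r_free)]
    return pd.DataFrame(out)


result = nan_aware_merge(df1, df2)
print(result)
```

On the example frames this gives the seven rows of the expected output, in a different row order. The nested loops are quadratic in the number of rows, and the sketch relies on the "no duplicates" condition, so it is meant to pin down the rule rather than to be fast.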

edited Jul 25, 2019 at 13:30

asked Jul 25, 2019 at 11:40

Sos

1,959 reputation; 2 gold badges, 27 silver badges, 54 bronze badges

df1.combine_first(df2)
– anky, commented Jul 25, 2019 at 11:47

If your data is aligned as in your example data, you can simply use: df1['c2'] = df1['c2'].fillna(df2['c2'])
– Erfan, commented Jul 25, 2019 at 11:47

Let me know if your data is aligned as you show in your question. If it is not, neither my suggestion nor anky's will work. I will vote to reopen.
– Erfan, commented Jul 25, 2019 at 11:49

You have to test it yourself; the rows have to be aligned as you show in your example dataset. If a sort will fix this, that means your data has the same number of rows, plus the same number of keys: a, b, etc.
– Erfan, commented Jul 25, 2019 at 11:59

Your edit to the question introduces some major additional problems. Your expected output no longer makes sense in a "merge" context: if you were using merge, both tables have an ('a', nan) row, so shouldn't those get paired up? I think you really need to clarify for yourself some restrictions on how this would go; otherwise I don't see you asking for "one solution". There are consistency issues. Simply put, why should the 3rd row of df2 get combined with the 3rd row of df1? The first row of df2 matches df1 much better.
– Paritosh Singh, commented Jul 25, 2019 at 12:39
