Drop rows pandas based on combination of matched column values with other dataframe

Question

Set-up

I have 2 pandas dfs (df1 and df2) which contain some overlapping rows and some non-overlapping rows.

Both dfs have the columns order_id and shop.

Now, if a row in df1 matches any row in df2 on the combination of order_id and shop, then this row should be dropped from df1. If this row doesn't match any row in df2 on order_id and shop, it should be kept.

Example

df2 is such that,

    order_id    shop
0     12345     'NL'
1     45678     'FR'
2     12345     'DE'
3     34567     'NL'

Now if df1 is such that,

    order_id    shop
0     12345     'NL'
1     45678     'FR'

then the result for df1 should be empty.

But if df1 is such that,

        order_id    shop
0       12345       'NL'
1       99999       'FR'
2       12345       'UK'

then the result for df1 should be,

        order_id    shop
0       99999       'FR'
1       12345       'UK'

Code

I created a monstrous line which then didn't really work...

So far, I have,

result_df = df1[(~df1['order_id'].astype(str).isin(df2['order_id'].astype(str)))]

How do I solve this?

jezrael · Accepted Answer · 2019-02-08 09:58:15Z


I think the columns do not have the same dtypes in both DataFrames, so first convert everything to strings and then merge with indicator=True:

df3 = (df1.astype(str).merge(df2.astype(str), how='left', indicator=True)
          .query('_merge == "left_only"')[df1.columns])
print (df3)
   order_id  shop
2     99999  'FR'
3     12345  'UK'
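
As a self-contained illustration of the merge-with-indicator approach above, here is a minimal sketch built on the second df1 example from the question. It assumes the dtypes already match, so no astype(str) is needed, and it passes the key columns explicitly through on=. The names keys, merged and result are only illustrative, and the shop values are written without the literal quotes shown in the question.

```python
import pandas as pd

# Example data from the question (shop values without the literal quotes).
df1 = pd.DataFrame({"order_id": [12345, 99999, 12345],
                    "shop": ["NL", "FR", "UK"]})
df2 = pd.DataFrame({"order_id": [12345, 45678, 12345, 34567],
                    "shop": ["NL", "FR", "DE", "NL"]})

keys = ["order_id", "shop"]

# Left merge on both key columns; the _merge column records where each row was found.
# drop_duplicates keeps a df1 row from being repeated if df2 had duplicate key pairs.
merged = df1.merge(df2[keys].drop_duplicates(), on=keys, how="left", indicator=True)

# Keep only the df1 rows that have no partner in df2.
result = merged.loc[merged["_merge"] == "left_only", df1.columns].reset_index(drop=True)
print(result)
```

Only the rows (99999, FR) and (12345, UK) should remain, because (12345, NL) is the one pair that also occurs in df2.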

It is also possible to check whether the dtypes are the same before applying the solution:

print (df1.dtypes)
print (df2.dtypes)

And then convert only the column(s) whose dtype differs:

df2['order_id'] = df2['order_id'].astype(str)
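
The dtype check and the targeted conversion can be combined roughly as follows. This is only a sketch under assumptions: the data is hypothetical (order_id stored as integers in df1 and as strings in df2), and the names keys, mismatched, left and right are illustrative rather than part of the answer.

```python
import pandas as pd

# Hypothetical data: order_id is an integer in df1 but a string in df2.
df1 = pd.DataFrame({"order_id": [12345, 99999], "shop": ["NL", "FR"]})
df2 = pd.DataFrame({"order_id": ["12345", "45678"], "shop": ["NL", "FR"]})

keys = ["order_id", "shop"]

# List the key columns whose dtypes disagree between the two frames.
mismatched = [col for col in keys if df1[col].dtype != df2[col].dtype]
print(mismatched)

# Convert only those columns to strings, on copies so the originals stay untouched.
left = df1.copy()
right = df2.copy()
for col in mismatched:
    left[col] = left[col].astype(str)
    right[col] = right[col].astype(str)

# Same merge-with-indicator filter as in the answer, now on comparable keys.
result = (left.merge(right, on=keys, how="left", indicator=True)
              .query('_merge == "left_only"')[keys])
print(result)
```

Note that the converted key column stays a string in the result; convert it back afterwards if the original dtype matters.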

edited Feb 8, 2019 at 9:58

answered Feb 8, 2019 at 9:55


7 Comments

LucSpan Over a year ago

I tried this, but it returns rows which are already in df1.

jezrael Over a year ago

@LucSpan - I'm not sure what the problem is; it works fine for me. Could df1 and df2 be swapped, so that you need

df3 = (df2.astype(str).merge(df1.astype(str), how='left', indicator=True)
          .query('_merge == "left_only"')[df1.columns])

?

LucSpan Over a year ago

Ya I know. I tried yours on the example and it worked fine. Maybe it has to do with the data in the actual dfs.

jezrael Over a year ago

@LucSpan - hm, it seems to be a data-related problem, such as trailing whitespace or float matching. The best approach is then to export the values to lists with df3.to_dict('list') and check for differences.

user12345 Over a year ago

Somehow it's not working for me. Instead of returning only the non-matched rows, this gives every row in the left dataframe.
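
Two data-related causes could produce the symptom described in these comments, and the sketch below illustrates both on hypothetical data. The first is trailing whitespace in a key, as suggested above. The second is an assumption not raised in the thread: without on=, merge joins on every column the two frames share, so an extra shared column (here the made-up amount) can prevent any match. The helper clean_keys is illustrative only.

```python
import pandas as pd

# Hypothetical data: one key differs only by trailing whitespace, and the
# extra shared column "amount" holds values that differ between the frames.
df1 = pd.DataFrame({"order_id": [12345, 99999],
                    "shop": ["NL ", "FR"],
                    "amount": [10.0, 20.0]})
df2 = pd.DataFrame({"order_id": [12345, 45678],
                    "shop": ["NL", "FR"],
                    "amount": [10.5, 30.0]})

keys = ["order_id", "shop"]

# Without on=..., merge joins on all shared columns, including "amount",
# so nothing matches and every df1 row is flagged as left_only.
naive = df1.merge(df2, how="left", indicator=True)
print((naive["_merge"] == "left_only").all())

def clean_keys(df):
    # Return a copy with surrounding whitespace stripped from the string key.
    out = df.copy()
    out["shop"] = out["shop"].str.strip()
    return out

left, right = clean_keys(df1), clean_keys(df2)

# Merge only on the intended key columns after cleaning them.
merged = left.merge(right[keys].drop_duplicates(), on=keys, how="left", indicator=True)
result = merged.loc[merged["_merge"] == "left_only", df1.columns]
print(result)
```

After cleaning and restricting the join to the two key columns, only the (99999, FR) row should be left.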
