2

Set-up

I have 2 pandas dfs (df1 and df2) which contain some overlapping rows and some non-overlapping rows.

Both dfs have the columns order_id and shop.

Now, if a row in df1 matches any row in df2 on the combination of order_id and shop, then this row should be dropped from df1. If this row doesn't match any row in df2 on order_id and shop, it should be kept.


Example

df2 is such that,

    order_id    shop
0     12345     'NL'
1     45678     'FR'
2     12345     'DE'
3     34567     'NL'

Now if df1 such that,

    order_id    shop
0     12345     'NL'
1     45678     'FR'

then df1 should return empty.


But if df1 such that,

        order_id    shop
0       12345       'NL'
1       99999       'FR'
2       12345       'UK'

then df1 should return,

        order_id    shop
0       99999       'FR'
1       12345       'UK'

Code

I created a monstrous line which then didn't really work...

So far, I have,

result_df = df1[(~df1['order_id'].astype(str).isin(df2['order_id'].astype(str)))]

How do I solve this?

1 Answer 1

2

I think there are not same types of columns, so first convert it to string and then merge with indicator=True:

df3 = (df1.astype(str).merge(df2.astype(str), how='left', indicator=True)
          .query('_merge == "left_only"')[df1.columns])
print (df3)
   order_id  shop
2     99999  'FR'
3     12345  'UK'

Also is possible check if same dtypes before solution:

print (df1.dtypes)
print (df2.dtypes))

And convert only column(s) which is different dtype:

df2['order_id'] = df2['order_id'].astype(str)
Sign up to request clarification or add additional context in comments.

7 Comments

I tried this, but it returns rows which are already in df1.
@LucSpan - Not sure, what is problem, for me it working nice. Not swapped df1 with df2, so need df3 = (df2.astype(str).merge(df1.astype(str), how='left', indicator=True) .query('_merge == "left_only"')[df1.columns]) ?
Ya I know. I tried yours on the example and it worked fine. Maybe it has to do with the data in the actual dfs.
@LucSpan - hm, it seems some data related problem like trailing whitespaces or matching floats or similar. Then best is export values to list by df3.to_dict('list') and check if some differencies.
somehow it's not working for me. instead of bringing up only non matched rows this gives every row in left dataframe.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.