pandas left join where right is null on multiple columns

Question

I have two pandas DataFrames, x and y, both with the same three columns A, B and C (none nullable). I need to create a new DataFrame z by removing from x every row that is entirely identical to a row of y, i.e. the equivalent of

x left join y on x.A=y.A and x.B=y.B and x.C=y.C
where y.A is null

How would I do that? Got stuck with indexes, concat, merge, join, ...

Example:

dataframe x
A    B    C
q1   q2   q3
q4   q2   q3
q7   q2   q9

dataframe y
A    B    C
q4   q2   q3

dataframe z
A    B    C
q1   q2   q3
q7   q2   q9

Can you add a data sample and the expected output?

– jezrael, Commented Mar 26, 2018 at 8:42
Sure, example added

– edoedoedo, Commented Mar 26, 2018 at 8:48

Josiah Yoder · Accepted Answer · 2022-08-09 16:30:58Z

27

I think you need merge with the indicator parameter, then filter to keep only the rows that come from the left DataFrame:

df = x.merge(y, indicator='i', how='outer').query('i == "left_only"').drop('i', axis=1)
print(df)
    A   B   C
0  q1  q2  q3
2  q7  q2  q9

In earlier versions of pandas, you may see .drop('i', axis=1) written as .drop('i', 1). Prefer the keyword form, .drop('i', axis=1), which avoids warnings in later versions of pandas.
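For reference, the same idea can be sketched as a small reusable function built on the question's sample data. The helper name anti_join is hypothetical (it is not a pandas API), and the sketch assumes neither frame already has a column called _merge, which is the name pandas uses when indicator=True. It uses how='left' rather than how='outer', since only rows from the left frame are kept anyway:

```python
import pandas as pd

# Sample data from the question.
x = pd.DataFrame({"A": ["q1", "q4", "q7"],
                  "B": ["q2", "q2", "q2"],
                  "C": ["q3", "q3", "q9"]})
y = pd.DataFrame({"A": ["q4"], "B": ["q2"], "C": ["q3"]})


def anti_join(left, right, on=None):
    """Hypothetical helper: rows of `left` with no match in `right`.

    `on` defaults to every column of `left`.
    """
    keys = list(left.columns) if on is None else list(on)
    # Deduplicate the right-hand keys so the merge cannot multiply left rows.
    right_keys = right[keys].drop_duplicates()
    # indicator=True adds a "_merge" column: "left_only" or "both".
    merged = left.merge(right_keys, on=keys, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


z = anti_join(x, y)
print(z)
```

With the sample frames this keeps the q1 and q7 rows. Duplicate rows in x are preserved, which is the behaviour a SQL-style left join with a null check on the right side would give.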

edited Aug 9, 2022 at 16:30

Josiah Yoder


answered Mar 26, 2018 at 8:58

jezrael


rhug123 · Accepted Answer · 2023-01-20 20:36:45Z

2

Here are a few other ways to remove rows from one DataFrame (dfx) using another DataFrame (dfy):

pd.concat([dfx,dfy]).drop_duplicates(keep=False)

or

dfx.loc[[i not in dfy.to_records(index = False) for i in dfx.to_records(index = False)]]

or

dfx.loc[~dfx.apply(tuple,axis=1).isin(dfy.to_records(index = False))]

or

pd.MultiIndex.from_frame(dfx).symmetric_difference(pd.MultiIndex.from_frame(dfy)).to_frame().reset_index(drop=True)

or

pd.DataFrame(set(dfx.apply(tuple,axis=1)).symmetric_difference(dfy.apply(tuple,axis=1)))
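One thing to check before picking among these: the concat/drop_duplicates and symmetric_difference variants compute a symmetric difference, so they match "x minus y" only when every row of dfy also occurs in dfx (and, for the concat variant, when dfx has no repeated rows). The sketch below illustrates the difference; the extra q0 row in dfy is made up for the illustration, and the one-directional alternative with MultiIndex.isin is a suggestion of mine rather than part of the answer above:

```python
import pandas as pd

dfx = pd.DataFrame({"A": ["q1", "q4", "q7"],
                    "B": ["q2", "q2", "q2"],
                    "C": ["q3", "q3", "q9"]})
# Assumption for illustration: dfy has one extra row (q0) that is not in dfx.
dfy = pd.DataFrame({"A": ["q4", "q0"],
                    "B": ["q2", "q2"],
                    "C": ["q3", "q3"]})

# Symmetric difference: the q0 row from dfy shows up in the result.
sym = pd.concat([dfx, dfy]).drop_duplicates(keep=False)

# One-directional difference: keep rows of dfx whose (A, B, C) tuple
# does not occur in dfy.
mask = pd.MultiIndex.from_frame(dfx).isin(pd.MultiIndex.from_frame(dfy))
one_way = dfx.loc[~mask]

print(sym)
print(one_way)
```

Here sym contains the q1, q7 and q0 rows, while one_way contains only q1 and q7.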

edited Jan 20, 2023 at 20:36

answered Aug 4, 2022 at 19:59

rhug123
