I have two DataFrames: df1 and df2.
I would like to find all rows in the combined DataFrames that have identical values in 'columnA' (object dtype) and 'columnB' (int dtype). These rows will differ in other columns, which I don't care about. The two DataFrames also have different shapes.
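For reference, one way to pin down "identical across the two DataFrames" is an inner merge on just the two key columns. This is a minimal sketch that assumes a match must be between df1 and df2 (not within one frame). The toy frames and the 'other'/'extra' columns are hypothetical:

```python
import pandas as pd

# Hypothetical toy data; only columnA and columnB matter, 'other' and 'extra'
# stand in for the ignored columns. The frames have different shapes.
df1 = pd.DataFrame({
    "columnA": ["x", "y", "z"],
    "columnB": [1, 2, 3],
    "other": [10, 20, 30],
})
df2 = pd.DataFrame({
    "columnA": ["x", "y", "w", "x"],
    "columnB": [1, 5, 3, 1],
    "other": [100, 200, 300, 400],
    "extra": ["a", "b", "c", "d"],
})

keys = ["columnA", "columnB"]

# Distinct (columnA, columnB) pairs that occur in both frames:
# an inner join on the key columns only.
common = df1[keys].drop_duplicates().merge(df2[keys].drop_duplicates(), on=keys)

# Rows from each frame whose key pair is in the common set.
rows1 = df1.merge(common, on=keys)
rows2 = df2.merge(common, on=keys)

print(common)
print(rows1)
print(rows2)
```

Keeping the result split into rows1 and rows2 avoids mixing columns that only exist in one frame; they can still be stacked with pd.concat if one combined output is needed.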
I've tried something like:
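```python
import pandas as pd
concat = pd.concat([df1, df2])
# keep=False marks every row of a duplicated (columnA, columnB) pair, not just the repeats
overlap = concat[concat.duplicated(subset=['columnA', 'columnB'], keep=False)]
```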
But the output doesn't look right (maybe it is). I just want to check: am I missing anything?
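One thing worth checking with this approach: duplicated() on the concatenated frame also flags key pairs that are repeated inside df1 alone (or df2 alone), not only pairs shared between the two frames. The sketch below, on hypothetical toy data, shows that effect, plus a variant that tags each row with a hypothetical 'source' column and keeps only key pairs seen in both frames:

```python
import pandas as pd

# Hypothetical toy data: ("q", 7) is duplicated inside df1 only,
# while ("x", 1) appears in both frames.
df1 = pd.DataFrame({"columnA": ["x", "q", "q"], "columnB": [1, 7, 7], "other": [10, 20, 30]})
df2 = pd.DataFrame({"columnA": ["x", "y"], "columnB": [1, 2], "other": [100, 200]})

keys = ["columnA", "columnB"]

# The approach from the question.
concat = pd.concat([df1, df2])
overlap = concat[concat.duplicated(subset=keys, keep=False)]
# overlap holds ("x", 1) twice AND ("q", 7) twice, although ("q", 7) never occurs in df2.

# Variant: label each row with the frame it came from, then keep only
# key pairs whose group contains rows from both frames.
tagged = pd.concat(
    [df1.assign(source="df1"), df2.assign(source="df2")],
    ignore_index=True,
)
n_sources = tagged.groupby(keys)["source"].transform("nunique")
cross = tagged[n_sources == 2]

print(overlap)
print(cross)
```

If within-frame duplicates are also wanted, the original one-liner is fine as it is; the 'source' variant only matters when the match has to span both frames.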
Edit:
Say I wanted all the rows with the same value in columnA but different values in columnB: would this work?
```python
df3 = (concat[concat.duplicated(subset=['columnA'], keep=False)]
       .drop_duplicates(subset=['columnB']))
```
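As far as I can tell, this would not quite do it: drop_duplicates(subset=['columnB']) compares columnB across all columnA groups, so it can remove rows because of an unrelated group and can leave a columnA value with a single row. Below is a minimal sketch on hypothetical data that compares that attempt with a groupby/nunique filter. It assumes "different values in columnB" means a columnA group containing more than one distinct columnB value:

```python
import pandas as pd

# Hypothetical combined data (standing in for pd.concat([df1, df2])).
concat = pd.DataFrame({
    "columnA": ["a", "a", "b", "b", "c"],
    "columnB": [1, 1, 1, 2, 3],
    "other": [10, 20, 30, 40, 50],
})

# Attempt from the question: drop_duplicates on columnB looks across all
# columnA groups, so group "a" keeps one row even though its columnB never varies.
attempt = (concat[concat.duplicated(subset=["columnA"], keep=False)]
           .drop_duplicates(subset=["columnB"]))

# Alternative: count distinct columnB values per columnA group and keep
# rows from groups where that count is greater than one.
n_b = concat.groupby("columnA")["columnB"].transform("nunique")
same_a_diff_b = concat[n_b > 1]

print(attempt)
print(same_a_diff_b)
```

Here attempt keeps one row from group "a" and one from group "b", whereas the nunique filter keeps both rows of group "b" and nothing from "a" or "c".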
pd.merge()? columnA_df == columnA_df2 but columnB_df == columnB_df2, drop row? columnA and columnB. Separately, in a separate output: all rows with identical values in columnA but different values in columnB.
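If pd.merge() is used, merging on columnA alone gives one row per matching df1/df2 pair, and the two columnB values can then be compared side by side. This minimal sketch uses hypothetical toy frames and hypothetical suffixes ("_df1", "_df2"), and produces both outputs from the same merge:

```python
import pandas as pd

# Hypothetical toy data.
df1 = pd.DataFrame({"columnA": ["x", "y", "z"], "columnB": [1, 2, 3]})
df2 = pd.DataFrame({"columnA": ["x", "y", "y"], "columnB": [1, 5, 2]})

# Pair every df1 row with every df2 row that has the same columnA.
# The suffixes keep the two columnB columns apart.
pairs = df1.merge(df2, on="columnA", suffixes=("_df1", "_df2"))

# Output 1: same columnA and same columnB in both frames.
same_both = pairs[pairs["columnB_df1"] == pairs["columnB_df2"]]

# Output 2: same columnA but a different columnB.
same_a_diff_b = pairs[pairs["columnB_df1"] != pairs["columnB_df2"]]

print(same_both)
print(same_a_diff_b)
```

If a columnA value repeats in both frames, the merge produces every pairing of those rows, so the result can be much larger than either input.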