I want to remove rows from one dataframe when the same rows appear in another dataframe. However, I don't want to remove every matching row, only as many as appear in the other dataframe. Consider this example:
df1
   col1  col2
0     1    10
1     1    10
2     2    11
3     3    12
4     1    10
df2
   col1  col2
0     1    10
1     2    11
2     1    10
3     3    12
4     3    12
Desired output:
df1
col1  col2
   1    10
This is because df1 has 3 rows of (1, 10) while df2 has 2, so 2 are removed from each, leaving 1 row in df1. If df1 had 4 rows of (1, 10), I would want 2 of them left in df1. The same applies to df2, which has 2 rows of (3, 12) against 1 in df1:
df2
col1  col2
   3    12
My attempt:
I was thinking of counting how many times each row occurs in each dataframe and then building new df1 and df2 by subtracting those counts (dupe_count), but I'm wondering if there's a more efficient way.
# Count how many times each distinct row occurs in each dataframe
df1g = df1.groupby(df1.columns.tolist()).size().reset_index(name='dupe_count')
df2g = df2.groupby(df2.columns.tolist()).size().reset_index(name='dupe_count')
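For reference, here is a rough sketch of one way the count-and-subtract idea might be expressed without building the count tables explicitly. It numbers each repeated row with groupby(...).cumcount() and then uses a left merge with indicator=True, so the k-th copy of a row in one dataframe is cancelled only if the other dataframe also has a k-th copy. The helper name subtract_matching_rows and the temporary _occ column are made up for illustration. The sketch assumes both dataframes have the same columns, contain no missing values, and that the original index does not need to be preserved. I have not checked whether it is actually faster.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 1, 2, 3, 1], "col2": [10, 10, 11, 12, 10]})
df2 = pd.DataFrame({"col1": [1, 2, 1, 3, 3], "col2": [10, 11, 10, 12, 12]})


def subtract_matching_rows(left, right):
    """Hypothetical helper: drop from `left` one row per identical row in `right`."""
    cols = left.columns.tolist()
    # Number the copies of each identical row 0, 1, 2, ... in order of appearance.
    left_key = left.assign(_occ=left.groupby(cols).cumcount())
    right_key = right.assign(_occ=right.groupby(cols).cumcount())
    # (row values, _occ) is unique on each side, so the merge never multiplies rows.
    merged = left_key.merge(
        right_key, on=cols + ["_occ"], how="left", indicator=True
    )
    # Keep only the copies that have no counterpart in `right`.
    keep = merged["_merge"] == "left_only"
    return merged.loc[keep, cols].reset_index(drop=True)


new_df1 = subtract_matching_rows(df1, df2)
new_df2 = subtract_matching_rows(df2, df1)
print(new_df1)
print(new_df2)
```

With the example data this should leave a single (1, 10) row in new_df1 and a single (3, 12) row in new_df2, which matches the desired output above.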