I have a DataFrame that I want to cross-merge with itself, and then drop only the rows where key_x and key_y hold the same value. For example, key_x and key_y have the
same value in rows 0, 3, 5, 10, 12, and 15.
My DataFrame:
import pandas as pd
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'A'], 'value1': [1, 2, 3, 4]})
I tried the code below, which performs the merge correctly, but how do I drop the rows where key_x and key_y have the same value?
merged_df = df1.merge(df1, how='cross')
print(merged_df)
   key_x  value1_x key_y  value1_y
0      A         1     A         1  # Duplicate A
1      A         1     B         2
2      A         1     C         3
3      A         1     A         4  # Duplicate A
4      B         2     A         1
5      B         2     B         2  # Duplicate B
6      B         2     C         3
7      B         2     A         4
8      C         3     A         1
9      C         3     B         2
10     C         3     C         3  # Duplicate C
11     C         3     A         4
12     A         4     A         1  # Duplicate A
13     A         4     B         2
14     A         4     C         3
15     A         4     A         4  # Duplicate A
I would like my result to be something like this:
   key_x  value1_x key_y  value1_y
1      A         1     B         2
2      A         1     C         3
4      B         2     A         1
6      B         2     C         3
7      B         2     A         4
8      C         3     A         1
9      C         3     B         2
11     C         3     A         4
13     A         4     B         2
14     A         4     C         3
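For reference, one way this might be done is a boolean mask that keeps only the rows where the two key columns differ. This is a minimal sketch, not necessarily the best approach: it assumes pandas 1.2 or later (needed for how='cross'), and the name result is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'A'], 'value1': [1, 2, 3, 4]})

# Cross join: pair every row of df1 with every row of df1.
merged_df = df1.merge(df1, how='cross')

# Keep only rows whose two key columns differ.
# Boolean indexing preserves the original row labels (1, 2, 4, ...).
result = merged_df[merged_df['key_x'] != merged_df['key_y']]
print(result)
```

Because the comparison is done row by row, rows 3 and 12 are dropped as well, even though their value1_x and value1_y differ. If a fresh 0-based index is preferred, result.reset_index(drop=True) could be chained on.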