I'm working on a dataset that contains duplicates. If two rows have the same "similarity_index" value, they are duplicates of each other. I'm trying to merge these duplicates.
Here is a sample of my DataFrame:
       ad    soyad  similarity_index
0   hakan  özdemir                 0
1   hasan    yaman                 1
2    naci    şenli                 2
3  naciye      şen                 2
4   osman    uygur                 3
5    elif    sözen                 4
6    irem   derici                 5
Here is what I tried to do:
test_df.set_index("similarity_index").sort_index()
Here is the output:
                          ad    soyad
similarity_index
0                      hakan  özdemir
0                 hakan utku  özdemir
1                      hasan    yaman
2                       naci    şenli
2                     naciye      şen
3                      osman    uygur
4                       elif    sözen
5                       irem   derici
5                       irem   delici
6                       hako  özdemir
Here is what I want:
                          ad    soyad
similarity_index
0                      hakan  özdemir
                  hakan utku  özdemir
1                      hasan    yaman
2                       naci    şenli
                      naciye      şen
3                      osman    uygur
4                       elif    sözen
5                       irem   derici
                        irem   delici
6                       hako  özdemir
What I'm trying to accomplish is selecting the duplicate rows that share the same index value. I tried groupby() and pivot_table(), but I couldn't find a proper way to do it.
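
For reference, here is a minimal sketch of one direction that might produce the layout above. It is not a confirmed solution. The rows are retyped from the output listing, so the original row order is an assumption, and "dup_no" is a made-up name for a per-group counter. The idea is to add that counter as a second index level, because pandas prints a repeated outer label only once in a two-level index.

```python
import pandas as pd

# Rows retyped from the output listing in the question (order is assumed).
test_df = pd.DataFrame({
    "ad": ["hakan", "hakan utku", "hasan", "naci", "naciye",
           "osman", "elif", "irem", "irem", "hako"],
    "soyad": ["özdemir", "özdemir", "yaman", "şenli", "şen",
              "uygur", "sözen", "derici", "delici", "özdemir"],
    "similarity_index": [0, 0, 1, 2, 2, 3, 4, 5, 5, 6],
})

# Number the rows inside each similarity_index group: 0, 1, 2, ...
# "dup_no" is a hypothetical level name used only in this sketch.
dup_no = test_df.groupby("similarity_index").cumcount().rename("dup_no")

# Two-level index: similarity_index outside, the per-group counter inside.
grouped = test_df.set_index(["similarity_index", dup_no]).sort_index()
print(grouped)

# All rows that share one similarity_index value.
print(grouped.loc[2])

# Only the rows whose similarity_index occurs more than once.
duplicates = test_df[test_df.duplicated("similarity_index", keep=False)]
print(duplicates)
```

With this shape, grouped.loc[k] returns every row for similarity_index k, and the duplicated(..., keep=False) mask keeps only the groups that really contain duplicates.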

