I am having two dataframes which have exact same data structure. I need to compare them to see if they have any difference in records due to any column value being different.
I am using below code to do it and it works perfectly to report if things tie or untie between these two dataframes.
df=pd.concat([df1, df2])
df = df.reset_index(drop=True)
df_gpby = df.groupby(list(df.columns))
idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
if df.reindex(idx).empty:
print('everything is good.')
else:
print('things do not tie out')
df.reindex(idx).to_csv('diff.csv', index=False)
Though diff.csv tells me what all is missing or is different, what it doesn't tell is which record belonged to which dataframe initially and which column values differ between the initial dataframes for a given record. Is there a way to somehow get this information in my final output ?
Sample dataframes.
Name | Age| Gender
0| Naxi | 27 | Male
1| Karan| 25 | Male
2| Tanya| 27 | Female
Name | Age| Gender
0| Naxi | 27 | Male
1| Tanya| 27 | Female
2| Karan| 24 | Male
output I want
Name | Age| Gender | Dataframe
Karan| 24 | Male | df2
Karan| 25 | Male | df1