I have a dataframe with about half a million rows. It contains plenty of duplicate rows. How can I drop the rows that have the same value in all of the columns (about 80 columns), not just in one?
df:
period_start_time id val1 val2 val3
06.13.2017 22:00:00 i53 32 2 10
06.13.2017 22:00:00 i32 32 2 10
06.13.2017 22:00:00 i32 4 2 8
06.13.2017 22:00:00 i32 4 2 8
06.13.2017 22:00:00 i32 4 2 8
06.13.2017 22:00:00 i20 7 7 22
06.13.2017 22:00:00 i20 7 7 22
Desired output:
period_start_time id val1 val2 val3
06.13.2017 22:00:00 i53 32 2 10
06.13.2017 22:00:00 i32 32 2 10
06.13.2017 22:00:00 i32 4 2 8
06.13.2017 22:00:00 i20 7 7 22
Would df.drop_duplicates() do this?
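For what it's worth, here is a minimal sketch of that call, assuming pandas and a frame rebuilt by hand from the sample rows above (the real data has about 80 columns, but the call is the same). With no subset argument, drop_duplicates compares all columns and keeps the first occurrence of each fully identical row. The reset_index step is optional and only renumbers the rows.

```python
import pandas as pd

# Sample rows copied from the question; the real frame is assumed to be much wider.
df = pd.DataFrame(
    {
        "period_start_time": ["06.13.2017 22:00:00"] * 7,
        "id": ["i53", "i32", "i32", "i32", "i32", "i20", "i20"],
        "val1": [32, 32, 4, 4, 4, 7, 7],
        "val2": [2, 2, 2, 2, 2, 7, 7],
        "val3": [10, 10, 8, 8, 8, 22, 22],
    }
)

# No subset given, so a row is dropped only if it matches an earlier row in every column.
# drop_duplicates returns a new frame, so the result has to be assigned.
deduped = df.drop_duplicates().reset_index(drop=True)

print(deduped)
```

Note that the call does not modify df in place unless inplace=True is passed, so the result needs to be assigned back. To compare only some columns, a list of column names can be passed as subset.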