Pandas drop_duplicates with multiple conditions

Question

I have some measurement data that needs to be filtered. I read it into a dataframe, like this:

df

   RequestTime  RequestID  ResponseTime  ResponseID
0          150         14           103         101
1          150         15           110         102
2           25         16           121         103
3           25         16            97         104
4           22         16            44         105
5           19         17            44         106
6           26         18            29         106
7           30         18            29         106

I need to apply two different conditions at the same time: that is, drop rows that are duplicated on 'RequestTime' and 'RequestID', and also rows that are duplicated on 'ResponseTime' and 'ResponseID', using drop_duplicates(subset=). I have used the following commands to get the filtered result for each of the two conditions separately:

    >>> df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['RequestTime','RequestID'])

   RequestTime  RequestID  ResponseTime  ResponseID
0          150         14           103         101
1          150         15           110         102
2           25         16           121         103
4           22         16            44         105
5           19         17            44         106
6           26         18            29         106
7           30         18            29         106

    >>> df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['ResponseTime','ResponseID'])

   RequestTime  RequestID  ResponseTime  ResponseID
0          150         14           103         101
1          150         15           110         102
2           25         16           121         103
3           25         16            97         104
4           22         16            44         105
5           19         17            44         106
6           26         18            29         106

But how can I combine the two conditions so that both duplicate rows, row 3 and row 7, are dropped?

Scott Boston · Accepted Answer · 2021-06-01 14:32:43Z

IIUC,

    m = ~(df.duplicated(subset=['RequestTime','RequestID']) | df.duplicated(subset=['ResponseTime', 'ResponseID']))
    df[m]

Output:

   RequestTime  RequestID  ResponseTime  ResponseID
0          150         14           103         101
1          150         15           110         102
2           25         16           121         103
4           22         16            44         105
5           19         17            44         106
6           26         18            29         106

This creates a mask (a boolean Series) that is True only for rows that are not duplicates under either subset, and uses it to boolean-index your dataframe.
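For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's dataframe by hand and applies the mask. The intermediate names `dup_request`, `dup_response` and `result` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Rebuild the dataframe from the question by hand.
df = pd.DataFrame({
    "RequestTime": [150, 150, 25, 25, 22, 19, 26, 30],
    "RequestID": [14, 15, 16, 16, 16, 17, 18, 18],
    "ResponseTime": [103, 110, 121, 97, 44, 44, 29, 29],
    "ResponseID": [101, 102, 103, 104, 105, 106, 106, 106],
})

# duplicated() marks every repeat after the first occurrence (keep='first' is the default).
dup_request = df.duplicated(subset=["RequestTime", "RequestID"])
dup_response = df.duplicated(subset=["ResponseTime", "ResponseID"])

# Keep a row only if it is a duplicate under neither subset.
m = ~(dup_request | dup_response)
result = df[m]

print(result.index.tolist())  # [0, 1, 2, 4, 5, 6]
```

Rows 3 and 7 are the ones removed: row 3 repeats the request pair (25, 16) and row 7 repeats the response pair (29, 106).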

Or chain methods:

    df.drop_duplicates(subset=['RequestTime', 'RequestID']).drop_duplicates(subset=['ResponseTime', 'ResponseID'])
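One caveat worth checking on your own data: the mask and the chained version give the same result for the dataframe in the question, but they are not equivalent in general. The chained version looks for duplicates on the second subset only among the rows that survived the first step, so it can keep a row that the mask would drop. A small sketch with a made-up toy frame (the columns `a` and `b` are hypothetical, chosen only to show the difference):

```python
import pandas as pd

# Toy data: row 1 repeats row 0 on 'a', and row 2 repeats row 1 on 'b'.
toy = pd.DataFrame({"a": [1, 1, 2], "b": [1, 2, 2]})

# Mask approach: both duplicate checks run against the full frame.
mask = ~(toy.duplicated(subset=["a"]) | toy.duplicated(subset=["b"]))
masked = toy[mask]

# Chained approach: the second check only sees rows left after the first.
chained = toy.drop_duplicates(subset=["a"]).drop_duplicates(subset=["b"])

print(masked.index.tolist())   # [0]
print(chained.index.tolist())  # [0, 2]
```

Here row 2 is a duplicate of row 1 on `b`, but row 1 has already been removed by the first `drop_duplicates`, so the chained version keeps row 2. Which behaviour is wanted depends on the data.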

2 Comments

Sun Jar Over a year ago

Wow, thanks for your very quick reply. It's the first time I have come across this mask method. Very simple and useful. And the chained methods are good too.

Sun Jar Over a year ago

Sorry, I'm a real newbie and only just found out how to accept an answer. Done now. Thank you.
