I have some measurement data that needs to be filtered. I read it into a pandas DataFrame, like this:

df

         RequestTime  RequestID  ResponseTime  ResponseID
0          150         14           103         101
1          150         15           110         102
2           25         16           121         103
3           25         16            97         104
4           22         16            44         105
5           19         17            44         106
6           26         18            29         106
7           30         18            29         106

I need to apply two conditions at the same time: drop rows that duplicate an earlier row on 'RequestTime' and 'RequestID', and also rows that duplicate an earlier row on 'ResponseTime' and 'ResponseID', using drop_duplicates(subset=). I have used the following commands to get the filtered result for each of the two conditions separately:

    >>> df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['RequestTime','RequestID'])

   RequestTime  RequestID  ResponseTime  ResponseID
0          150         14           103         101
1          150         15           110         102
2           25         16           121         103
4           22         16            44         105
5           19         17            44         106
6           26         18            29         106
7           30         18            29         106

    >>> df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['ResponseTime','ResponseID'])

   RequestTime  RequestID  ResponseTime  ResponseID
0          150         14           103         101
1          150         15           110         102
2           25         16           121         103
3           25         16            97         104
4           22         16            44         105
5           19         17            44         106
6           26         18            29         106

But how can I combine the two conditions so that both duplicate rows, row 3 and row 7, are dropped?

1 Answer

If I understand correctly, you can combine the two duplicate checks into a single boolean mask:

m = ~(df.duplicated(subset=['RequestTime','RequestID']) | df.duplicated(subset=['ResponseTime', 'ResponseID']))
df[m]

Output:

   RequestTime  RequestID  ResponseTime  ResponseID
0          150         14           103         101
1          150         15           110         102
2           25         16           121         103
4           22         16            44         105
5           19         17            44         106
6           26         18            29         106

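This creates a mask (a boolean Series) and uses it to boolean-index the DataFrame: duplicated marks every row that repeats an earlier row on the given subset, | combines the two checks, and ~ inverts the result so that only rows that are duplicates under neither condition are kept.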
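For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the DataFrame from the values shown in the question; the intermediate names dup_request, dup_response and result are only illustrative and are not part of the snippet above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "RequestTime": [150, 150, 25, 25, 22, 19, 26, 30],
    "RequestID": [14, 15, 16, 16, 16, 17, 18, 18],
    "ResponseTime": [103, 110, 121, 97, 44, 44, 29, 29],
    "ResponseID": [101, 102, 103, 104, 105, 106, 106, 106],
})

# True where the (RequestTime, RequestID) pair repeats an earlier row.
dup_request = df.duplicated(subset=["RequestTime", "RequestID"])
# True where the (ResponseTime, ResponseID) pair repeats an earlier row.
dup_response = df.duplicated(subset=["ResponseTime", "ResponseID"])

# Keep only the rows that are duplicates under neither condition.
m = ~(dup_request | dup_response)
result = df[m]

print(result.index.tolist())  # [0, 1, 2, 4, 5, 6]
```

Splitting the mask into two named Series makes it easy to inspect which rows each condition flags (row 3 for the request pair, row 7 for the response pair) before combining them.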


Or chain methods:

df.drop_duplicates(subset=['RequestTime', 'RequestID']).drop_duplicates(subset=['ResponseTime', 'ResponseID'])
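One caveat worth knowing: the mask and the chained calls give the same result on the data in the question, but they are not guaranteed to agree in general. The mask checks both conditions against the full DataFrame, while the second chained call only sees the rows that survived the first. The sketch below uses a small made-up frame (toy is hypothetical, not the asker's data) chosen so that the two approaches differ.

```python
import pandas as pd

# Hypothetical data, not from the question, chosen so the two approaches disagree.
toy = pd.DataFrame({
    "RequestTime": [1, 1, 2],
    "RequestID": [10, 10, 20],
    "ResponseTime": [5, 7, 7],
    "ResponseID": [100, 200, 200],
})

req = ["RequestTime", "RequestID"]
resp = ["ResponseTime", "ResponseID"]

# Mask: both duplicate checks are evaluated on the full frame.
masked = toy[~(toy.duplicated(subset=req) | toy.duplicated(subset=resp))]

# Chain: the second check only sees rows that survived the first.
chained = toy.drop_duplicates(subset=req).drop_duplicates(subset=resp)

print(masked.index.tolist())   # [0]
print(chained.index.tolist())  # [0, 2]
```

Here row 2 repeats the response pair of row 1, so the mask drops it. In the chain, row 1 has already been removed as a request duplicate, so row 2 no longer has anything to duplicate and is kept. Which behavior is right depends on what you mean by a duplicate.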

2 Comments

Wow, thanks for your very quick reply. It's the first time I have come across this mask method. Very simple and useful. The chained methods are good too.
Sorry, I'm a real newbie and have only just learned how to accept an answer. Done. Thank you.
