How to delete rows in pandas that satisfy some condition

Question

I am using pandas and I have a dataset that looks like this:

ID-cell    TOWNS      NumberOfCrimes
 1          Paris       444
 1          Berlin      333
 1          London      111        
 2          Paris       222
 2          London      555
 2          Berlin      3
 3          Paris       999
 4          Berlin      777
 4          Paris       5
 5          Paris       123
 5          Berlin      8
 6          Paris       1000
 9          Berlin      321
 12         Berlin      1
 12         Berlin      2
 12         Paris       1

        . . .

It is a really big dataset. For each city, I need to keep just the 5 rows with the highest number of crimes and delete the rest.

So my output should look like this:

ID-cell    TOWNS      NumberOfCrimes
 6          Paris       1000
 3          Paris       999     
 1          Paris       444
 2          Paris       222
 5          Paris       123

 4          Berlin      777
 1          Berlin      333
 9          Berlin      321
 5          Berlin      8

 2          London      555
 1          London      111

I would really appreciate the help. I am new to this, I am working on a project for my faculty, and my deadline is very close. :/

jpp · Accepted Answer · 2018-07-02 22:25:07Z

3

sort + groupby.head

You can sort by NumberOfCrimes descending, then use groupby + head. Here's an example with your data extracting the single highest NumberOfCrimes by Town.

res = df.sort_values('NumberOfCrimes', ascending=False)\
        .groupby('TOWNS').head(1)

print(res)

   ID-cell   TOWNS  NumberOfCrimes
5        3   Paris             999
4        2  London             555
1        1  Berlin             333

So, for the top N rows for each town, use head(N): head(2), head(3), and so on. For the five highest rows per town that the question asks for, use head(5).
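As a minimal sketch of the head(5) case, the snippet below rebuilds only the sample rows shown in the question (the real dataset is larger, so this is an assumption for illustration) and keeps the five highest NumberOfCrimes rows per town. The names `rows` and `top5` are made up for this example.

```python
import pandas as pd

# Sample rows copied from the question; the real dataset is much larger.
rows = [
    (1, "Paris", 444), (1, "Berlin", 333), (1, "London", 111),
    (2, "Paris", 222), (2, "London", 555), (2, "Berlin", 3),
    (3, "Paris", 999), (4, "Berlin", 777), (4, "Paris", 5),
    (5, "Paris", 123), (5, "Berlin", 8), (6, "Paris", 1000),
    (9, "Berlin", 321), (12, "Berlin", 1), (12, "Berlin", 2),
    (12, "Paris", 1),
]
df = pd.DataFrame(rows, columns=["ID-cell", "TOWNS", "NumberOfCrimes"])

# Sort so the largest counts come first, then keep the first 5 rows of each town.
top5 = (
    df.sort_values("NumberOfCrimes", ascending=False)
      .groupby("TOWNS")
      .head(5)
)

# Optional: order the result by town, highest count first, for readability.
top5 = top5.sort_values(["TOWNS", "NumberOfCrimes"], ascending=[True, False])
print(top5)
```

Towns with fewer than five rows (London in this sample) simply keep all of their rows. Assigning the result back to `df` has the effect of deleting every other row.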

edited Jul 2, 2018 at 22:25

answered Jul 2, 2018 at 22:20

jpp


2 Comments

jpp Over a year ago

@Neven, Sure, no problem. Note Wen's solution is better if you only need the top one. This one is more extendable.

Neven Over a year ago

Your solution is better for what I need, but his solution is a good one too. :)

BENY · Accepted Answer · 2018-07-02 22:23:01Z

2

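Using sort_values + drop_duplicates. Note that this keeps the highest row per ID-cell; pass 'TOWNS' instead to keep the highest row per town: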

df.sort_values('NumberOfCrimes').drop_duplicates('ID-cell',keep='last')
Out[404]: 
   ID-cell   TOWNS  NumberOfCrimes
0        1   Paris             444
4        2  London             555
5        3   Paris             999
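The call above deduplicates on ID-cell, so it keeps the highest row for each ID-cell. Here is a sketch of the same idea keyed on TOWNS, which gives the single highest row per town. It assumes only the sample rows shown in the question, and the names `rows` and `top1` are made up for this example.

```python
import pandas as pd

# Sample rows copied from the question; the real dataset is much larger.
rows = [
    (1, "Paris", 444), (1, "Berlin", 333), (1, "London", 111),
    (2, "Paris", 222), (2, "London", 555), (2, "Berlin", 3),
    (3, "Paris", 999), (4, "Berlin", 777), (4, "Paris", 5),
    (5, "Paris", 123), (5, "Berlin", 8), (6, "Paris", 1000),
    (9, "Berlin", 321), (12, "Berlin", 1), (12, "Berlin", 2),
    (12, "Paris", 1),
]
df = pd.DataFrame(rows, columns=["ID-cell", "TOWNS", "NumberOfCrimes"])

# Ascending sort puts each town's largest count last;
# keep="last" then retains exactly that row per town.
top1 = df.sort_values("NumberOfCrimes").drop_duplicates("TOWNS", keep="last")
print(top1)
```

This only works for the top one per town, because drop_duplicates keeps a single row per key; for the top N, the sort + groupby.head approach is the one that extends.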

answered Jul 2, 2018 at 22:23

BENY


2 Comments

jpp Over a year ago

I like this. This solution is better for just keeping the top one.

Neven Over a year ago

Thank you two very much. :) Can I accept two answers as correct?
