I am using pandas and I have a dataset that looks like this:

ID-cell    TOWNS      NumberOfCrimes
 1          Paris       444
 1          Berlin      333
 1          London      111        
 2          Paris       222
 2          London      555
 2          Berlin      3
 3          Paris       999
 4          Berlin      777
 4          Paris       5
 5          Paris       123
 5          Berlin      8
 6          Paris       1000
 9          Berlin      321
 12         Berlin      1
 12         Berlin      2
 12         Paris       1

        . . .

It's a really big dataset. For each city, I need to keep only the 5 rows with the highest number of crimes and delete the rest.

So my output should look like this:

ID-cell    TOWNS      NumberOfCrimes
 6          Paris       1000
 3          Paris       999     
 1          Paris       444
 2          Paris       222
 5          Paris       123

 4          Berlin      777
 1          Berlin      333
 9          Berlin      321
 5          Berlin      8
 2          Berlin      3

 2          London      555
 1          London      111

I'd really appreciate the help. I am new to this, and I am working on a project for my faculty with a deadline that is very close. :/

2 Answers

sort + groupby.head

You can sort by NumberOfCrimes descending, then use groupby + head. Here's an example with your data extracting the single highest NumberOfCrimes by Town.

res = df.sort_values('NumberOfCrimes', ascending=False)\
        .groupby('TOWNS').head(1)

print(res)

   ID-cell   TOWNS  NumberOfCrimes
5        3   Paris             999
4        2  London             555
1        1  Berlin             333

So, for the top 2 or 3 for each town, you can use head(2), head(3), etc.
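For the top 5 per town asked about in the question, the same pattern can be sketched as follows. The DataFrame here is rebuilt by hand from the rows visible in the question (the real dataset is larger), and the variable name top5 is only illustrative.

```python
import pandas as pd

# Rows copied from the sample shown in the question; the real data is larger.
df = pd.DataFrame({
    "ID-cell": [1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 6, 9, 12, 12, 12],
    "TOWNS": ["Paris", "Berlin", "London", "Paris", "London", "Berlin",
              "Paris", "Berlin", "Paris", "Paris", "Berlin", "Paris",
              "Berlin", "Berlin", "Berlin", "Paris"],
    "NumberOfCrimes": [444, 333, 111, 222, 555, 3, 999, 777, 5, 123, 8,
                       1000, 321, 1, 2, 1],
})

# Sort all rows by crime count (largest first), then keep the first
# 5 rows of each town. Towns with fewer than 5 rows keep all of them.
top5 = (
    df.sort_values("NumberOfCrimes", ascending=False)
      .groupby("TOWNS")
      .head(5)
)

# Re-sort only for display, so each town's rows appear together.
print(top5.sort_values(["TOWNS", "NumberOfCrimes"], ascending=[True, False]))
```

groupby.head keeps the row order produced by the sort, so within each town the rows stay ordered from highest to lowest. If ties matter, note that head(5) simply cuts after five rows and does not keep extra tied rows.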

2 Comments

@Neven, Sure, no problem. Note Wen's solution is better if you only need the top one. This one is more extendable.
Your solution is better for what I need, but his solution is also a good one. :)
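Using sort_values + drop_duplicates with keep='last', which keeps only the highest row in each group (grouped by ID-cell here; pass 'TOWNS' instead to group by town):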

df.sort_values('NumberOfCrimes').drop_duplicates('ID-cell',keep='last')
Out[404]: 
   ID-cell   TOWNS  NumberOfCrimes
0        1   Paris             444
4        2  London             555
5        3   Paris             999
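As a rough sketch of the same technique applied per town rather than per ID-cell, the snippet below uses a handful of rows taken from the question. The small DataFrame and the name top1 are assumptions made for illustration, not the answerer's exact data.

```python
import pandas as pd

# A small subset of the question's rows, assumed here for illustration.
df = pd.DataFrame({
    "ID-cell": [1, 1, 1, 2, 2, 3],
    "TOWNS": ["Paris", "Berlin", "London", "Paris", "London", "Paris"],
    "NumberOfCrimes": [444, 333, 111, 222, 555, 999],
})

# Sort ascending so the largest value of each town comes last,
# then drop every row for a town except that last (largest) one.
top1 = df.sort_values("NumberOfCrimes").drop_duplicates("TOWNS", keep="last")

print(top1)
```

This only ever yields one row per town, which is why the groupby + head approach is the one to reach for when the top 5 are needed.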

2 Comments

I like this solution better for just keeping the top one.
Thank you both very much. :) Can I accept two answers as correct?
