I have a DataFrame that looks like the following:

UID Count1 Count2
XXX 1 1
Xyy 1 0
yyy 0 2

I want to sort that DataFrame based on the two count columns. It should not matter which column holds which value, and rows with equally distributed values should be on top.

It should look like this:

UID Count1 Count2
XXX 1 1
yyy 0 2
Xyy 1 0

Is there a way to achieve this with Pandas?

  • Can you explain your desired sorting order a bit more? ascending/descending? Commented Mar 15, 2021 at 11:28
  • I want it descending, so an entry with Count1 = 2 and Count2 = 2 should be higher than one with Count1 = 1 and Count2 = 2. Count1 = 3 and Count2 = 1 should be below Count1 = 2 and Count2 = 2, because the values are not as equally distributed. Commented Mar 15, 2021 at 11:50

1 Answer

You could sort using the row sum and the standard deviation, putting the highest sum at the top and, among rows with equal sums, the lowest standard deviation first.


import numpy as np
import pandas as pd

arr = np.array([[1, 0, 1, 2], [1, 2, 0, 0]])

df = pd.DataFrame(arr).T
df.columns = ["Count1", "Count2"]
count_cols = ["Count1", "Count2"]

# Row-wise sum of the count columns
df['sum'] = df[count_cols].sum(axis=1)
# Row-wise standard deviation of the count columns only (the 'sum' column is excluded)
df['sd'] = df[count_cols].std(axis=1)

# Sort by SUM descending, breaking ties by SD ascending, in a single call
df_sorted = df.sort_values(by=['sum', 'sd'], ascending=[False, True])

#df_sorted
#Out[87]: 
#   Count1  Count2  sum        sd
#0       1       1    2  0.000000
#1       0       2    2  1.414214
#3       2       0    2  1.414214
#2       1       0    1  0.707107

# Take the first two columns
df = df_sorted[['Count1', 'Count2']]
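
The same idea can be applied to the frame from the question, which also carries a UID column. The following is only a sketch: the helper name `sort_by_total_then_balance` and the temporary key names `total` and `spread` are made up for illustration, and it assumes the DataFrame index is unique so that `df.loc` can reorder the rows.

```python
import pandas as pd


def sort_by_total_then_balance(df, count_cols):
    """Hypothetical helper: largest row total first, ties broken by smallest spread."""
    # Build the sort keys in a separate frame so the input is not modified.
    keys = pd.DataFrame({
        "total": df[count_cols].sum(axis=1),
        "spread": df[count_cols].std(axis=1),
    })
    # One multi-key sort: total descending, then spread ascending.
    order = keys.sort_values(by=["total", "spread"], ascending=[False, True]).index
    # Assumes a unique index, so each label selects exactly one row.
    return df.loc[order]


df = pd.DataFrame({
    "UID": ["XXX", "Xyy", "yyy"],
    "Count1": [1, 1, 0],
    "Count2": [1, 0, 2],
})

result = sort_by_total_then_balance(df, ["Count1", "Count2"])
print(result)
```

Because the keys live in a separate frame, no helper columns need to be dropped afterwards, and the UID column travels along with its row.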
