4

I have a dataframe that looks like this

         col0         col1  col2   col4
         1    '1ZE7999'  865545   20    20
         2    'R022428'  865584  297     0
         3    34         865665  296     0 
         4    56         865700  297     0
         5    100        865628  292     5

I want to sort it by 'col0', first the numerical values, then the strings, the way that Excel sorts

       col0         col1  col2   col4
  3    34         865665  296     0 
  4    56         865700  297     0
  5    100        865628  292     5
  1    '1ZE7999'  865545   20    20
  2    'R022428'  865584  297     0

I used

df.sort_values(by='col1', ascending=True)

But that does not sort it that way, it sorts it from 0-9 then a-z

      col0         col1  col2   col4
 1    '1ZE7999'  865545   20    20
 5    100        865628  292     5
 3    34         865665  296     0 
 4    56         865700  297     0
 2    'R022428'  865584  297     0
2
  • Possible duplicate of how to sort descending an alphanumeric pandas index. Commented Dec 20, 2017 at 20:33
  • Hmm, do the strings need to be sorted as well? Or is it okay if they just come after the numbers? Commented Dec 20, 2017 at 21:32

3 Answers 3

10

pd.to_numeric + sort_values + loc -

df.loc[pd.to_numeric(df.col0, errors='coerce').sort_values().index]

        col0    col1  col2  col4
3         34  865665   296     0
4         56  865700   297     0
5        100  865628   292     5
1  '1ZE7999'  865545    20    20
2  'R022428'  865584   297     0

Details

pd.to_numeric coerces non-integral values to NaN -

i = pd.to_numeric(df.col0, errors='coerce')
i

1      NaN
2      NaN
3     34.0
4     56.0
5    100.0
Name: col0, dtype: float64

sort_values sorts the column, ignoring NaNs.

j = i.sort_values()
j

3     34.0
4     56.0
5    100.0
1      NaN
2      NaN
Name: col0, dtype: float64

Observe the index. All you need to do is use the index to reindex the dataframe. Either loc or reindex will do it.

df.loc[j.index]

        col0    col1  col2  col4
3         34  865665   296     0
4         56  865700   297     0
5        100  865628   292     5
1  '1ZE7999'  865545    20    20
2  'R022428'  865584   297     0
df.reindex(index=j.index)

        col0    col1  col2  col4
3         34  865665   296     0
4         56  865700   297     0
5        100  865628   292     5
1  '1ZE7999'  865545    20    20
2  'R022428'  865584   297     0

If you need to reset the index, that's easily done.

df.loc[j.index].reset_index(drop=True)

        col0    col1  col2  col4
0         34  865665   296     0
1         56  865700   297     0
2        100  865628   292     5
3  '1ZE7999'  865545    20    20
4  'R022428'  865584   297     0
Sign up to request clarification or add additional context in comments.

Comments

3

By using natsort

from natsort import natsorted

df.set_index('col0').reindex(natsorted(df.col0.tolist(), key=lambda y: y.lower())).reset_index()
Out[736]: 
        col0    col1  col2  col4
0         34  865665   296     0
1         56  865700   297     0
2        100  865628   292     5
3  '1ZE7999'  865545    20    20
4  'R022428'  865584   297     0

Comments

3

Use index_humansorted from natsort

import natsort
df = df.iloc[natsort.index_humansorted(df['col0'])]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.