How to sort a pandas dataframe by a column that has both numbers and strings?

Question

I have a dataframe that looks like this

        col0    col1  col2  col4
1  '1ZE7999'  865545    20    20
2  'R022428'  865584   297     0
3         34  865665   296     0
4         56  865700   297     0
5        100  865628   292     5

I want to sort it by 'col0' the way Excel does: numerical values first, then strings.

        col0    col1  col2  col4
3         34  865665   296     0
4         56  865700   297     0
5        100  865628   292     5
1  '1ZE7999'  865545    20    20
2  'R022428'  865584   297     0

I used

df.sort_values(by='col0', ascending=True)

But that does not sort it that way; it compares the values as text, 0-9 then a-z:

        col0    col1  col2  col4
1  '1ZE7999'  865545    20    20
5        100  865628   292     5
3         34  865665   296     0
4         56  865700   297     0
2  'R022428'  865584   297     0

Possible duplicate of how to sort descending an alphanumeric pandas index.
– mrhallak, Commented Dec 20, 2017 at 20:33
Hmm, do the strings need to be sorted as well? Or is it okay if they just come after the numbers?
– cs95, Commented Dec 20, 2017 at 21:32

cs95 · Accepted Answer · 2017-12-20 20:39:28Z

Use pd.to_numeric + sort_values + loc:

df.loc[pd.to_numeric(df.col0, errors='coerce').sort_values().index]

        col0    col1  col2  col4
3         34  865665   296     0
4         56  865700   297     0
5        100  865628   292     5
1  '1ZE7999'  865545    20    20
2  'R022428'  865584   297     0

Details

pd.to_numeric with errors='coerce' converts values that cannot be parsed as numbers to NaN:

i = pd.to_numeric(df.col0, errors='coerce')
i

1      NaN
2      NaN
3     34.0
4     56.0
5    100.0
Name: col0, dtype: float64

sort_values sorts the numeric values and, by default, places the NaNs last:

j = i.sort_values()
j

3     34.0
4     56.0
5    100.0
1      NaN
2      NaN
Name: col0, dtype: float64
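
Where the NaNs end up is controlled by the na_position argument of sort_values. The sketch below rebuilds the coerced series by hand (the variable names are only for illustration) and shows both settings; 'first' would put the non-numeric rows ahead of the numbers instead.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the coerced column from the step above (illustrative only).
coerced = pd.Series(
    [np.nan, np.nan, 34.0, 56.0, 100.0],
    index=[1, 2, 3, 4, 5],
    name="col0",
)

# Default behaviour: na_position="last", so NaN rows go to the end.
nan_last = coerced.sort_values()

# Alternative: NaN rows (the original strings) go to the front.
nan_first = coerced.sort_values(na_position="first")

print(list(nan_last.index))
print(list(nan_first.index))
```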

Observe the index. All you need to do is use the index to reindex the dataframe. Either loc or reindex will do it.

df.loc[j.index]

        col0    col1  col2  col4
3         34  865665   296     0
4         56  865700   297     0
5        100  865628   292     5
1  '1ZE7999'  865545    20    20
2  'R022428'  865584   297     0

df.reindex(index=j.index)

        col0    col1  col2  col4
3         34  865665   296     0
4         56  865700   297     0
5        100  865628   292     5
1  '1ZE7999'  865545    20    20
2  'R022428'  865584   297     0

If you need to reset the index, that's easily done.

df.loc[j.index].reset_index(drop=True)

        col0    col1  col2  col4
0         34  865665   296     0
1         56  865700   297     0
2        100  865628   292     5
3  '1ZE7999'  865545    20    20
4  'R022428'  865584   297     0
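
For reference, here is a minimal self-contained sketch of the same idea using the key argument of sort_values, which should be available in pandas 1.1 and later. The sample frame is an assumption modeled on the question, with col0 stored as plain strings (no literal quote characters). kind='mergesort' is used only so that the non-numeric rows keep their original relative order.

```python
import pandas as pd

# Sample data modeled on the question (an assumption, not the asker's real frame).
df = pd.DataFrame(
    {
        "col0": ["1ZE7999", "R022428", "34", "56", "100"],
        "col1": [865545, 865584, 865665, 865700, 865628],
        "col2": [20, 297, 296, 297, 292],
        "col4": [20, 0, 0, 0, 5],
    },
    index=[1, 2, 3, 4, 5],
)

# The key function maps col0 to numbers (NaN for non-numeric text).
# Rows are ordered by that mapped value, and NaNs are placed last by default.
result = df.sort_values(
    by="col0",
    key=lambda s: pd.to_numeric(s, errors="coerce"),
    kind="mergesort",  # stable sort: ties and NaN rows keep their original order
)

print(result)
```

This skips the intermediate index lookup, but it does the same thing as the loc version: numbers ascending, everything else afterwards.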

BENY · Answer · 2017-12-20 20:33:49Z

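By using natsort (this assumes every value in col0 is a string, since the key calls .lower(), and that the values are unique, since they become the index):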

from natsort import natsorted

df.set_index('col0').reindex(natsorted(df.col0.tolist(), key=lambda y: y.lower())).reset_index()
Out[736]: 
        col0    col1  col2  col4
0         34  865665   296     0
1         56  865700   297     0
2        100  865628   292     5
3  '1ZE7999'  865545    20    20
4  'R022428'  865584   297     0
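
If installing natsort is not an option, a rough pandas-only sketch that puts the numbers first and then sorts the remaining strings alphabetically could look like the following. The function name sort_numbers_first and the sample data are hypothetical, and plain case-insensitive alphabetical order is not the same thing as natsort's natural ordering. Unlike the snippet above, it tolerates a column that mixes real ints with strings.

```python
import pandas as pd


def sort_numbers_first(frame, column):
    """Return `frame` with numeric entries of `column` first (ascending),
    followed by the non-numeric entries in case-insensitive alphabetical order."""
    numeric = pd.to_numeric(frame[column], errors="coerce")
    # Helper frame with a fresh 0..n-1 index, so its sorted index gives row positions.
    helper = pd.DataFrame(
        {
            "is_text": numeric.isna().to_numpy(),  # False (numbers) sorts before True
            "number": numeric.to_numpy(),
            "text": frame[column].astype(str).str.lower().to_numpy(),
        }
    )
    order = helper.sort_values(["is_text", "number", "text"]).index.to_numpy()
    # Positional lookup, so duplicate index labels in `frame` are not a problem.
    return frame.iloc[order]


# Hypothetical sample: the two strings are deliberately out of order.
sample = pd.DataFrame(
    {
        "col0": ["R022428", "1ZE7999", 34, 56, 100],
        "col1": [865584, 865545, 865665, 865700, 865628],
    },
    index=[1, 2, 3, 4, 5],
)

sorted_sample = sort_numbers_first(sample, "col0")
print(sorted_sample)
```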

shyam sundar · Answer · 2021-10-28 10:47:31Z

Use index_humansorted from natsort

import natsort
df = df.iloc[natsort.index_humansorted(df['col0'])]
