pandas sort rows by index using a custom function

Question

I often need to work with dataframes where the rownames (index in pandas) are like of the type [a-z][0-9]+ (e.g. 'a20','a21',... or 'b1','b11','b2',...)

I would like to be able to sort the rows based on the number alone. From what I can gather of the API, df.sort_index() does not take a function argument where I can define the sorting function.

Is there a neat way of achieving this without resorting to removing the initial characters when creating the dataframe?

i think the current best approach is to construct a column from the index with the letter prefixes removed, and sort by that. — Haleemur Ali
– Haleemur Ali, Commented Feb 5, 2018 at 16:21
I feel like the best theoretical way to do this is to have a multi-index of one string column and one integer column, then you can use standard pandas sorting and features. However, one of the current answers might very well be a better practical answer since multi-indexes introduce more complexity. — JohnE
– JohnE, Commented Feb 5, 2018 at 16:48
I agree with @HaleemurAli, but only if the numeric part of the index is a unique identifier. I.e. how do you want to deal with A25 & B25 (if that is possible). — JohnE
– JohnE, Commented Feb 5, 2018 at 16:56

jezrael · Accepted Answer · 2018-02-06 10:16:26Z

4

You can use extract, convert to int and get position of sorted values by argsort, last use iloc:

(borrowing data from @Dark):

df = df.iloc[df.index.str.extract('(\d+)', expand=False).astype(int).argsort()]
print (df)
     0
a5   5
a6   6
a10  2
a11  3
b11  7
a12  1
b12  8
a15  4

Explanation:

df = pd.DataFrame([1,2,3,4,5,6,7,8],index=['a12','a10','a11','a15','a5','a6','b11','b12'])
print (df)
     0
a12  1
a10  2
a11  3
a15  4
a5   5
a6   6
b11  7
b12  8

First extract all integers:

print (df.index.str.extract('(\d+)', expand=False))
Index(['12', '10', '11', '15', '5', '6', '11', '12'], dtype='object')

Cast to integers:

print (df.index.str.extract('(\d+)', expand=False).astype(int))
Int64Index([12, 10, 11, 15, 5, 6, 11, 12], dtype='int64')

Get positions of sorted values by argsort:

print (df.index.str.extract('(\d+)', expand=False).astype(int).argsort())
[4 5 1 2 6 0 7 3]

Last select values by positions by iloc:

print (df.iloc[df.index.str.extract('(\d+)', expand=False).astype(int).argsort()])
     0
a5   5
a6   6
a10  2
a11  3
b11  7
a12  1
b12  8
a15  4

edited Feb 6, 2018 at 10:16

answered Feb 5, 2018 at 16:52

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

posdef Over a year ago

Could you elaborate a bit on what this does? It's cool that it seems to work but I'd like to be able to understand what it does :)

BENY · Accepted Answer · 2018-02-05 16:21:57Z

2

By using natsort sort by index list then reindex

df=pd.DataFrame({'v':[1,2,3]},index=['b1','b11','b2'])

from natsort import natsorted, ns


df.reindex(natsorted(df.index.tolist(), alg=ns.IGNORECASE))

Out[795]: 
     v
b1   1
b2   3
b11  2

answered Feb 5, 2018 at 16:21

BENY

324k22 gold badges176 silver badges250 bronze badges

2 Comments

Bharath M Shetty Over a year ago

You were a bit quicker

BENY Over a year ago

@Dark yep , just a little bit :-)

Bharath M Shetty · Accepted Answer · 2018-02-05 16:22:11Z

2

You can use sorted and reindex i.e

df = pd.DataFrame([1,2,3,4,5,6,7,8],index=['a12','a10','a11','a15','a5','a6','b11','b12'])

import re
df.reindex(sorted(df.index,key=lambda x: re.sub('[A-z]','',x)))


    0
a10  2
a11  3
b11  7
a12  1
b12  8
a15  4
a5   5
a6   6

answered Feb 5, 2018 at 16:22

Bharath M Shetty

30.6k6 gold badges65 silver badges111 bronze badges

1 Comment

jezrael Over a year ago

I think need convert to int :)

Collectives™ on Stack Overflow

pandas sort rows by index using a custom function

3 Answers 3

1 Comment

2 Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

2 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related