How to sort this list of strings using a substring?

Question

I have a list whose elements are like the following:

Region_1.csv, Region_33.csv, Region_2.csv, Region_4.csv, Region_105.csv, ....

The list has all numbers ranging from 1 to 105 with none missing. I want to sort this list according to the region number such that it looks like this:

Region_1.csv, Region_2.csv, Region_3.csv, Region_4.csv, Region_105.csv etc.

Since the numbers have a variable number of digits, I am struggling to sort this list.

Thanks.
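A plain sort does not work here because it compares the names character by character, so 'Region_105.csv' lands before 'Region_2.csv' ('1' < '2'). A minimal illustration, using a shortened version of the list from the question:

```python
names = ['Region_1.csv', 'Region_33.csv', 'Region_2.csv', 'Region_4.csv', 'Region_105.csv']

# Default string ordering is lexicographic, not numeric
print(sorted(names))
# ['Region_1.csv', 'Region_105.csv', 'Region_2.csv', 'Region_33.csv', 'Region_4.csv']
```

Each answer below fixes this by passing a key function that turns the region number into an int before comparing.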

And what have you tried?

– user8060120, Commented Jul 12, 2018 at 9:22

jpp · Accepted Answer · 2018-07-12 09:22:15Z

4

You can use sorted with a custom key function, splitting first on '.' and then on '_':

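L = ['Region_1.csv', 'Region_33.csv', 'Region_2.csv', 'Region_4.csv', 'Region_105.csv']

# Keep the text before the first '.', then take the piece after the last '_'
res = sorted(L, key=lambda x: int(x.split('.')[0].split('_')[-1]))

print(res)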

['Region_1.csv', 'Region_2.csv', 'Region_4.csv', 'Region_33.csv', 'Region_105.csv']
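Because this key takes the last '_'-separated piece of the text before the first '.', it also copes with prefixes that themselves contain underscores. A quick check, using hypothetical file names that are not in the question:

```python
def region_number(name):
    # Keep the text before the first '.', then take the piece after the last '_'
    return int(name.split('.')[0].split('_')[-1])

# Hypothetical names whose prefix contains an underscore
files = ['North_Region_10.csv', 'North_Region_9.csv', 'North_Region_100.csv']
print(sorted(files, key=region_number))
# ['North_Region_9.csv', 'North_Region_10.csv', 'North_Region_100.csv']
```

The flip side is that a '.' earlier in the name (say, a hypothetical 'v1.North_Region_3.csv') makes split('.')[0] stop too early, and int() then raises ValueError.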

answered Jul 12, 2018 at 9:22

jpp

166k reputation · 37 gold badges · 301 silver badges · 363 bronze badges


Sunitha · Accepted Answer · 2018-07-12 09:24:09Z

2

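lst = ['Region_1.csv', 'Region_33.csv', 'Region_2.csv', 'Region_4.csv', 'Region_105.csv']
# Sort in place on the number between the first '_' and the following '.'
lst.sort(key=lambda x: int(x.split('_')[1].split('.')[0]))
print(lst)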

# ['Region_1.csv', 'Region_2.csv', 'Region_4.csv', 'Region_33.csv', 'Region_105.csv']
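This answer sorts in place with list.sort, which modifies lst and returns None, whereas sorted (used in the other answers) returns a new list and leaves the original untouched. A small sketch of the difference, on a shortened list:

```python
lst = ['Region_33.csv', 'Region_1.csv', 'Region_105.csv']

def region_key(x):
    # Number between the first '_' and the following '.'
    return int(x.split('_')[1].split('.')[0])

new = sorted(lst, key=region_key)   # returns a new list; lst is still unsorted here
result = lst.sort(key=region_key)   # sorts lst itself and returns None

print(new)     # ['Region_1.csv', 'Region_33.csv', 'Region_105.csv']
print(lst)     # ['Region_1.csv', 'Region_33.csv', 'Region_105.csv']
print(result)  # None
```

Note that split('_')[1] takes the piece after the first '_', so this key assumes exactly one '_' before the number.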

answered Jul 12, 2018 at 9:24

Sunitha

12.1k reputation · 2 gold badges · 23 silver badges · 23 bronze badges


Andrej Kesely · Accepted Answer · 2018-07-12 09:24:53Z

2

Using the re module, which helps if you need to find something more complex in the string:

import re

l = ['Region_105.csv', 'Region_1.csv', 'Region_33.csv', 'Region_2.csv', 'Region_4.csv']

# Use the first run of digits in each name as a numeric sort key
print(sorted(l, key=lambda v: int(re.findall(r'\d+', v)[0])))

Output:

['Region_1.csv', 'Region_2.csv', 'Region_4.csv', 'Region_33.csv', 'Region_105.csv']
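If the list might mix prefixes (for example Region_ and Zone_) or contain several numbers per name, a common generalisation is a "natural sort" key: split each name into text and digit runs and compare the digit runs as integers. This is a swapped-in technique, not part of the answer above, and the names below are hypothetical:

```python
import re

def natural_key(s):
    parts = re.split(r'(\d+)', s)
    # With a capturing group, re.split puts the digit runs at the odd indices,
    # so ints and strings never end up compared against each other
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

names = ['Zone_2.csv', 'Region_10.csv', 'Region_2.csv', 'Zone_1.csv']
print(sorted(names, key=natural_key))
# ['Region_2.csv', 'Region_10.csv', 'Zone_1.csv', 'Zone_2.csv']
```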

answered Jul 12, 2018 at 9:24

Andrej Kesely

196k reputation · 15 gold badges · 60 silver badges · 105 bronze badges


Vasilis G. · Accepted Answer · 2018-07-12 09:30:07Z

1

You can also use the find method of strings:

inList = ['Region_1.csv', 'Region_33.csv', 'Region_2.csv', 'Region_4.csv', 'Region_105.csv']

outList = sorted(inList, key=lambda elem: int(elem[elem.find('_')+1:elem.find('.')]))

print(outList)

Output:

['Region_1.csv', 'Region_2.csv', 'Region_4.csv', 'Region_33.csv', 'Region_105.csv']
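One thing to watch with str.find is that it returns -1 when the character is absent, so a name without '_' or '.' yields a slice taken from the wrong position. A variant that uses os.path.splitext to drop the extension and str.rpartition to take the text after the last '_' (a swapped-in approach, not the answerer's) might look like this:

```python
import os

def region_number(name):
    stem, _ext = os.path.splitext(name)            # 'Region_33.csv' -> ('Region_33', '.csv')
    _prefix, _sep, number = stem.rpartition('_')   # text after the last '_'
    return int(number)

inList = ['Region_1.csv', 'Region_33.csv', 'Region_2.csv', 'Region_4.csv', 'Region_105.csv']
print(sorted(inList, key=region_number))
# ['Region_1.csv', 'Region_2.csv', 'Region_4.csv', 'Region_33.csv', 'Region_105.csv']
```

If a name has no '_', rpartition returns the whole stem as the last element, so int() raises ValueError instead of quietly using the wrong substring.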

answered Jul 12, 2018 at 9:30

Vasilis G.

7,907 reputation · 4 gold badges · 23 silver badges · 32 bronze badges


RoadRunner · Accepted Answer · 2018-07-12 09:38:20Z

1

You could also try this:

>>> l = ['Region_105.csv', 'Region_1.csv', 'Region_33.csv', 'Region_2.csv', 'Region_4.csv']
>>> sorted(l, key=lambda x: int(''.join(filter(str.isdigit, x))))
['Region_1.csv', 'Region_2.csv', 'Region_4.csv', 'Region_33.csv', 'Region_105.csv']
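This key joins every digit in the name, wherever it appears, which is fine for the names in the question but not if a prefix also contains digits. A quick check with a hypothetical name:

```python
def digits_key(name):
    # Joins every digit character found anywhere in the name
    return int(''.join(filter(str.isdigit, name)))

print(digits_key('Region_105.csv'))  # 105
print(digits_key('Region2_10.csv'))  # 210, not 10
```

A name with no digits at all makes int('') raise ValueError.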

answered Jul 12, 2018 at 9:38

RoadRunner

26.4k reputation · 6 gold badges · 46 silver badges · 77 bronze badges


Prajna Bondel · Accepted Answer · 2018-07-12 09:29:51Z

0

You can extract the region number using regular expressions, build a dictionary of the form {region number: fileName}, and then sort the dictionary by its keys. Code for extracting the region number and creating the dictionary:

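import re

files = ['Region_1.csv', 'Region_33.csv', 'Region_2.csv', 'Region_4.csv', 'Region_105.csv']
d = dict()
for f in files:
    # Capture all the digits between '_' and '.csv' and store them as an int key
    m = re.search(r'_(\d+)\.csv$', f)
    rnum = int(m.group(1))
    d[rnum] = f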

For sorting the items in the dictionary, refer to: How can I sort a dictionary by key?
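Once the dictionary is built, one way to finish is to look up the values in ascending key order. A self-contained sketch, rebuilding the dictionary with an int key (an assumption of this sketch, so that the keys sort numerically rather than as strings):

```python
import re

files = ['Region_1.csv', 'Region_33.csv', 'Region_2.csv', 'Region_4.csv', 'Region_105.csv']

# Map region number -> file name, as the answer describes
d = {int(re.search(r'_(\d+)\.csv$', f).group(1)): f for f in files}

# Walk the keys in ascending numeric order to get the sorted file names
result = [d[k] for k in sorted(d)]
print(result)
# ['Region_1.csv', 'Region_2.csv', 'Region_4.csv', 'Region_33.csv', 'Region_105.csv']
```

If two files ever shared a region number, the later one would overwrite the earlier in the dictionary; sorting the list directly with a key function avoids that.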

answered Jul 12, 2018 at 9:29

Prajna Bondel

1 reputation · 1 silver badge · 1 bronze badge
