Sorting in a list containing lists

Question

I've got the following code:

cases = []

for file in files:

    # Get value from files and write to data
    data = [ id, b, c, d, e, f, g, h, i, j, k ]

    # Append the values to the data list
    cases.append(data)

# Sort the cases descending
cases.sort(reverse=True)

After running the for loop the cases list looks like this:

cases = [ ['id', val, val], ['id', val, val], ['id', val, val] ] etc.

id is a value like '600', '900', '1009', '1009a' or '1010' which I want to sort descending.

At the moment '1009a' is on top of the list while I want it to be between '1009' and '1010'. This is probably related to '1009a' being parsed as unicode while the other values are being parsed as long. A debugger also confirms this.
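A minimal sketch of the mixed-type problem, assuming Python 3 and a hypothetical list of ids where most were parsed as numbers and one was left as a string. Python 3 refuses to order int and str against each other. Python 2 instead orders every number before every string, which is why '1009a' ends up on top once the list is sorted with reverse=True.

```python
# Hypothetical ids: most parsed as numbers, one kept as a string
ids = [600, 900, 1009, "1009a", 1010]

error = None
try:
    ids.sort(reverse=True)
except TypeError as exc:
    # Python 3 cannot compare int with str, so the sort fails
    error = exc
print(type(error).__name__, error)
```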

I've tried converting the id field to unicode using unicode(id) while writing the data list, but this does not give the desired result either. After sorting cases, the output starts at '999', runs down to '600', and then starts again at '1130' and runs down to '1000'. What I want is for it to start at '1130' and run down to '600', with '1009a' between '1009' and '1010'.
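A small Python 3 sketch with a hypothetical subset of the ids shows why converting everything to strings is not enough: strings are compared character by character, so every id starting with '9' or '6' sorts above every id starting with '1', whatever its length.

```python
# Hypothetical ids, all converted to strings as in the unicode(id) attempt
ids = ["600", "999", "1000", "1009", "1009a", "1010", "1130"]

# Lexicographic comparison: "9..." > "6..." > "1...", regardless of length
result = sorted(ids, reverse=True)
print(result)  # ['999', '600', '1130', '1010', '1009a', '1009', '1000']
```

Note that '1009a' already lands between '1009' and '1010' here; what goes wrong is that the number of digits is ignored.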

What are the possible values for id? Some numerical digits with an optional alphabetic character at the end? – Galen, Commented Nov 29, 2017 at 9:30
You've cut the parsing out of your code, but it seems to me like that's the place where you need to make changes. You've vaguely described part of it (where you say you're doing unicode(id)), but you haven't shown it in any detail, so we can't really help you fix it. – Blckknght, Commented Nov 29, 2017 at 9:37

tobias_k · Accepted Answer · 2017-11-29 09:30:46Z

4

If you compare strings containing numbers, they are sorted in lexicographic (alphabetical) order, i.e. character by character, regardless of how many digits the number has. You have to convert them to int first, but that's tricky with the a/b suffix. You can use a regular expression to separate the number from the suffix:

>>> import re
>>> p = re.compile(r"(\d+)(.*)")
>>> def comp(x):
...     n, s = p.match(x).groups()
...     return int(n), s
...
>>> ids = ["1009", "1009a", "1009b", "1010", "99"]
>>> [comp(x) for x in ids]
[(1009, ''), (1009, 'a'), (1009, 'b'), (1010, ''), (99, '')]
>>> sorted(ids, key=comp)
['99', '1009', '1009a', '1009b', '1010']

Applying this to your example, you probably need this (not tested):

cases.sort(key=lambda x: comp(x[0]), reverse=True)
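Since the question says some ids come back as numbers (long) and others as strings, a hedged variant of that line converts each id with str() before matching. This is a sketch assuming Python 3, hypothetical rows, and that every id starts with at least one digit (otherwise p.match returns None and .groups() raises AttributeError):

```python
import re

# Same pattern as in the answer: leading digits, then any suffix
p = re.compile(r"(\d+)(.*)")

def comp(x):
    n, s = p.match(x).groups()
    return int(n), s

# Hypothetical rows; some ids are ints, one is a string, as in the question
cases = [[600, "b1"], ["1009a", "b2"], [1130, "b3"], [1009, "b4"], [1010, "b5"], [999, "b6"]]

# str() first so that numeric ids can be matched by the regex too
cases.sort(key=lambda row: comp(str(row[0])), reverse=True)
ids = [row[0] for row in cases]
print(ids)  # [1130, 1010, '1009a', 1009, 999, 600]
```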

answered Nov 29, 2017 at 9:30

tobias_k

4 Comments

Arne Over a year ago

I got a TypeError: '<' not supported between instances of 'str' and 'NoneType' in cases where no character was present after the number. I'd suggest changing the return line of comp to return int(n), s if s else ''. Also (maybe an improvement, maybe not), the regex can be moved into the signature of comp to avoid cluttering the global namespace, i.e. def comp(x, p=re.compile(r"(\d+)(.*)")):

Arne Over a year ago

Also, if you change the first line in comp to n, s = p.match(x[0]).groups(), you can avoid the lambda stuff and just call cases.sort(key=comp, reverse=True).

tobias_k Over a year ago

@ArneRecknagel With which version of Python did you try this? For me, the letter part is always '' when the id has no letter, not None. Tested with Python 2.7 and 3.5. It's only None if I use r"(\d+)(.+)?".

Arne Over a year ago

My bad, you are right. I misplaced the * for the suffix match in my test.
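The two points settled in this comment thread can be checked with a short Python 3 sketch (hypothetical rows): the default-argument version of comp that takes the whole row, and the difference between (.*), which yields '' for an id without a letter, and an optional group (.+)?, which yields None.

```python
import re

# Arne's variant: the pattern is compiled once, as a default argument,
# and comp receives the whole row, so no lambda is needed
def comp(row, p=re.compile(r"(\d+)(.*)")):
    n, s = p.match(row[0]).groups()
    return int(n), s

# (.*) matches the empty string, so an id without a letter gives ''
plain = re.match(r"(\d+)(.*)", "1009").groups()
# An optional group that does not participate gives None instead
optional = re.match(r"(\d+)(.+)?", "1009").groups()
print(plain, optional)  # ('1009', '') ('1009', None)

# Hypothetical rows with string ids
cases = [["600", "x"], ["1009a", "y"], ["1010", "z"], ["1009", "w"]]
cases.sort(key=comp, reverse=True)
print([row[0] for row in cases])  # ['1010', '1009a', '1009', '600']
```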

Arthur Spoon · Accepted Answer · 2017-11-29 09:34:45Z

0

Your problem is that unicode strings are compared character by character, starting from the first one, so '9' > '1' and therefore '900' > '1000'.

What you need to do is pad all your id fields with leading zeros, so that 900 becomes 0900 and now compares as less than 1000. You can do this with this bit of code (although there are probably neater ways of doing it):

id = str(id).zfill(5)

Note that you don't need the str() bit if id is already a string. Here the zfill(5) will add zeros to the left of the string until the string is of length 5.
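A quick Python 3 check of the padding idea, assuming purely numeric string ids (hypothetical sample). Putting zfill in the sort key, rather than overwriting id, keeps the original values in the list:

```python
ids = ["600", "900", "1000", "1009", "1010"]

# Pad each id to width 5 only for comparison; the list keeps the original strings
result = sorted(ids, key=lambda i: i.zfill(5), reverse=True)
print(result)  # ['1010', '1009', '1000', '900', '600']
```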

edited Nov 29, 2017 at 9:34

answered Nov 29, 2017 at 9:32

Arthur Spoon

2 Comments

tobias_k Over a year ago

Neat idea, but this will zfill "90" to "00090" and "90b" to "0090b".

Arthur Spoon Over a year ago

Damn, you're right. I'm also reading that this function is actually deprecated and works only in Python 2.x, so it's probably not the best option. Edit: apparently it's still in Python 3.6 and nothing says it's deprecated. But that still doesn't make it work in this example.
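The objection in the first comment can be reproduced with a short Python 3 sketch (hypothetical ids): an id with a suffix is already one character longer than its numeric part, so zfill pads it less and the digits no longer line up.

```python
ids = ["90", "90b", "100", "900", "1009", "1009a", "1010"]

# "1009a" gets no padding and "90b" only two zeros, so the digit positions shift
result = sorted(ids, key=lambda i: i.zfill(5), reverse=True)
print(result)  # ['1009a', '1010', '1009', '90b', '900', '100', '90']
```

Both failures show up: '1009a' sorts to the top again, and '90b' sorts above '900' and '100'.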

Ma0 · Accepted Answer · 2017-11-29 09:36:12Z

0

Same principle as the one used by @Tobias_k, but not quite as neat.

from itertools import takewhile, dropwhile

cases = [ ['600', 'foo1', 'bar1'], ['900', 'foo2', 'bar2'], ['1009', 'foo6', 'bar6'], ['1009a', 'foo3', 'bar3'], ['1010', 'foo4', 'bar4'] ]

def sorter_helper(str_):
    n = ''.join(takewhile(lambda x: x.isnumeric(), str_))
    s = ''.join(dropwhile(lambda x: x.isnumeric(), str_))
    return (int(n), s)

cases = sorted(cases, key=lambda x: sorter_helper(x[0]))
print(cases)  # -> [['600', 'foo1', 'bar1'], ['900', 'foo2', 'bar2'], ['1009', 'foo6', 'bar6'], ['1009a', 'foo3', 'bar3'], ['1010', 'foo4', 'bar4']]
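The question asks for descending order, and the sample rows above are already in ascending order, so the sort has little to show. A sketch of the same helper with the rows shuffled and reverse=True passed, assuming Python 3 (where str has isnumeric(); in Python 2 only unicode does):

```python
from itertools import takewhile, dropwhile

def sorter_helper(str_):
    # Leading digits form the number, the rest is the suffix
    n = ''.join(takewhile(lambda x: x.isnumeric(), str_))
    s = ''.join(dropwhile(lambda x: x.isnumeric(), str_))
    return (int(n), s)

# Same rows as in the answer, shuffled
cases = [['1009a', 'foo3', 'bar3'], ['600', 'foo1', 'bar1'], ['1010', 'foo4', 'bar4'],
         ['900', 'foo2', 'bar2'], ['1009', 'foo6', 'bar6']]

cases.sort(key=lambda x: sorter_helper(x[0]), reverse=True)
print([row[0] for row in cases])  # ['1010', '1009a', '1009', '900', '600']
```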

answered Nov 29, 2017 at 9:36

Ma0
