
I've got the following code:

cases = []

for file in files:

    # Get value from files and write to data
    data = [ id, b, c, d, e, f, g, h, i, j, k ]

    # Append the values to the data list
    cases.append(data)

# Sort the cases descending
cases.sort(reverse=True)

After running the for loop the cases list looks like this:

cases = [ ['id', val, val], ['id', val, val], ['id', val, val] ] etc.

id is a value like '600', '900', '1009', '1009a' or '1010' which I want to sort descending.

At the moment '1009a' is on top of the list while I want it to be between '1009' and '1010'. This is probably related to '1009a' being parsed as unicode while the other values are being parsed as long. A debugger also confirms this.

I've tried converting the id field to unicode using unicode(id) while writing the data list, but this does not give the desired result either. After sorting cases, the output starts at '999' and runs down to '600', then starts again at '1130' and runs down to '1000'. What I want is for it to start at '1130' and run down to '600', with '1009a' between '1009' and '1010'.

  • What are the possible values for id? Some numerical digits with an optional alpha character at the end? Commented Nov 29, 2017 at 9:30
  • You've cut the parsing out of your code, but it seems to me like that's the place you need to make changes to your code. You've sort of vaguely described part of it (where you say you're doing unicode(id)), but you haven't shown it in any sort of detail, so we can't really help you fix it. Commented Nov 29, 2017 at 9:37

3 Answers

4

If you are comparing strings containing numbers, they are sorted lexicographically, i.e. character by character, regardless of how many digits the number has. You have to convert them to int first, but that's tricky with the a/b suffix. You can use a regular expression to separate the number and the suffix:

>>> import re
>>> p = re.compile(r"(\d+)(.*)")
>>> def comp(x):
...     n, s = p.match(x).groups()
...     return int(n), s
...
>>> ids = ["1009", "1009a", "1009b", "1010", "99"]
>>> [comp(x) for x in ids]
[(1009, ''), (1009, 'a'), (1009, 'b'), (1010, ''), (99, '')]
>>> sorted(ids, key=comp)
['99', '1009', '1009a', '1009b', '1010']

Applying this to your example, you probably need this (not tested):

cases.sort(key=lambda x: comp(x[0]), reverse=True)
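For reference, here is a minimal self-contained sketch of the same idea applied to a cases list shaped like the one in the question. The sample rows are made up for illustration, the name ID_PATTERN is my own, and wrapping the id in str() is an assumption to cope with ids that were parsed as numbers. Ids that do not start with a digit are not handled (match would return None).

```python
import re

# Leading digits, then whatever follows (possibly empty).
ID_PATTERN = re.compile(r"(\d+)(.*)")

def comp(x):
    """Split an id such as '1009a' into (1009, 'a') for sorting."""
    # str() is an assumption: it lets numeric ids go through the same path.
    n, s = ID_PATTERN.match(str(x)).groups()
    return int(n), s

# Hypothetical sample data; only the first column matters for the sort.
cases = [
    ["600", "x", "y"],
    ["1009a", "x", "y"],
    ["1010", "x", "y"],
    ["900", "x", "y"],
    ["1009", "x", "y"],
]

cases.sort(key=lambda row: comp(row[0]), reverse=True)
sorted_ids = [row[0] for row in cases]
print(sorted_ids)  # ['1010', '1009a', '1009', '900', '600']
```

With reverse=True, '1009a' lands between '1010' and '1009' because the tuples compare on the integer first and on the suffix only when the integers tie.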

4 Comments

I got a TypeError: '<' not supported between instances of 'str' and 'NoneType' in cases where no character after the number was present. I'd suggest changing it to return int(n), s if s else '' in the comp function. Also, maybe an improvement maybe not, the regex can be added to the signature of the comp function, avoiding clutter in the global namespace, i.e. def comp(x, p=re.compile(r"(\d+)(.*)")):
Also, if you change the first line in comp to n, s = p.match(x[0]).groups() you can avoid the lambda stuff and just call cases.sort(key=comp, reverse=True)
@ArneRecknagel With which version of Python did you try this? For me, the letter part is always '' when the id has no letter, not None. Tested with Python 2.7 and 3.5. It's only None if I use r"(\d+)(.+)?"
My bad, you are right. I misplaced the * for the suffix match in my test.
0

Your problem is that strings are compared character by character, starting from the first one. Since '9' > '1', you get '900' > '1000'.

What you need to do is write leading zeros for all your id fields so that 900 becomes 0900 and is now less than 1000. You can do this with this bit of code (although there are probably neater ways of doing it):

id = str(id).zfill(5)

Note that you don't need the str() bit if id is already a string. Here the zfill(5) will add zeros to the left of the string until the string is of length 5.
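A small sketch of what this looks like in practice, using made-up ids. It first shows the string comparison that causes the problem, then the effect of padding on purely numeric ids. The last part is my own addition to show a limitation: an id with a letter suffix is padded to the same total width, so it no longer lines up with its numeric neighbours.

```python
# Strings compare character by character, so '9' > '1' decides the result.
lexicographic = "900" > "1000"
print(lexicographic)  # True

# Padding purely numeric ids to a common width fixes the ordering.
ids = ["600", "900", "1009", "1010"]
padded_desc = sorted((s.zfill(5) for s in ids), reverse=True)
print(padded_desc)  # ['01010', '01009', '00900', '00600']

# Limitation: '1009a' is already 5 characters long, so zfill(5) leaves it
# unchanged and it sorts above the padded numeric ids.
mixed_desc = sorted((s.zfill(5) for s in ["1009", "1009a", "1010"]), reverse=True)
print(mixed_desc)  # ['1009a', '01010', '01009']
```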

2 Comments

Neat idea, but this will zfill "90" to "00090" and "90b" to "0090b".
Damn, you're right. I'm also reading that this function is actually deprecated and works only in Python 2.x, so probably not the best option. Edit: it's apparently still in Python 3.6 and the docs don't say anything about it being deprecated. But that still doesn't make it work in this example.
0

Same principle as the one used by @Tobias_k but not quite as neat.

from itertools import takewhile, dropwhile

cases = [ ['600', 'foo1', 'bar1'], ['900', 'foo2', 'bar2'], ['1009', 'foo6', 'bar6'], ['1009a', 'foo3', 'bar3'], ['1010', 'foo4', 'bar4'] ]

def sorter_helper(str_):
  n = ''.join(takewhile(lambda x: x.isnumeric(), str_))
  s = ''.join(dropwhile(lambda x: x.isnumeric(), str_))
  return (int(n), s)

cases = sorted(cases, key=lambda x: sorter_helper(x[0]))
print(cases)  # -> [['600', 'foo1', 'bar1'], ['900', 'foo2', 'bar2'], ['1009', 'foo6', 'bar6'], ['1009a', 'foo3', 'bar3'], ['1010', 'foo4', 'bar4']]
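A possible variant, sketched under a few assumptions of mine: it uses str.isdecimal instead of isnumeric (isnumeric also accepts characters such as '½' that int() rejects), slices off the suffix instead of calling dropwhile, and passes reverse=True to get the descending order asked for in the question. It targets Python 3 and still raises ValueError for an id with no leading digits.

```python
from itertools import takewhile

def sorter_helper(str_):
    # Leading run of decimal digits; everything after it is the suffix.
    n = ''.join(takewhile(str.isdecimal, str_))
    s = str_[len(n):]
    return (int(n), s)

cases = [['600', 'foo1', 'bar1'], ['900', 'foo2', 'bar2'],
         ['1009', 'foo6', 'bar6'], ['1009a', 'foo3', 'bar3'],
         ['1010', 'foo4', 'bar4']]

cases_desc = sorted(cases, key=lambda x: sorter_helper(x[0]), reverse=True)
ids_desc = [row[0] for row in cases_desc]
print(ids_desc)  # ['1010', '1009a', '1009', '900', '600']
```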
