2

I have a script to move items around and perform some basic functions on them. It relies on list.sort() to make sure the files are going to the right places.

For example I have 11 files:

A1_S1_ETC.ext
A2_S2_ETC.ext
...
...
A10_S10_ETC.ext
A11_S11_ETC.ext

The script asks for a path and output, from this I create two sorted lists using os and glob:

pathA = raw_input()
listA = list(glob.glob(os.path.join(path,'*.ext')))
listA.sort()
outp = raw_input()
outp.sort()
filen = [x.split(pathA)[1].split('_')[0] for x in listA]
filen.sort()
outp1 = [pathA + s + '/' for s in filen]
outp1.sort()

But when printed:

print listA
['A10_S10_ETC.ext', 'A11_S11_ETC.ext','A1_S1_ETC.ext',, A2_S2_ETC.ext']
print outp1
['/user/path/A1/', '/user/path/A10/', '/user/path/A11/', '/user/path/A2/']

I guess it's the '_SXX' part in the file name that's impacting the sort function? I don't care how it's sorted, as long as A1 files go into A1 directory - not just for this nomenclature but for any possible string.

Is there a way to do this - perhaps by asking the list.sort function to sort until the first underscore?

2
  • If you don't care about the actual order then just remove all the sort calls. All the lists will be ordered the same since each one is based on another using a list comprehension. Commented Dec 12, 2015 at 10:16
  • Unfortunately there's a listB that has to be used with listA and naturally it produces a different order, so there has to be consistent sorting across the board. Commented Dec 12, 2015 at 11:43

3 Answers 3

1

Sorting strings in python is a lexicographical sort. The strings are compared lexicographically. So 'A10' and 'A11' come before 'A1_'.

you can get your expect behaviour using:

lst.sort(key=lambda x: int(x.split('_')[0][1:])
Sign up to request clarification or add additional context in comments.

12 Comments

Is there a way around this?
Is that a python 3 expression? I'm working with 2.7 and after applying that sort my print commands return invalid syntax errors.
@J280694 it worked for me in python 2.7.10. If you can give us the traceback, maybe we can help you get it to work
Maybe I'm misunderstanding its use, I literally changed OP to outp1 = [pathA + s for s in filen] outp1.sort(key=lambda x: int(x.split('_')[0][1:]) After that point in my script all of the predefined lists no longer match up and nothing works after adding in the key=lambda x: int(x.split('_')[0][1:] File "<stdin>", line 2 print outp ^
@J280694 try this: outp1.sort(key=lambda x: int(x.split('/')[-1][1:]))
|
1

What happens is that sorting is lexicographic with ordering ASCII characters according to ASCII code. Here we have ASCII code for '0' is 48 while the ASCII code for '_' is 95 - which means that '0' < '_'.

What you can do do get consistency is to supply a consistent comparison function. For example:

def mycmp(s1, s2):
    s1 = s1.split(pathA)[1].split('_')[0]        
    s2 = s2.split(pathA)[1].split('_')[0]
    return cmp(s1, s2)

outp1.sort(cmp=mycmp)

Here the thing is that you use the same transformation before comparing the strings.

This relies on that since you strip away information you may strip away too much to make the elements distinct, but in your case it would mean that two elements of outp1 would become the same anyway so it wouldn't matter here.

Otherwise you would have to apply the sort before you transform the names. Which would mean not to sort filen or outp1 (because then their order would rely on the order of listA.

1 Comment

Ah okay, that makes sense. In my head the only way I can think to combat this would be to include the '_' in directory names and perhaps remove it afterwards - any better ideas?
1

What you want is called natural sort. See this thread about it: Does Python have a built in function for string natural sort?

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.