4

How do I sort a list in Python first by word length (longest to shortest), and then alphabetically?

Here is what I mean:

I have this list: WordsArray = ["Lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit", "sed", "do", "eiusmod", "tempor", "incididunt"]

I want to output this: ['consectetur', 'adipiscing', 'incididunt', 'eiusmod', 'tempor', 'dolor', 'ipsum', 'Lorem', 'amet', 'elit', 'sed', 'sit', 'do']

I can already sort alphabetically using print (sorted(WordsArray)):

['Lorem', 'adipiscing', 'amet', 'consectetur', 'do', 'dolor', 'eiusmod', 'elit', 'incididunt', 'ipsum', 'sed', 'sit', 'tempor']

3 Answers

6

Firstly, sorted on its own does not sort alphabetically. Look at your output: I am pretty sure L does not come before a. What you are currently doing is a case-sensitive sort.
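
A quick check of why that happens, using only built-ins (the three-word sample list is made up for illustration): Python compares strings by code point, and every uppercase ASCII letter has a smaller code point than any lowercase one.

```python
# Uppercase ASCII letters have smaller code points than lowercase ones,
# so a plain sort puts "Lorem" ahead of every lowercase word.
code_points = (ord("L"), ord("a"))
print(code_points)                 # (76, 97)

# Illustrative sample, not the full list from the question.
plain_sorted = sorted(["amet", "Lorem", "do"])
print(plain_sorted)                # ['Lorem', 'amet', 'do']
```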

You can perform a case-insensitive sort by using a Key Function like so:

>>> words_list = ["Lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit", "sed", "do", "eiusmod", "tempor", "incididunt"]
>>> sorted(words_list, key=str.lower)
['adipiscing', 'amet', 'consectetur', 'do', 'dolor', 'eiusmod', 'elit', 'incididunt', 'ipsum', 'Lorem', 'sed', 'sit', 'tempor']

You can then modify the key function as below to sort first on length (longest first), then alphabetically:

>>> def custom_key(word):
...   return -len(word), word.lower()
... 
>>> sorted(words_list, key=custom_key)
['consectetur', 'adipiscing', 'incididunt', 'eiusmod', 'tempor', 'dolor', 'ipsum', 'Lorem', 'amet', 'elit', 'sed', 'sit', 'do']
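
As an alternative sketch (not part of the answer above), the same result can be had with two passes, relying on the fact that sorted is stable: elements with equal keys keep their relative order, also when reverse=True is used. The names by_alpha, two_pass and one_pass are just illustrative.

```python
words_list = ["Lorem", "ipsum", "dolor", "sit", "amet", "consectetur",
              "adipiscing", "elit", "sed", "do", "eiusmod", "tempor", "incididunt"]

# Pass 1: secondary criterion (case-insensitive alphabetical order).
by_alpha = sorted(words_list, key=str.lower)

# Pass 2: primary criterion (length, longest first). Because the sort is
# stable, words of equal length keep the order they had after pass 1.
two_pass = sorted(by_alpha, key=len, reverse=True)

# The single-pass tuple key gives the same list.
one_pass = sorted(words_list, key=lambda w: (-len(w), w.lower()))
print(two_pass == one_pass)  # True
```

Stability is also why sorting on -len(x) alone leaves words of equal length, such as 'Lorem', 'ipsum', 'dolor', in their original input order.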

5 Comments

The solution works, but I have a question. With lower, the order comes out as ..'dolor', 'ipsum'.., but without it, the order is ..'ipsum', 'dolor'.. How can lower make a difference there?
@KaushikNP Sorry, I don't understand. If you look at OP's code, dolor is before ipsum, and it is also before ipsum in both of my examples.
As I said, your solution works. Trying some variations on my own, I noticed that behaviour.
>>> sorted(words_list, key=lambda x: (-len(x))) gives => ['consectetur', 'adipiscing', 'incididunt', 'eiusmod', 'tempor', 'Lorem', 'ipsum', 'dolor', 'amet', 'elit', 'sit', 'sed', 'do']
@KaushikNP You mean when you don't sort alphabetically, it doesn't get sorted alphabetically? Shocking.
5

You can use as key a tuple that specifies first the negative length of the string -len(x) and then x itself:

sorted(WordsArray, key=lambda x: (-len(x),x))

Since tuples are compared by the first element and, in case of a tie, by the second element and so on, we first compare on the -len(x) of the two strings, which means that the longer string is sorted first.

In case both strings have the same length, we compare on x, so alphabetically.
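
A small sketch of that comparison rule, with the key tuples written out by hand for a few words from the question (the variable names are only for illustration):

```python
# Tuples compare element by element, left to right.

# Lengths tie (-5 == -5), so the strings decide: "dolor" < "ipsum".
tie_broken_by_string = (-5, "dolor") < (-5, "ipsum")

# -7 < -5, so the longer word wins and the strings are never consulted.
longer_word_first = (-7, "eiusmod") < (-5, "dolor")

print(tie_broken_by_string, longer_word_first)  # True True
```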

Mind that comparing two strings is case sensitive: Python orders them lexicographically, by the ord(..) of the first characters that differ. If you want to order alphabetically, you had better convert upper case and lower case to the same case. A fast way to handle this is:

sorted(WordsArray, key=lambda x: (-len(x),x.lower()))

But this is not always correct: for instance, the German Eszett (ß) is sometimes written as ss. In fact, sorting alphabetically is a very hard problem in some languages, so in that case you need to specify a collation.
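
One partial step in that direction, as a sketch: str.casefold() is a more aggressive form of lower() that maps ß to ss. It is still not locale-aware collation (the standard library's locale.strxfrm is one option for that, depending on which locales are installed). The sample words below are made up for illustration.

```python
# lower() leaves the Eszett untouched, casefold() expands it to "ss".
lowered = "straße".lower()       # 'straße'
folded = "straße".casefold()     # 'strasse'

same_with_lower = "STRASSE".lower() == "straße".lower()            # False
same_with_casefold = "STRASSE".casefold() == "straße".casefold()   # True

# Same two-part key as above, with casefold() for the alphabetical part.
# Note the length is still taken from the original string.
words = ["Zebra", "straße", "STRASSE", "apple"]
result = sorted(words, key=lambda x: (-len(x), x.casefold()))
print(result)  # ['STRASSE', 'straße', 'apple', 'Zebra']
```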

5 Comments

The solution works, but I have a question. With lower, the order comes out as ..'dolor', 'ipsum'.., but without it, the order is ..'ipsum', 'dolor'.. How can lower make a difference there?
@KaushikNP: because not all characters have a lower in all cultures/languages. It is already a hard problem to decide whether two strings are equivalent. In German, for instance, 'Foostraße' and 'Foostrasse' are frequently seen as the same text. See here for instance.
What you say is correct. I don't think you understand my problem though. >>> sorted(words_list, key=lambda x: (-len(x))) gives => ['consectetur', 'adipiscing', 'incididunt', 'eiusmod', 'tempor', 'Lorem', 'ipsum', 'dolor', 'amet', 'elit', 'sit', 'sed', 'do'] . The order should not be like that for ipsum and dolor though.
@KaushikNP: yes, but that's why we map x to a 2-tuple: (-len(x),x.lower()) (so in case the two -len(x)s are equal, Python will perform a comparison on the second element of the tuple, x.lower()).
Oh, ok. Otherwise there is no comparison? Hmmm. Got it
0

For anyone with a case like mine:

A = ["a_12", "a_3", "a_11"]

sorted(A, key=lambda x: (len(x), x))

['a_3', 'a_11', 'a_12']
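
The length trick works here because every name shares the same prefix. If prefixes can differ, a sketch like the following may be safer. It assumes every name has the form prefix_number; suffix_key is a hypothetical helper, and the longer sample list is invented for illustration.

```python
def suffix_key(name):
    # Hypothetical helper: split "a_12" into ("a", 12) so the numeric
    # part is compared as an integer rather than as text.
    prefix, _, number = name.rpartition("_")
    return prefix, int(number)

A = ["a_12", "a_3", "a_11", "b_2", "a_100"]

by_suffix = sorted(A, key=suffix_key)
print(by_suffix)   # ['a_3', 'a_11', 'a_12', 'a_100', 'b_2']

# For comparison, the (len, x) key mixes the prefixes together.
by_length = sorted(A, key=lambda x: (len(x), x))
print(by_length)   # ['a_3', 'b_2', 'a_11', 'a_12', 'a_100']
```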
