I would like to sort a numerical list by the frequencies of the elements. (I found several ways to do it.)
During my exploration, I tried the example below.
Question: How does list.sort(key=list.count) work? Is it possible to use list.count() during list.sort()?
I read that the key-function is evaluated for each element of the list before the sort and those values are used for the comparisons during the sort.
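The "evaluated once per element" behaviour described above can be checked with a small sketch. The `counting_key` helper and the `calls` list are hypothetical names introduced only for this illustration.

```python
# Hypothetical helper: records every value the sort passes to the key function.
calls = []

def counting_key(value):
    calls.append(value)
    return value

sample = [3, 1, 2, 1]
sample.sort(key=counting_key)

print(sample)      # [1, 1, 2, 3]
print(len(calls))  # 4: the key function ran once per element
```

The number of key calls equals the number of elements, regardless of how many comparisons the sort performs afterwards.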
Also, I read somewhere that during sort() the list is kind of locked. (sorry, I can't find the reference now - I've read quite a lot of blogs and tutorials on this topic in the last few hours, Python documentation and How-To Sort included)
This is the example
### Python 3.7 ###
data = [22, 11, 33, 99, 88, 77, 22, 44, 55, 44, 66, 22]
# sort by value
data.sort()
print(data)
>>> [11, 22, 22, 22, 33, 44, 44, 55, 66, 77, 88, 99]
# sort by frequency, i.e. list.count()
data.sort(key=data.count)
print(data)
>>> [11, 22, 22, 22, 33, 44, 44, 55, 66, 77, 88, 99]
# expected >>> [11, 33, 55, 66, 77, 88, 99, 44, 44, 22, 22, 22]
# but no change, the value-sorted list is printed
# or
data.sort(key=lambda e: data.count(e))
print(data)
>>> [11, 22, 22, 22, 33, 44, 44, 55, 66, 77, 88, 99]
# expected >>> [11, 33, 55, 66, 77, 88, 99, 44, 44, 22, 22, 22]
# but no change, the value-sorted list is printed
Note: no error message is raised.
As an addition, I would like to mention that the following works as expected
max(data, key=data.count)
And, of course, this also gives the expected result
print(sorted(data, key=data.count))
>>> [11, 33, 55, 66, 77, 88, 99, 44, 44, 22, 22, 22]
According to the documentation, sorted() and sort() should produce the same ordering, shouldn't they?
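One difference that may matter here: sorted() builds a new list from its argument and sorts that new list, so `data` itself is never the list being sorted. A minimal sketch of that idea follows; the `snapshot` name is an assumption made for illustration.

```python
data = [11, 22, 22, 22, 33, 44, 44, 55, 66, 77, 88, 99]

# Sort a separate copy, so `data` stays fully readable
# while data.count is called as the key function.
snapshot = list(data)
snapshot.sort(key=data.count)

print(snapshot)  # [11, 33, 55, 66, 77, 88, 99, 44, 44, 22, 22, 22]
print(snapshot == sorted(data, key=data.count))  # True
print(data)      # unchanged: [11, 22, 22, 22, 33, 44, 44, 55, 66, 77, 88, 99]
```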
Thanks for your insights!
EDIT:
By the documentation - as I understood :
sort() takes the key-function and feeds the key-function with individual members of the list
-> the calculated results are the number of occurrences of each element (equal elements get equal calculated results, as their frequency in the list is the same)
: I'm not experienced enough to debug this deep in Python
: data.count() on its own returns the correct frequencies; I checked that
saves / caches the calculated results
: that's the foundation of its efficiency
uses the cached calculated results(!) to determine the order of the original list
-> the least frequent elements are at the front of the list, and the most frequent at the back
!!! this is not happening...
saves the list in its new order in-place
!!! ...OR this is not happening.
Additionally, as far as I understood (though not sure), somewhere during this process sort() 'locks away' the original list from other usage/access (and somewhere releases the lock - something about multi-threaded applications was in the explanation, as I recall).
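The "locked away" idea can be probed directly. The Python documentation notes, as a CPython implementation detail, that the list appears empty for the duration of the sort. The sketch below assumes CPython; the `probe` function and the `seen` list are hypothetical names, and other implementations may behave differently.

```python
data = [11, 22, 22, 22, 33, 44, 44, 55, 66, 77, 88, 99]
seen = []  # (element, len(data), data.count(element)) as observed inside the key call

def probe(element):
    # Hypothetical key function: records what `data` looks like mid-sort.
    n = data.count(element)
    seen.append((element, len(data), n))
    return n

data.sort(key=probe)

# On CPython every key call sees an empty list, so every key is 0.
print({(length, count) for _, length, count in seen})  # {(0, 0)}
print(data)  # [11, 22, 22, 22, 33, 44, 44, 55, 66, 77, 88, 99]
```

With all keys equal, the stable sort has nothing to reorder, which would match the unchanged output shown in the example.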
IMPORTANT :
I'm not looking for a solution or code to sort the list - I'd appreciate an explanation of what's happening:
Why is the result the list that is actually returned rather than what I expected?
By comparison, why does sorted() work as expected?
list.count is an O(n) operation, so just calling [some_list.count(x) for x in some_list] is an O(N**2) operation. So it definitely impacts the efficiency of the algorithm. If you don't want to do that, do something like from collections import Counter; counts = Counter(some_list); some_list.sort(key=counts.get)
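Spelled out as a runnable sketch, the Counter suggestion from the comment might look like this. It reuses the value-sorted `data` from the question; the frequencies are computed once, before the sort starts, so the key function never reads the list being sorted.

```python
from collections import Counter

data = [11, 22, 22, 22, 33, 44, 44, 55, 66, 77, 88, 99]

# Count every element once up front (a single pass over the list).
counts = Counter(data)

# The key is a dictionary lookup, independent of the list being sorted.
data.sort(key=counts.get)

print(data)  # [11, 33, 55, 66, 77, 88, 99, 44, 44, 22, 22, 22]
```

Because the sort is stable, elements with the same frequency keep their previous relative order.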