1

Using cProfile:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   17.834   17.834 <string>:1(<module>)
        1    0.007    0.007   17.834   17.834 basher.py:5551(_refresh)
        1    0.000    0.000   10.522   10.522 basher.py:1826(RefreshUI)
        4    0.024    0.006   10.517    2.629 basher.py:961(PopulateItems)
      211    1.494    0.007    7.488    0.035 basher.py:1849(PopulateItem)
      231    0.074    0.000    6.734    0.029 {method 'sort' of 'list' objects}
      215    0.002    0.000    6.688    0.031 bosh.py:4764(getOrdered)
     1910    3.039    0.002    6.648    0.003 bosh.py:4770(<lambda>)
      253    0.178    0.001    5.600    0.022 bosh.py:3325(getStatus)
        1    0.000    0.000    5.508    5.508 bosh.py:4327(refresh)
     1911    3.051    0.002    3.330    0.002 {method 'index' of 'list' objects}

The 1910 3.039 0.002 6.648 0.003 bosh.py:4770(<lambda>) line puzzles me. At bosh.py:4770 I have modNames.sort(key=lambda a: (a in data) and data.index(a)), data and modNames being lists. Notice 1911 3.051 0.002 3.330 0.002 {method 'index' of 'list' objects} which seems to come from this line.

So why is this so slow ? Any way I can rewrite this sort() so it performs faster ?

EDIT: a final ingredient I was missing to grok this lambda:

>>> True and 3
3
1
  • 1
    The function body is a in data and data.index(a) and the two operations in it are O(n) where n is the list size of data. Without answering your entire question, I would place the blame on this lambda function. Each time it is called, the lambda will scan through data. If you use a dict for data instead of a list, it'll probably be faster since dict has an average lookup time complexity of O(1). Commented Oct 30, 2014 at 2:48

1 Answer 1

4

As YardGlassOfCode stated, it's not the lambda per se which is slow, it is the O(n) operation inside the lambda which is slow. Both a in data and data.index(a) are O(n) operations, where n is the length of data. And as an additional affront to efficiency, the call to index repeats much of the work done in a in data too. If the items in data are hashable, then you can speed this up considerably by first preparing a dict:

weight = dict(zip(data, range(len(data))))
modNames.sort(key=weight.get)  # Python2, or
modNames.sort(key=lambda a: weight.get(a, -1))  # works in Python3

This is much quicker because each dict lookup is a O(1) operation.

Note that modNames.sort(key=weight.get) relies on None comparing as less than integers:

In [39]: None < 0
Out[39]: True

In Python3, None < 0 raises an TypeError. So lambda a: weight.get(a, -1) is used to return -1 when a is not in weight.

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks - will test it and post back. So I guess also that the original worked because False < 0 also (sort(key=...) could use some docs)
Since False == 0, False < 0 is False, and 0 < False is False. That means in the original, the 0 indexed item could get sandwiched in between items not in data. Consider, for example, sorted([False, 0, False]).
NB: I used key=lambda a: dataDict.get(a, sys.maxint) as I wanted items that are absent to be ordered to the bottom python 2.7.8

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.