Why is this lambda involving a list.index() call so slow?

Question

Using cProfile:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   17.834   17.834 <string>:1(<module>)
        1    0.007    0.007   17.834   17.834 basher.py:5551(_refresh)
        1    0.000    0.000   10.522   10.522 basher.py:1826(RefreshUI)
        4    0.024    0.006   10.517    2.629 basher.py:961(PopulateItems)
      211    1.494    0.007    7.488    0.035 basher.py:1849(PopulateItem)
      231    0.074    0.000    6.734    0.029 {method 'sort' of 'list' objects}
      215    0.002    0.000    6.688    0.031 bosh.py:4764(getOrdered)
     1910    3.039    0.002    6.648    0.003 bosh.py:4770(<lambda>)
      253    0.178    0.001    5.600    0.022 bosh.py:3325(getStatus)
        1    0.000    0.000    5.508    5.508 bosh.py:4327(refresh)
     1911    3.051    0.002    3.330    0.002 {method 'index' of 'list' objects}

The 1910 3.039 0.002 6.648 0.003 bosh.py:4770(<lambda>) line puzzles me. At bosh.py:4770 I have modNames.sort(key=lambda a: (a in data) and data.index(a)), data and modNames being lists. Notice 1911 3.051 0.002 3.330 0.002 {method 'index' of 'list' objects} which seems to come from this line.

So why is this so slow ? Any way I can rewrite this sort() so it performs faster ?

EDIT: a final ingredient I was missing to grok this lambda:

>>> True and 3
3

The function body is a in data and data.index(a) and the two operations in it are O(n) where n is the list size of data. Without answering your entire question, I would place the blame on this lambda function. Each time it is called, the lambda will scan through data. If you use a dict for data instead of a list, it'll probably be faster since dict has an average lookup time complexity of O(1). — YardGlassOfCode
– YardGlassOfCode, Commented Oct 30, 2014 at 2:48

unutbu · Accepted Answer · 2014-10-30 11:17:55Z

4

As YardGlassOfCode stated, it's not the lambda per se which is slow, it is the O(n) operation inside the lambda which is slow. Both a in data and data.index(a) are O(n) operations, where n is the length of data. And as an additional affront to efficiency, the call to index repeats much of the work done in a in data too. If the items in data are hashable, then you can speed this up considerably by first preparing a dict:

weight = dict(zip(data, range(len(data))))
modNames.sort(key=weight.get)  # Python2, or
modNames.sort(key=lambda a: weight.get(a, -1))  # works in Python3

This is much quicker because each dict lookup is a O(1) operation.

Note that modNames.sort(key=weight.get) relies on None comparing as less than integers:

In [39]: None < 0
Out[39]: True

In Python3, None < 0 raises an TypeError. So lambda a: weight.get(a, -1) is used to return -1 when a is not in weight.

edited Oct 30, 2014 at 11:17

answered Oct 30, 2014 at 2:50

unutbu

886k197 gold badges1.9k silver badges1.7k bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Mr_and_Mrs_D Over a year ago

Thanks - will test it and post back. So I guess also that the original worked because False < 0 also (sort(key=...) could use some docs)

unutbu Over a year ago

Since False == 0, False < 0 is False, and 0 < False is False. That means in the original, the 0 indexed item could get sandwiched in between items not in data. Consider, for example, sorted([False, 0, False]).

Mr_and_Mrs_D Over a year ago

NB: I used key=lambda a: dataDict.get(a, sys.maxint) as I wanted items that are absent to be ordered to the bottom python 2.7.8

Collectives™ on Stack Overflow

Why is this lambda involving a list.index() call so slow?

1 Answer 1

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related