Fastest way for working with itertools.combinations

Question

I need to speed up the function below :

import numpy as np
import itertools
import timeit

def combcol(myarr):
    ndims = myarr.shape[0]
    solutions = []
    for idx1, idx2, idx3, idx4, idx5, idx6 in itertools.combinations(np.arange(ndims), 6):
        c1, c2, c3, c4, c5, c6 = myarr[idx1,1], myarr[idx2,2], myarr[idx3,1], myarr[idx4,2], myarr[idx5,1], myarr[idx6,2]
        if c1-c2>0 and c2-c3<0 and c3-c4>0 and c4-c5<0 and c5-c6>0 :
            solutions.append(((idx1, idx2, idx3, idx4, idx5, idx6),(c1, c2, c3, c4, c5, c6)))
    return solutions

X = np.random.random((20, 10))  
Y = np.random.random((40, 10))  


if __name__=='__main__':
    from timeit import Timer
    t = Timer(lambda : combcol(X))
    t1 = Timer(lambda : combcol(Y))
    print('t : ',t.timeit(number=1),'t1 : ',t1.timeit(number=1))

Results :

t :  0.6165180211451455 t1 :  64.49216925614847

The algorithm is too slow for my standard use (myarr.shape[0] = 500). Is there a NumPy way to decrease execution time of this function (without wasting too much memory)?
Is it possible to implement the problem in Cython?

I've tried using cProfile to see which parts are slow. Most of the time here is spent calling combcol().

import profile
........
........
profile.run('print(len(combcol(Y))); print')

144547
         144559 function calls in 39.672 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   144547    0.641    0.000    0.641    0.000 :0(append)
        1    0.000    0.000    0.000    0.000 :0(arange)
        2    0.000    0.000    0.000    0.000 :0(charmap_encode)
        1    0.000    0.000   39.672   39.672 :0(exec)
        1    0.000    0.000    0.000    0.000 :0(len)
        1    0.000    0.000    0.000    0.000 :0(print)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.094    0.094   39.672   39.672 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 cp850.py:18(encode)
        1   38.938   38.938   39.578   39.578 essaiNumpy4.py:13(combcol)
        1    0.000    0.000   39.672   39.672 profile:0(print(len(combcol(Y))); print)
        0    0.000             0.000          profile:0(profiler)

Finally I have modified the code like this :

def combcol2(myarr):
    ndims = myarr.shape[0]
    myarr1 = myarr[:,1].tolist()
    myarr2 = myarr[:,2].tolist()
    solutions = []
    for idx1, idx2, idx3, idx4, idx5, idx6 in itertools.combinations(range(ndims), 6):
        if myarr1[idx1] > myarr2[idx2] < myarr1[idx3] > myarr2[idx4] < myarr1[idx5] > myarr2[idx6]:
            solutions.append(((idx1, idx2, idx3, idx4, idx5, idx6),(myarr1[idx1], myarr2[idx2], myarr1[idx3], myarr2[idx4], myarr1[idx5], myarr2[idx6])))
    return solutions

X = np.random.random((40, 10))

if __name__=='__main__':
    from timeit import Timer
    t = Timer(lambda : combcol2(X))
    print('t : ',t.timeit(number=1))

Result :

t :  4.341582240200919

Have you tried using cProfile to see which parts are slow? Ultimately that is a huge number of combinations, which is going to limit how much you can speed it up. — BrenBarn
– BrenBarn, Commented Dec 29, 2013 at 19:47
If you're willing to reimplement itertools.combinations that looks easily rewritable in Cython. — Veedrac
– Veedrac, Commented Dec 29, 2013 at 19:52
itertools.combinations is already written in C. The real problem here is that 500-choose-6 is over 21 trillion. That's hopeless :-) — Tim Peters
– Tim Peters, Commented Dec 29, 2013 at 19:54
We would probably be able to help more if you told us what you were trying to achieve with all the combinations. — cyborg
– cyborg, Commented Dec 29, 2013 at 20:41

James Waldby - jwpat7 · Accepted Answer · 2013-12-29 21:11:00Z

alko has listed a useful overhaul for your program, and Tim Peters has pointed out that 500-choose-6 is over 21 trillion (ie 21057686727000). This answer will point out a simple speedup to your original program. (I think it's a small speedup compared to what alko's method might offer, but the following is worth noting for future python programming.)

Your selection statement is
if c1-c2>0 and c2-c3<0 and c3-c4>0 and c4-c5<0 and c5-c6>0 :
which is equivalent to
if c1>c2 and c2<c3 and c3>c4 and c4<c5 and c5>c6 :
but in the python interpreter the former takes 67% longer than the latter. For example, here are some sample outputs for the two cases, on my Intel i3-2120 machine (apparently several times faster than your machine) running Python 2.7.5+ :
('t : ', 0.12977099418640137, 't1 : ', 14.45378589630127)
('t : ', 0.0887291431427002, 't1 : ', 8.54729700088501)
The ratio of averages over 4 such runs was 14.529/8.709 = 1.668.

alko · Accepted Answer · 2013-12-29 20:42:33Z

2

I don't know what exactly you're trying to get, but from your code, constraint for your solutions, idx1 ... idx6 are of form

X[idx1, 1] > X[idx2, 2]
X[idx3, 1] > X[idx2, 2]

X[idx3, 1] > X[idx4, 2]
X[idx5, 1] > X[idx4, 2]

X[idx5, 1] > X[idx6, 2]

You can get (almost instantly for shape <= 500) all admisible pairs (idx1, idx2), (idx3, idx2), (idx3, idx4), (idx5, idx4), idx(5, idx6) of this form with

idx56 = lst = zip(*np.where((X[:,1].reshape(-1,1)>X[:,2].reshape(1,-1))))

Then you can generate all possible indices with:

import operator 
slst = sorted(lst, key=operator.itemgetter(1))
grps = [[x, list(g)] for x, g in itertools.groupby(slst, key=operator.itemgetter(1))]
idx345 = idx123 = [[x1, x2, x3] for _, g in grps for ((x1, x2), (x3, _)) in itertools.combinations(g, 2)]

It took some seconds to compute those lists on my test machine (idx345 was more than 20 billion items long). You just have to join those lists on x3=x3, x5=x5 (I bet solution size will significantly grow with this join, so can't guess it's usefulness).

edited Dec 29, 2013 at 20:42

answered Dec 29, 2013 at 20:32

alko

48.7k12 gold badges99 silver badges105 bronze badges

1 Comment

user3144427 Over a year ago

In python 3 operator.itemgetter appears to not work.

cyborg · Accepted Answer · 2013-12-29 21:19:34Z

2

You could iterate over combinations with a recursion, not using itetools.combinations(). The benefit of that would be that you could test, e.g., whether c1>c2, and if False, you could skip all related combinations.

answered Dec 29, 2013 at 21:19

cyborg

10.2k4 gold badges40 silver badges56 bronze badges

2 Comments

user3144427 Over a year ago

How can i do with a resurvive fonction like this : def combinations(*seqs): if not seqs: return (item for item in ()) first, rest = seqs[0], seqs[1:] if not rest: return ((item,) for item in first) return ((item,) + items for item in first for items in combinations(*rest))

cyborg Over a year ago

I think that the answer in codereview.stackexchange.com/a/38299 is good enough, if you always have 6 items.

Collectives™ on Stack Overflow

Fastest way for working with itertools.combinations

3 Answers 3

Comments

1 Comment

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

1 Comment

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related