
I have a list of image indexes whose length is 60000. I want to create another list containing random pairs of these indexes. The constraint is that each element of the product set should contain two distinct indexes; in other words, I don't want to pair an index with itself.

Currently I've been using the itertools.product method with a for loop:

import itertools

pairs = []
for pair in itertools.product(indexes, indexes):
    if pair[0] != pair[1]:
        pairs.append(pair)

The problem is that it takes a long time, and my computer becomes unusable because it gets stuck.

Is there a better way of doing this?

  • If your inputs are all unique, you propose to create almost 3.6 billion elements. That's going to cause a serious problem with storage, never mind the time taken.

3 Answers


You can do it lazily without storing them:

pairs = filter(lambda x: x[0] != x[1], itertools.product(indexes, indexes))

Use itertools.ifilter if you are using Python 2.

The idea of using itertools is that you don't need to precompute everything; instead, items are computed one at a time, on demand.
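A small sketch of what "on demand" means in practice: building the filter costs nothing, and only the pairs you actually pull out get computed (the `islice` call is my own illustration, not part of the answer):

```python
import itertools

indexes = range(60000)

# Build the lazy pipeline; nothing has been computed yet.
pairs = filter(lambda x: x[0] != x[1], itertools.product(indexes, indexes))

# Pull only the first five pairs; only that much work is done.
first_five = list(itertools.islice(pairs, 5))
# (0, 0) is filtered out, so the first pair is (0, 1)
```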

I made a timing comparison as suggested by @Deepak Saini:

import numpy as np
import itertools
import time

indexes = np.arange(1000)

def pairs(indexes):
    pairs = []
    for pair in itertools.product(indexes, indexes):
        if pair[0] != pair[1]:
            pairs.append(pair)
    return pairs

def iter_pairs(indexes):
    return filter(lambda x: x[0] != x[1], itertools.product(indexes, indexes))

def iter_pairs_no_lambda(indexes):
    def comp(x):
        return x[0] != x[1]
    return filter(comp, itertools.product(indexes, indexes))

for f in (pairs, iter_pairs, iter_pairs_no_lambda):
    print(f.__name__)
    t1 = time.time()
    res = f(indexes)
    print("Took {}".format(time.time() - t1))

Which results in:

pairs
Took 1.012538194656372
iter_pairs
Took 0.04567384719848633
iter_pairs_no_lambda
Took 0.0002455711364746094

5 Comments

Hi, when I try it on indexes = np.arange(1000), doing a list() on the filter is slower than the OP's method. Can you explain that? Thanks.
The itertools.ifilter is what I needed. Thanks.
@DeepakSaini, it's probably because of the use of the lambda, or something related to numpy; try declaring a function and using that instead of the lambda, and the times should improve.
@DeepakSaini, check the times in the updated answer ;)
The timing test included here is poor for a number of reasons. Of course it takes trivial time just to create the lazy-iteration setup; it hasn't done any actual iteration yet. Using a named function vs lambda should make no real difference because it's just storing a reference to a different thing inside the filter object. The observed timing result is because the timing is also done naively; the timeit standard library module exists for a reason.
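To illustrate the point made in the last comment, a fairer comparison forces full iteration and uses timeit; the smaller input size and the count-based consumers below are my own choices, not from the answer:

```python
import itertools
import timeit

indexes = range(200)  # smaller than the answer's 1000 so the run stays quick

def consume_loop():
    # Eager version: build-and-count via an explicit loop.
    count = 0
    for pair in itertools.product(indexes, indexes):
        if pair[0] != pair[1]:
            count += 1
    return count

def consume_filter():
    # Lazy version, but fully drained so real work is measured.
    return sum(1 for _ in filter(lambda x: x[0] != x[1],
                                 itertools.product(indexes, indexes)))

for f in (consume_loop, consume_filter):
    print(f.__name__, timeit.timeit(f, number=5))
```

Once both versions are forced to do the full iteration, the measured times are much closer than the answer's figures suggest.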

Make sure you have unique values, and then choose combinations of two of them:

list(itertools.combinations(set(indexes), 2))
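Note that combinations yields each unordered pair exactly once, while the question's product approach yields both orderings. If both orderings are wanted, permutations is the direct analogue; a quick sketch (the small example list is hypothetical):

```python
import itertools

indexes = [3, 1, 2]
unique = set(indexes)

unordered = list(itertools.combinations(unique, 2))  # each pair once
ordered = list(itertools.permutations(unique, 2))    # both orderings, no self-pairs
```

With n unique values, combinations gives n*(n-1)/2 pairs and permutations gives n*(n-1).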

import numpy as np

a = np.asarray(indexes)
b = np.copy(a)

# Reshuffle b until no index is paired with itself.
while np.any(a == b):
    b = np.random.choice(a, size=a.shape[0], replace=False)

This should be very fast.
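For illustration, the two arrays can then be zipped into a pair list; the zip step and the small example `indexes` below are my additions, assuming `indexes` holds distinct values:

```python
import numpy as np

indexes = list(range(10))  # hypothetical small example of distinct indexes

a = np.asarray(indexes)
b = np.copy(a)

# Reshuffle b until no index is paired with itself.
while np.any(a == b):
    b = np.random.choice(a, size=a.shape[0], replace=False)

# One random distinct partner per index.
pairs = list(zip(a.tolist(), b.tolist()))
```

Note this gives exactly one partner per index (a random derangement), not all distinct pairs as the product-based approaches do.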

