
I have a list of image indexes whose length is 60000. I want to create another list containing random pairs of these indexes. The constraint is that each element of the product set should contain two distinct indexes; in other words, I don't want to pair an index with itself.

Currently I've been using the itertools.product method with a for loop:

import itertools

pairs = []
for pair in itertools.product(indexes, indexes):
    if pair[0] != pair[1]:
        pairs.append(pair)

The problem is that it takes a long time, and my computer becomes unusable because it gets stuck.

Is there a better way of doing this?

  • If your inputs are all unique, you propose to create almost 3.6 billion elements. That's going to cause a serious problem with storage, never mind the time taken.

3 Answers


You can do it lazily without storing them:

pairs = filter(lambda x: x[0] != x[1], itertools.product(indexes, indexes))

Use itertools.ifilter if you are using Python 2.

The idea of using itertools is that you don't need to precompute everything; instead, items are computed one at a time, on demand.
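A small sketch of what "on demand" means in practice: building the filter costs nothing, and only the pairs you actually pull out get computed (the `islice` call is my own illustration, not part of the answer):

```python
import itertools

indexes = range(60000)

# Build the lazy pipeline; nothing has been computed yet.
pairs = filter(lambda x: x[0] != x[1], itertools.product(indexes, indexes))

# Pull only the first five pairs; only that much work is done.
first_five = list(itertools.islice(pairs, 5))
# (0, 0) is filtered out, so the first pair is (0, 1)
```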

I made a timing comparison as suggested by @Deepak Saini:

import numpy as np
import itertools
import time

indexes = np.arange(1000)

def pairs(indexes):
    pairs = []
    for pair in itertools.product(indexes, indexes):
        if pair[0] != pair[1]:
            pairs.append(pair)
    return pairs

def iter_pairs(indexes):
    return filter(lambda x: x[0] != x[1], itertools.product(indexes, indexes))

def iter_pairs_no_lambda(indexes):
    def comp(x):
        return x[0] != x[1]
    return filter(comp, itertools.product(indexes, indexes))

for f in (pairs, iter_pairs, iter_pairs_no_lambda):
    print(f.__name__)
    t1 = time.time()
    res = f(indexes)
    print("Took {}".format(time.time() - t1))

Which results in:

pairs
Took 1.012538194656372
iter_pairs
Took 0.04567384719848633
iter_pairs_no_lambda
Took 0.0002455711364746094

5 Comments

Hi, when I try it on indexes = np.arange(1000), doing a list() on the filter is slower than the OP's method. Can you explain that? Thanks.
The itertools.ifilter is what I needed. Thanks.
@DeepakSaini, it's probably because of the use of the lambda, or something related to numpy; try declaring a function and using that instead of the lambda, and the times should improve.
@DeepakSaini, check the times in the updated answer ;)
The timing test included here is poor for a number of reasons. Of course it takes trivial time just to create the lazy-iteration setup; it hasn't done any actual iteration yet. Using a named function vs lambda should make no real difference because it's just storing a reference to a different thing inside the filter object. The observed timing result is because the timing is also done naively; the timeit standard library module exists for a reason.
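To illustrate the point made in the last comment, a fairer comparison forces full iteration and uses timeit; the smaller input size and the count-based consumers below are my own choices, not from the answer:

```python
import itertools
import timeit

indexes = range(200)  # smaller than the answer's 1000 so the run stays quick

def consume_loop():
    # Eager version: build-and-count via an explicit loop.
    count = 0
    for pair in itertools.product(indexes, indexes):
        if pair[0] != pair[1]:
            count += 1
    return count

def consume_filter():
    # Lazy version, but fully drained so real work is measured.
    return sum(1 for _ in filter(lambda x: x[0] != x[1],
                                 itertools.product(indexes, indexes)))

for f in (consume_loop, consume_filter):
    print(f.__name__, timeit.timeit(f, number=5))
```

Once both versions are forced to do the full iteration, the measured times are much closer than the answer's figures suggest.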

Make sure you have unique values, and then choose combinations of two of them:

list(itertools.combinations(set(indexes), 2))
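Note that combinations yields each unordered pair exactly once, while the question's product approach yields both orderings. If both orderings are wanted, permutations is the direct analogue; a quick sketch (the small example list is hypothetical):

```python
import itertools

indexes = [3, 1, 2]
unique = set(indexes)

unordered = list(itertools.combinations(unique, 2))  # each pair once
ordered = list(itertools.permutations(unique, 2))    # both orderings, no self-pairs
```

With n unique values, combinations gives n*(n-1)/2 pairs and permutations gives n*(n-1).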

import numpy as np

a = np.asarray(indexes)
b = np.copy(a)

# Reshuffle b until no index is paired with itself.
while np.any(a == b):
    b = np.random.choice(a, size=a.shape[0], replace=False)

This should be very fast.
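For illustration, the two arrays can then be zipped into a pair list; the zip step and the small example `indexes` below are my additions, assuming `indexes` holds distinct values:

```python
import numpy as np

indexes = list(range(10))  # hypothetical small example of distinct indexes

a = np.asarray(indexes)
b = np.copy(a)

# Reshuffle b until no index is paired with itself.
while np.any(a == b):
    b = np.random.choice(a, size=a.shape[0], replace=False)

# One random distinct partner per index.
pairs = list(zip(a.tolist(), b.tolist()))
```

Note this gives exactly one partner per index (a random derangement), not all distinct pairs as the product-based approaches do.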

