Input
import numpy as np
import itertools
a = np.array([ 1, 6, 7, 8, 10, 11, 13, 14, 15, 19, 20, 23, 24, 26, 28, 29, 33,
34, 41, 42, 43, 44, 45, 46, 47, 52, 54, 58, 60, 61, 65, 70, 75]).astype(np.uint8)
b = np.array([ 2, 3, 4, 10, 12, 14, 16, 20, 22, 26, 28, 29, 30, 31, 34, 36, 37,
38, 39, 40, 41, 46, 48, 49, 50, 52, 53, 55, 56, 57, 59, 60, 63, 66,
67, 68, 69, 70, 71, 74]).astype(np.uint8)
c = np.array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75]).astype(np.uint8)
I would like to get the Cartesian product of the 3 arrays but I do not want any duplicate elements in one row [1, 2, 1] would not be valid and only one of these two would be valid [10, 14, 0] or [14, 10, 0] since 10 and 14 are both in a and b.
Python only
def no_numpy():
combos = {tuple(set(i)): i for i in itertools.product(a, b, c)}
combos = [val for key, val in combos.items() if len(key) == 3]
%timeit no_numpy() # 32.5 ms ± 508 µs per loop
Numpy
# Solution from (https://stackoverflow.com/a/11146645/18158000)
def cartesian_product(*arrays):
broadcastable = np.ix_(*arrays)
broadcasted = np.broadcast_arrays(*broadcastable)
rows, cols = np.prod(broadcasted[0].shape), len(broadcasted)
dtype = np.result_type(*arrays)
out = np.empty(rows * cols, dtype=dtype)
start, end = 0, rows
for a in broadcasted:
out[start:end] = a.reshape(-1)
start, end = end, end + rows
return out.reshape(cols, rows).T
def numpy():
combos = {tuple(set(i)): i for i in cartesian_product(*[a, b, c])}
combos = [val for key, val in combos.items() if len(key) == 3]
%timeit numpy() # 96.2 ms ± 136 µs per loop
My guess is in the numpy version converting the np.array to a set is why it is much slower but when comparing strictly getting the initial products cartesian_product is much faster than itertools.product.
Can the numpy version be modified in anyway to outperform the pure python solution or is there another solution that outperforms both?
listoftuples, the numpy version returns alistofnumpy.ndarray@TimRobertstuple(set(i))is not a safe key for your use case. There is no guarantee that equal sets will convert to equal tuples. Also, you want to exclude rows like[1, 2, 1], but nothing in your code does that.numpy()to do the filtering.if len(key) == 3in both examples @JérômeRichard the lengths should both equal59825nowtuple(sorted(set(i)))is sufficient to solve the problem pointed out by @user2357112supportsMonica. It is more compliant with the definition of your problem. Note the number of item significantly decrease with that (it goes down to50014)