I have 5 numpy arrays:

array_1 = np.array([1, 2, 3])
array_2 = np.array([4, 5, 6])
array_3 = np.array([7, 8, 9])
array_4 = np.array([10, 11, 12])
array_5 = np.array([1, 2, 3])

I need to compare them all: if ANY two of the 5 arrays above have the same values at the same indices, I need to know about it. Currently, I do something like this:

index_array_1 = np.where(array_1 == array_2)[0]
index_array_2 = np.where(array_1 == array_3)[0]
index_array_3 = np.where(array_1 == array_4)[0]
index_array_4 = np.where(array_1 == array_5)[0]
index_array_5 = np.where(array_2 == array_3)[0]
index_array_6 = np.where(array_2 == array_4)[0]
index_array_7 = np.where(array_2 == array_5)[0]
index_array_8 = np.where(array_3 == array_4)[0]
index_array_9 = np.where(array_3 == array_5)[0]
index_array_10 = np.where(array_4 == array_5)[0]

So, in this case, only index_array_4 would return any values, because array_1 and array_5 match up. But, this clearly isn't the best way to do this. It's a lot of code, and it takes a while to run as well.

Is there something I haven't come across yet where I can essentially say "if ANY of the 5 arrays match, tell me, and also let me know which two arrays are the ones that match"?

I'd also like it to return an index array for one of the matching arrays.

3 Answers

You can try a one-liner:

>>> from itertools import combinations
>>> [arrays for arrays in combinations([f"array_{i}" for i in range(1,6)],2) 
     if np.all(np.equal(*map(globals().get,arrays)))]

Output:

[('array_1', 'array_5')]

EXPLANATION:

>>> [f"array_{i}" for i in range(1,6)]
['array_1', 'array_2', 'array_3', 'array_4', 'array_5']

>>> list(combinations([f"array_{i}" for i in range(1,6)],2))
[('array_1', 'array_2'),
 ('array_1', 'array_3'),
 ('array_1', 'array_4'),
 ('array_1', 'array_5'),
 ('array_2', 'array_3'),
 ('array_2', 'array_4'),
 ('array_2', 'array_5'),
 ('array_3', 'array_4'),
 ('array_3', 'array_5'),
 ('array_4', 'array_5')]

The list comprehension then iterates over these combinations.

For the first combination, ('array_1', 'array_2'), the remaining steps look like this:

>>> [*map(globals().get, ('array_1', 'array_2'))]
[[1, 2, 3], [4, 5, 6]]

>>> np.all(np.equal([1, 2, 3], [4, 5, 6]))
False

EDIT:

If the arrays are defined inside a function, try this instead:

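def bar():
    array_1 = [1, 2, 3]
    array_2 = [4, 5, 6]
    array_3 = [7, 8, 9]
    array_4 = [10, 11, 12]
    array_5 = [1, 2, 3]
    # Snapshot of the local names defined so far, used to look arrays up by name
    scope = locals()
    names = [f"array_{i}" for i in range(1, 6)]
    # Plain dictionary lookups replace the eval calls; the result is the same
    return [arrays for arrays in combinations(names, 2)
            if np.all(np.equal(scope[arrays[0]], scope[arrays[1]]))]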
7 Comments

When I try using this code, it returns all of the array combinations. I'm not sure why it isn't just showing me the (array_1, array_5) pair.
Are you sure you didn't miss the if np.all(np.equal(*map(globals().get,arrays))) part?
Yeah, I got that part. It still returns all of the arrays. I even copied and pasted your code snippet just to be sure. It returns the right array pair for you? Could it have something to do with the "global" aspect? Maybe the arrays in my actual code are not global. What can you use instead if that's the case?
Ah, it could. Though I would suspect in that case the code would throw an error. You can use np.all(np.equal(*map(eval,arrays))) instead of np.all(np.equal(*map(globals().get,arrays)))
Keep in mind eval should generally be avoided. So if you are doing this inside a function or something, consider using locals() instead of globals() first.
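
The name lookups through globals(), locals(), or eval discussed above can be sidestepped by keeping the arrays in a dictionary. What follows is only a minimal sketch, not the answerer's code: the dictionary `arrays` and the helper `matching_pairs` are hypothetical names introduced for illustration. It also returns an index array for each matching pair, which the question asked for.

```python
from itertools import combinations

import numpy as np

# Hypothetical container: the arrays are stored under explicit names
arrays = {
    "array_1": np.array([1, 2, 3]),
    "array_2": np.array([4, 5, 6]),
    "array_3": np.array([7, 8, 9]),
    "array_4": np.array([10, 11, 12]),
    "array_5": np.array([1, 2, 3]),
}


def matching_pairs(named_arrays):
    """Return (name_a, name_b, indices) for every pair of equal arrays."""
    result = []
    # Iterating a dict yields its keys, so each pair of names is visited once
    for name_a, name_b in combinations(named_arrays, 2):
        a, b = named_arrays[name_a], named_arrays[name_b]
        if np.array_equal(a, b):
            # Indices at which the two arrays agree (all of them for a full match)
            result.append((name_a, name_b, np.flatnonzero(a == b)))
    return result


for name_a, name_b, idx in matching_pairs(arrays):
    print(name_a, name_b, idx)
# array_1 array_5 [0 1 2]
```

Because the names are the dictionary's own keys, the same code works at module level and inside a function.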

You can do that like this:

import numpy as np

array_1 = [1, 2, 3]
array_2 = [4, 5, 6]
array_3 = [7, 8, 9]
array_4 = [10, 11, 12]
array_5 = [1, 2, 3]

# Put all arrays together
all_arrays = np.stack([array_1, array_2, array_3, array_4, array_5])
# Compare all vs all
c = np.all(all_arrays[:, np.newaxis] == all_arrays, axis=-1)
# Take only half the result to avoid self results and symmetric results
c = np.triu(c, 1)
# Get matching pairs
m = np.stack(np.where(c), axis=1)
# One row per matching pair
print(m)
# [[0 4]]
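
To make the broadcast comparison easier to follow, here is a small sketch that prints the intermediate shapes. The variable names `elementwise`, `full_match`, and `upper` are introduced only for this illustration; the operations are the same as in the snippet above.

```python
import numpy as np

all_arrays = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [1, 2, 3]])

# Shape (5, 1, 3) compared with shape (5, 3) broadcasts to shape (5, 5, 3):
# elementwise[i, j, k] is True when array i and array j agree at position k
elementwise = all_arrays[:, np.newaxis] == all_arrays
print(elementwise.shape)  # (5, 5, 3)

# Collapse the last axis: full_match[i, j] is True when array i equals array j
full_match = elementwise.all(axis=-1)
print(full_match.shape)  # (5, 5)

# Keep only entries above the diagonal (i < j), dropping self-matches and mirrored pairs
upper = np.triu(full_match, 1)
rows, cols = np.where(upper)
print(list(zip(rows.tolist(), cols.tolist())))  # [(0, 4)]
```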

The all-vs-all comparison above makes more comparisons than necessary, though (e.g. both array_1 vs array_2 and array_2 vs array_1). scipy.spatial.distance.pdist evaluates each pair only once, which can potentially save some time:

import numpy as np
import scipy.spatial.distance

array_1 = [1, 2, 3]
array_2 = [4, 5, 6]
array_3 = [7, 8, 9]
array_4 = [10, 11, 12]
array_5 = [1, 2, 3]

# Put all arrays together
all_arrays = np.stack([array_1, array_2, array_3, array_4, array_5])
# Compute pairwise distances
d = scipy.spatial.distance.pdist(all_arrays, 'hamming')
d = scipy.spatial.distance.squareform(d)
# Get indices of pairs where it is zero
c = np.triu(d == 0, 1)
m = np.stack(np.where(c), axis=1)
print(m)
# [[0 4]]
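
pdist returns a condensed vector with one distance per unordered pair, and squareform expands it back into a full square matrix. As a sketch of an alternative, the matching pairs can be read straight from the condensed vector. This relies on the condensed ordering matching that of itertools.combinations over the row indices; the names `pairs` and `matches` are introduced here for illustration.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

all_arrays = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [1, 2, 3]])

# Condensed distances: one entry per unordered pair (i, j) with i < j
d = pdist(all_arrays, "hamming")
print(d.shape)  # (10,)

# Row-index pairs in the same order as the condensed vector
pairs = list(combinations(range(len(all_arrays)), 2))

# A Hamming distance of 0 means no position differs between the two arrays
matches = [pairs[k] for k in np.flatnonzero(d == 0)]
print(matches)  # [(0, 4)]
```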

You can use the list .count() method to check whether any array occurs more than once:

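def compare(*arrays):
    # Convert every input to a plain list so == and .count() compare by value
    temp = [list(x) for x in arrays]

    for i in range(len(temp)):
        if temp.count(temp[i]) > 1:
            # Search only after position i, then shift back to an index into temp
            return (i, temp[i + 1:].index(temp[i]) + i + 1)

    # Reached only when no array occurs more than once
    return False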

The first line of the function builds a list containing every argument converted to a list. If the list at position i occurs more than once, the function returns i together with the index of the other identical array. That second index is found by calling .index() on the slice that starts after position i, so the search skips the array at position i itself. If no array occurs more than once, the function returns False.

print(compare(array_1,array_2,array_3,array_4,array_5))

will return

(0, 4)
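
The function above stops at the first duplicate it finds. If every matching pair is wanted, one possible variation (a sketch only; `all_matches` is a hypothetical name, not part of the answer) compares each pair once with itertools.combinations:

```python
from itertools import combinations


def all_matches(*arrays):
    """Return every index pair (i, j), i < j, whose arrays hold equal values."""
    # Plain lists compare by value with ==
    temp = [list(x) for x in arrays]
    return [(i, j) for i, j in combinations(range(len(temp)), 2)
            if temp[i] == temp[j]]


print(all_matches([1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6]))
# [(0, 2), (1, 3)]
```

When nothing matches, it returns an empty list instead of False.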
