27

In numpy, is there a nice idiomatic way of testing if all rows are equal in a 2d array?

I can do something like

np.all([np.array_equal(M[0], M[i]) for i in xrange(1,len(M))])

This seems to mix Python lists with NumPy arrays, which is ugly and presumably also slow.

Is there a nicer/neater way?

    As I said for a similar question, this really needs a proper solution that does not create a temporary array as large as the original (which both of the answers here as well as there do). I will post an answer once I have made the addition to numpy. Commented Apr 7, 2016 at 21:01

4 Answers

33

One way is to check that every row of the array arr is equal to its first row arr[0]:

(arr == arr[0]).all()
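A minimal sketch of this check wrapped in a function. The name all_rows_equal is made up for illustration, and the guard for arrays with no elements is an added assumption about what the caller wants, since arr[0] raises IndexError when there are no rows:

```python
import numpy as np

def all_rows_equal(arr):
    """Return True if every row of arr equals its first row (hypothetical helper)."""
    arr = np.asarray(arr)
    # Assumption: an array with no elements counts as "all rows equal".
    if arr.size == 0:
        return True
    # arr[0] has shape (n_cols,) and is broadcast against every remaining row.
    return bool((arr[1:] == arr[0]).all())

same = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
different = np.array([[1, 2, 3], [1, 2, 4]])

print(all_rows_equal(same))
print(all_rows_equal(different))
```

Slicing with arr[1:] skips comparing the first row against itself; a single-row array yields an empty comparison, and .all() of an empty array is True.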

Using equality == is fine for integer values, but if arr contains floating point values you could use np.isclose instead to check for equality within a given tolerance:

np.isclose(arr, arr[0]).all()
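To see why the tolerance matters, here is a small sketch with two rows that differ only by floating point rounding. The array values and the custom tolerances are chosen purely for illustration:

```python
import numpy as np

# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
arr = np.array([[0.3, 1.0], [0.1 + 0.2, 1.0]])

exact = bool((arr == arr[0]).all())            # strict element-wise equality
close = bool(np.isclose(arr, arr[0]).all())    # default rtol=1e-05, atol=1e-08

# The tolerances can be tightened if the defaults are too loose for the data.
tight = bool(np.isclose(arr, arr[0], rtol=0.0, atol=1e-12).all())

print(exact, close, tight)
```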

If your array contains NaN and you want to avoid the tricky NaN != NaN issue, you could combine this approach with np.isnan:

(np.isclose(arr, arr[0]) | np.isnan(arr)).all()
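One caveat with the np.isnan mask: it accepts a NaN in any position, even where the first row holds an ordinary number. If NaN should only match NaN, the equal_nan parameter of np.isclose may be a better fit. A small sketch with made-up data:

```python
import numpy as np

# Second row has NaN where the first row has 2.0, so the rows differ.
arr = np.array([[1.0, 2.0], [1.0, np.nan]])

masked = bool((np.isclose(arr, arr[0]) | np.isnan(arr)).all())
strict = bool(np.isclose(arr, arr[0], equal_nan=True).all())

# Rows that agree, including the position of their NaN.
nan_rows = np.array([[1.0, np.nan], [1.0, np.nan]])
matching = bool(np.isclose(nan_rows, nan_rows[0], equal_nan=True).all())

print(masked, strict, matching)
```

Here the masked version reports the rows as equal while the equal_nan version does not; which behaviour is wanted depends on how missing values should be treated.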

7 Comments

I think this is the fastest way. Thank you.
Checking for equality, rather than equality to 0 of the difference, is likely to be a tad faster.
There is now np.allclose.
Also note that np.isclose and np.allclose (now) allow for NaN equivalence checking (equal_nan parameter).
Could be worth doing (arr[1:] == arr[0]).all() to avoid wastefully comparing row 0 against itself. Like the original solution, this only requires that arr has at least one row. I think it would be better if it didn't even require that, which you can do with: arr.size == 0 or (arr[1:]==arr[0]).all()
6

It is worth mentioning that the above version compares slices along the first axis only, so it does not directly cover comparisons along another axis of a multidimensional array.

For example, for a three-dimensional image tensor img of shape [256, 256, 3], we may need to check whether the three [256, 256] RGB layers are identical. In this case, we need to use broadcasting:

(img == img[:, :, 0, np.newaxis]).all()

This is because plain img[:, :, 0] has shape [256, 256], but we need shape [256, 256, 1] to broadcast across the layers.
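A runnable sketch of the same idea on a small array. The helper name channels_equal and the 4x4 test images are invented for illustration; img[..., :1] is used here as an equivalent way to keep the last axis with length 1:

```python
import numpy as np

def channels_equal(img):
    """Return True if every channel along the last axis matches channel 0."""
    # img[..., :1] has shape (H, W, 1), so it broadcasts across all channels.
    return bool((img == img[..., :1]).all())

# A (4, 4, 3) image whose three channels are identical.
gray = np.repeat(np.arange(16).reshape(4, 4, 1), 3, axis=2)

# Same image with a single pixel changed in the last channel.
color = gray.copy()
color[0, 0, 2] = 99

print(channels_equal(gray))
print(channels_equal(color))
```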


5

Simply check whether the number of unique items in the array is 1:

>>> arr = np.array([[1]*10 for _ in xrange(5)])
>>> len(np.unique(arr)) == 1
True
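Note that np.unique without an axis flattens the array, so the check above tests whether all elements are equal rather than whether all rows are equal. A sketch of a row-wise variant, assuming a NumPy version that supports the axis argument of np.unique (added in NumPy 1.13):

```python
import numpy as np

arr = np.array([[1, 2, 3], [1, 2, 3]])

# Flattened: the unique values are 1, 2 and 3, so this is False.
elements_all_same = len(np.unique(arr)) == 1

# Row-wise: there is a single unique row, so this is True.
rows_all_same = len(np.unique(arr, axis=0)) == 1

print(elements_all_same, rows_all_same)
```

Since np.unique sorts its input, this is likely to be slower than a direct comparison against the first row.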

A solution inspired by unutbu's answer:

>>> arr = np.array([[1]*10 for _ in xrange(5)])
>>> np.all(np.all(arr == arr[0,:], axis = 1))
True

One problem with your code is that you build the entire list before applying np.all() to it, so no short-circuiting happens in your version. It would be better to use Python's built-in all() with a generator expression, as in the last line of each timing run below.
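The difference in short-circuiting can be made visible by counting how many row comparisons each form performs. This is a sketch for Python 3 (range instead of xrange); the counting wrapper rows_match is made up for illustration:

```python
import numpy as np

# First row differs from all the others, so the very first comparison fails.
M = np.array([[3] * 100] + [[2] * 100 for _ in range(1000)])

counter = {"n": 0}

def rows_match(a, b):
    """np.array_equal wrapped with a call counter (illustrative helper)."""
    counter["n"] += 1
    return np.array_equal(a, b)

# List comprehension: every comparison runs before np.all sees the result.
result_list = bool(np.all([rows_match(M[0], M[i]) for i in range(1, len(M))]))
list_calls = counter["n"]

# Generator expression: all() stops at the first False.
counter["n"] = 0
result_gen = all(rows_match(M[0], M[i]) for i in range(1, len(M)))
gen_calls = counter["n"]

print(list_calls, gen_calls)
```

When all rows really are equal, both forms have to perform every comparison, which matches the second set of timings.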

Timing comparisons:

>>> M = arr = np.array([[3]*100] + [[2]*100 for _ in xrange(1000)])
>>> %timeit np.all(np.all(arr == arr[0,:], axis = 1))
1000 loops, best of 3: 272 µs per loop
>>> %timeit (np.diff(M, axis=0) == 0).all()
1000 loops, best of 3: 596 µs per loop
>>> %timeit np.all([np.array_equal(M[0], M[i]) for i in xrange(1,len(M))])
100 loops, best of 3: 10.6 ms per loop
>>> %timeit all(np.array_equal(M[0], M[i]) for i in xrange(1,len(M)))
100000 loops, best of 3: 11.3 µs per loop

>>> M = arr = np.array([[2]*100 for _ in xrange(1000)])
>>> %timeit np.all(np.all(arr == arr[0,:], axis = 1))
1000 loops, best of 3: 330 µs per loop
>>> %timeit (np.diff(M, axis=0) == 0).all()
1000 loops, best of 3: 594 µs per loop
>>> %timeit np.all([np.array_equal(M[0], M[i]) for i in xrange(1,len(M))])
100 loops, best of 3: 9.51 ms per loop
>>> %timeit all(np.array_equal(M[0], M[i]) for i in xrange(1,len(M)))
100 loops, best of 3: 9.44 ms per loop

6 Comments

I think ajcr's answer is even faster!
@user2179021 It is taking 650 µs on my system, so still slower than my second answer.
Agreed - the second method is faster than mine on my system too.
Do you think there is a similarly fast way to tell if all the rows are distinct?
Your first method is actually checking if all items of the array are the same, not if all rows are the same... Try it with np.array([[1,2,3],[1,2,3]]).
1

Regarding the NaN handling in Alex's answer, NumPy now provides the equal_nan parameter:

np.isclose([1.0, np.nan], [1.0, np.nan], equal_nan=True)
np.allclose([1.0, np.nan], [1.0, np.nan], equal_nan=True)

