Efficient way to take the minimum/maximum n values and indices from a matrix using NumPy

Question

What's an efficient way, given a NumPy matrix (2D array), to return the minimum/maximum n values (along with their indices) in the array?

Currently I have:

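import bisect

def n_max(arr, n):
    # Sorted list of (value, (row, col)) pairs, smallest first. With zero
    # placeholders, values below 0 are never recorded.
    res = [(0, (0, 0))] * n
    for y in range(len(arr)):
        for x in range(len(arr[y])):
            val = float(arr[y, x])
            el = (val, (y, x))
            i = bisect.bisect(res, el)
            if i > 0:
                # el beats the current smallest entry: insert it, drop the smallest.
                res.insert(i, el)
                del res[0]
    return res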

This takes three times longer than the image template matching algorithm that pyopencv does to generate the array I want to run this on, and I figure that's silly.

@Paul: tiny.. i'm finding the number of matches of a template to an image, so it's # of matches to # of pixels in the image, like 20 to 150000
– Claudiu, commented Apr 27, 2011 at 16:38

user2357112 · Accepted Answer · 2019-03-16 05:44:00Z

27

Since the time of the other answer, NumPy has added the numpy.partition and numpy.argpartition functions for partial sorting, allowing you to do this in O(arr.size) time, or O(arr.size+n*log(n)) if you need the elements in sorted order.

numpy.partition(arr, n) returns a rearranged copy of arr in which the element at index n is the one that would be there if the array were sorted. All smaller elements come before that element and all greater elements come after it, in no particular order.
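As a quick illustration of that guarantee, here is a minimal sketch on a made-up 1D array (the data and the name k are assumptions for the example, not from the question):

```python
import numpy as np

arr = np.array([7, 2, 9, 4, 1, 8, 3])
k = 3

part = np.partition(arr, k)

# The element at index k is the one a full sort would put there.
assert part[k] == np.sort(arr)[k]
# Everything before index k is <= part[k]; everything after is >= part[k].
# The order inside each side is unspecified.
assert (part[:k] <= part[k]).all()
assert (part[k + 1:] >= part[k]).all()
```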

numpy.argpartition is to numpy.partition as numpy.argsort is to numpy.sort.

Here's how you would use these functions to find the indices of the minimum n elements of a two-dimensional arr:

flat_indices = numpy.argpartition(arr.ravel(), n-1)[:n]
row_indices, col_indices = numpy.unravel_index(flat_indices, arr.shape)
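To see those two lines in action, here is a small self-contained run on a made-up 3x3 array (the values are only for illustration). The positions come back in no particular order, so the sketch sorts them before printing:

```python
import numpy as np

arr = np.array([[9, 4, 7],
                [1, 8, 2],
                [6, 3, 5]])
n = 3

flat_indices = np.argpartition(arr.ravel(), n - 1)[:n]
row_indices, col_indices = np.unravel_index(flat_indices, arr.shape)

# (row, col) positions of the values 1, 2 and 3, sorted for display.
found = sorted(zip(row_indices.tolist(), col_indices.tolist()))
print(found)  # [(1, 0), (1, 2), (2, 1)]
```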

And if you need the indices in order, so row_indices[0] is the row of the minimum element instead of just one of the n minimum elements:

min_elements = arr[row_indices, col_indices]
min_elements_order = numpy.argsort(min_elements)
row_indices, col_indices = row_indices[min_elements_order], col_indices[min_elements_order]
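The snippets above cover the minimum case. For the maximum case the question also asks about, one possible adaptation is sketched below. The helper name n_largest_2d is hypothetical, and it assumes 1 <= n <= arr.size. It relies on a negative kth being counted from the end of the array:

```python
import numpy as np

def n_largest_2d(arr, n):
    """Hypothetical helper: values and (row, col) indices of the n largest
    entries of a 2D array, largest first. Assumes 1 <= n <= arr.size."""
    flat = arr.ravel()
    # kth=-n leaves the n largest elements in the last n slots (unordered).
    flat_indices = np.argpartition(flat, -n)[-n:]
    # Sort only those n candidates, then reverse for descending order.
    order = np.argsort(flat[flat_indices])[::-1]
    flat_indices = flat_indices[order]
    rows, cols = np.unravel_index(flat_indices, arr.shape)
    return flat[flat_indices], rows, cols

arr = np.array([[9, 4, 7],
                [1, 8, 2],
                [6, 3, 5]])
values, rows, cols = n_largest_2d(arr, 3)
print(values.tolist(), rows.tolist(), cols.tolist())
```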

The 1D case is a lot simpler:

# Unordered:
indices = numpy.argpartition(arr, n-1)[:n]

# Extra code if you need the indices in order:
min_elements = arr[indices]
min_elements_order = numpy.argsort(min_elements)
ordered_indices = indices[min_elements_order]
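Put together on a made-up 1D array, the ordered version might look like this (the data is an assumption for the example):

```python
import numpy as np

arr = np.array([7.0, 2.0, 9.0, 4.0, 1.0, 8.0, 3.0])
n = 3

# Unordered indices of the n smallest elements.
indices = np.argpartition(arr, n - 1)[:n]
# Sort just those n elements to get the indices in ascending order of value.
ordered_indices = indices[np.argsort(arr[indices])]

print(ordered_indices)       # [4 1 6]
print(arr[ordered_indices])  # [1. 2. 3.]
```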

edited Mar 16, 2019 at 5:44

answered Aug 4, 2016 at 16:18

user2357112


7 Comments

Wayne Filkins Over a year ago

This exact code gave me ValueError: not enough values to unpack (expected 2, got 1)

user2357112 Over a year ago

@WayneFilkins: Sounds like you tried to use it on a 1D array instead of a 2D array. The 1D case is simpler, but you can't throw the 2D case code at a 1D array.

Michel Lemay Over a year ago

This is the fastest solution I've found so far. However, argpartition requires O(arr.size) storage! I'm surprised that nobody came up with a better solution requiring O(n) temporary storage instead.

user2357112 Over a year ago

@MichelLemay: heapq.nlargest and heapq.nsmallest exist, but have worse time complexity and don't take advantage of NumPy functionality.
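For reference, a rough sketch of the heapq route mentioned here, assuming a small made-up 2D array. It keeps only n candidates at a time, but it visits every element from Python, which is where the slowdown comes from:

```python
import heapq
import numpy as np

arr = np.array([[9, 4, 7],
                [1, 8, 2],
                [6, 3, 5]])
n = 3

# Lazily yield (value, flat_index) pairs; heapq.nsmallest retains only n of them.
pairs = ((v, i) for i, v in enumerate(arr.flat))
smallest = heapq.nsmallest(n, pairs)

# Convert each flat index to (row, col) for a C-ordered 2D array.
result = [(v.item(), divmod(i, arr.shape[1])) for v, i in smallest]
print(result)  # [(1, (1, 0)), (2, (1, 2)), (3, (2, 1))]
```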

Michel Lemay Over a year ago

yes, I've tried them and they are too slow. In my case, the array is multidimensional and contains millions of columns. Allocating that takes a significant amount of time to just throw away 99.99% of it to the garbage collector.

Sven Marnach (edited by Peter Mortensen) · Accepted Answer · 2018-06-28 02:46:51Z

9

Since there is no heap implementation in NumPy, your best bet is probably to sort the whole array and take the last n elements:

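import numpy

def n_max(arr, n):
    # Full argsort of the flattened array; the last n flat indices are the n largest.
    indices = arr.ravel().argsort()[-n:]
    # Convert each flat index back to a (row, col) tuple.
    indices = (numpy.unravel_index(i, arr.shape) for i in indices)
    return [(arr[i], i) for i in indices]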

(This will probably return the list in reverse order compared to your implementation - I did not check.)

A more efficient solution that works with newer versions of NumPy is given in user2357112's answer above, which uses numpy.partition and numpy.argpartition.

edited Jun 28, 2018 at 2:46

Peter Mortensen


answered Apr 27, 2011 at 16:14

Sven Marnach


10 Comments

Paul Over a year ago

if n is small then perhaps running argmax a few times (removing the max each time) could be faster.
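A rough sketch of that idea, under the assumption that a float working copy of the array is acceptable and that n <= arr.size (the function name is made up). Each pass costs O(arr.size), so this only pays off for very small n:

```python
import numpy as np

def n_max_by_argmax(arr, n):
    """Hypothetical sketch: find the n largest entries with n argmax passes."""
    work = arr.astype(float)       # astype copies, so arr itself is untouched
    result = []
    for _ in range(n):
        index = np.unravel_index(work.argmax(), work.shape)
        result.append((work[index], index))
        work[index] = -np.inf      # "remove" the current maximum
    return result

arr = np.array([[9, 4, 7],
                [1, 8, 2],
                [6, 3, 5]])
res = n_max_by_argmax(arr, 3)
print([float(v) for v, _ in res])  # [9.0, 8.0, 7.0]
```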

Voo Over a year ago

No expert with NumPy, but do we really need to sort (O(n log n)) for something which is trivially done in O(n)? I assume the advantage is that the sorting is done in C while the looping code is run by the python interpreter?

Sven Marnach Over a year ago

@Voo: The complexity of the OP's algorithm is O(m log n), where m is the number of elements in the array and n is the number of highest elements to find. The algorithm in my answer is O(m log m). The factor between these two complexities for m and n as in the OP's above comment is 4, which is more than compensated for by getting rid of the Python loops. As Paul noted above, if n is really small, there might be better alternatives.
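The factor of 4 can be checked with a couple of lines, taking m = 150000 and n = 20 from the comment under the question. The base of the logarithm cancels in the ratio:

```python
import math

m = 150_000   # number of pixels, from the question's comment
n = 20        # number of matches, from the question's comment

# (m log m) / (m log n) simplifies to log m / log n.
ratio = math.log(m) / math.log(n)
print(ratio)  # a little under 4
```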

Claudiu Over a year ago

@Voo: yea complexity isn't everything. in this case having this done in C beats mine by a lot (~3x faster) - and by enough so that i no longer have to worry about it, though if i need something faster i'll come back for more. but - how would you trivially do it on O(n)?

user2357112 Over a year ago

NumPy has numpy.partition and numpy.argpartition, which would let you do this in O(arr.size), or O(arr.size+n*log(n)) if you need the n items in order.


Arthur Chan · Accepted Answer · 2020-09-14 03:09:46Z

I ran into exactly the same problem and solved it.
Here is my solution, which wraps np.argpartition:

It works along an arbitrary axis.
It is fast, O(N), when K << array.shape[axis].
It returns both the sorted values and their corresponding indices in the original array.

import numpy as np

def get_sorted_smallest_K(array, K, axis=-1):
    # Find the K smallest values of array along the given axis.
    # Only efficient when K << array.shape[axis].
    # Return:
    #   top_sorted_scores: np.array. The K smallest values, in ascending order.
    #   top_sorted_indexs: np.array. Their indices in the original input array.

    # kth=K-1 puts the K smallest values in the first K slots and, unlike kth=K,
    # also works when K == array.shape[axis].
    partition_index = np.take(np.argpartition(array, K - 1, axis), range(0, K), axis)
    top_scores = np.take_along_axis(array, partition_index, axis)
    # Sort only the K candidates.
    sorted_index = np.argsort(top_scores, axis=axis)
    top_sorted_scores = np.take_along_axis(top_scores, sorted_index, axis)
    top_sorted_indexs = np.take_along_axis(partition_index, sorted_index, axis)
    return top_sorted_scores, top_sorted_indexs
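Since the question also asks about maximum values, here is a hedged sketch of how the same pattern could be mirrored for the K largest values along an axis. The name get_sorted_largest_K and the sample data are assumptions, and it expects 1 <= K <= array.shape[axis]:

```python
import numpy as np

def get_sorted_largest_K(array, K, axis=-1):
    """Hypothetical mirror of the function above: the K largest values along
    `axis`, in descending order, plus their indices in the input array."""
    size = array.shape[axis]
    # After partitioning at size - K, the last K slots along `axis`
    # hold the K largest values (in no particular order).
    part = np.argpartition(array, size - K, axis=axis)
    top_index = np.take(part, range(size - K, size), axis=axis)
    top_scores = np.take_along_axis(array, top_index, axis=axis)
    # Sort only the K candidates, then flip to get descending order.
    order = np.flip(np.argsort(top_scores, axis=axis), axis=axis)
    return (np.take_along_axis(top_scores, order, axis=axis),
            np.take_along_axis(top_index, order, axis=axis))

scores = np.array([[9, 4, 7, 1],
                   [1, 8, 2, 6]])
values, indices = get_sorted_largest_K(scores, 2, axis=1)
print(values.tolist())   # [[9, 7], [8, 6]]
print(indices.tolist())  # [[0, 2], [1, 3]]
```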
