127

In order to find the index of the smallest value, we can use argmin:

import numpy as np
A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
print(A.argmin())     # 4 because A[4] = 0.1

But how can I find the indices of the k-smallest values?

I'm looking for something like:

A.argmin(numberofvalues=3)   
# [4, 0, 7]  because A[4] <= A[0] <= A[7] <= all other A[i]

Note: in my use case A has between ~ 10 000 and 100 000 values, and I'm interested for only the indices of the k=10 smallest values. k will never be > 10.

2
  • 2
    See this question, especially the second answer there, for the best solution to this (it's O(n) - full sorting the entire array is not absolutely necessary). Commented Dec 11, 2015 at 15:18
  • Similar: stackoverflow.com/questions/33623184 Commented Mar 1, 2021 at 14:18

5 Answers 5

206

Use np.argpartition. It does not sort the entire array. It only guarantees that the kth element is in sorted position and all smaller elements will be moved before it. Thus the first k elements will be the k-smallest elements.

import numpy as np

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3

idx = np.argpartition(A, k)
print(idx)
# [4 0 7 3 1 2 6 5]

This returns the k-smallest values. Note that these may not be in sorted order.

print(A[idx[:k]])
# [ 0.1  1.   1.5]

To obtain the k-largest values use

idx = np.argpartition(A, -k)
# [4 0 7 3 1 2 6 5]

A[idx[-k:]]
# [  9.  17.  17.]

WARNING: Do not (re)use idx = np.argpartition(A, k); A[idx[-k:]] to obtain the k-largest. That won't always work. For example, these are NOT the 3 largest values in x:

x = np.array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0])
idx = np.argpartition(x, 3)
x[idx[-3:]]
array([ 70,  80, 100])

Here is a comparison against np.argsort, which also works but just sorts the entire array to get the result.

In [2]: x = np.random.randn(100000)

In [3]: %timeit idx0 = np.argsort(x)[:100]
100 loops, best of 3: 8.26 ms per loop

In [4]: %timeit idx1 = np.argpartition(x, 100)[:100]
1000 loops, best of 3: 721 µs per loop

In [5]: np.alltrue(np.sort(np.argsort(x)[:100]) == np.sort(np.argpartition(x, 100)[:100]))
Out[5]: True
Sign up to request clarification or add additional context in comments.

3 Comments

Any idea how this handles ties? It seems if you want random tie break the only possible way is to use lexsort which sorts the whole array. stackoverflow.com/a/20199459/1993389 The partition docs say introselect is not stable, but I'm not sure if this means ties are broken at random.
@user27182: Per the docs, if a is an array with fields (i.e. a structured array), then you can either specify an order or let the unspecified fields be used to break ties. So if you pour A into the first field of a structured array, then pour random (tie breaking) numbers into a second field, then np.argpartition could be used to select the k smallest (or largest) with random tie breaks.
Keep in mind that the first k-1 elements are not guaranteed to be in order from smallest to largest. If that's something you need, you could use np.argpartition, slice the array using the first k indices, and then use np.argsort on the resulting array.
34

You can use numpy.argsort with slicing

>>> import numpy as np
>>> A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
>>> np.argsort(A)[:3]
array([4, 0, 7], dtype=int32)

4 Comments

Thanks! But wouldn't it be very slow to have to compute all the argsort and then keeping only the k (<=10) first values?
Hard to say without knowing the implementation of argsort. Specifically, if it is implemented as a generator, and depending on the actual sorting algorithm it could potentially be lazy, or it could sort the entire set first I'm not sure.
From the other comments, it does seem that argsort sorts the entire set, so I would prefer one of the other suggested solutions that use argpartition
This nice thing about this solution (compared to argpartition) is that we can guarantee that the k indices we are looking for are in ascending order.
6

For n-dimentional arrays, this function works well. The indecies are returned in a callable form. If you want a list of the indices to be returned, then you need to transpose the array before you make a list.

To retrieve the k largest, simply pass in -k.

def get_indices_of_k_smallest(arr, k):
    idx = np.argpartition(arr.ravel(), k)
    return tuple(np.array(np.unravel_index(idx, arr.shape))[:, range(min(k, 0), max(k, 0))])
    # if you want it in a list of indices . . . 
    # return np.array(np.unravel_index(idx, arr.shape))[:, range(k)].transpose().tolist()

Example:

r = np.random.RandomState(1234)
arr = r.randint(1, 1000, 2 * 4 * 6).reshape(2, 4, 6)

indices = get_indices_of_k_smallest(arr, 4)
indices
# (array([1, 0, 0, 1], dtype=int64),
#  array([3, 2, 0, 1], dtype=int64),
#  array([3, 0, 3, 3], dtype=int64))

arr[indices]
# array([ 4, 31, 54, 77])

%%timeit
get_indices_of_k_smallest(arr, 4)
# 17.1 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

1 Comment

I was looking for an answer for the n-dimensional cases, and your's worked as expected! Additionally, it's a fast solution.
0

numpy.partition(your_array, k) is an alternative. No slicing necessary as it gives the values sorted until the kth element.

2 Comments

This puts the element at index k (possibly unsorted) in sorted position. Since the sorted position is not necessary index k or k-1, we cannot guarantee that your_array[:k] contains the k smallest elements after numpy.partition.
This is the best answer for values of the array (not indexes) in Oct. 2019. @protagonist I don't understand your comment. Please correct me if wrong, but evidence that this partition function works correctly is to run the following in a loop: y = np.arange(10) ; np.random.shuffle(y) ; y.partition(3) ; assert y[:3+1].max() < y[3+1:].min() ... did the partition function have different behavior in old numpy version or something? Also FYI: k is zero-indexed.
0

As I needed the indices sorted, I adapted the code provided by Jeremiah England above, taking some of my inspiration from this answer as well. Use of this function is exactly as described by Mr. England.

def get_indices_of_k_smallest(arr, k):
    idx = np.argpartition(arr.ravel(), k)
    ind = tuple(np.array(np.unravel_index(idx, arr.shape))[:, range(min(k, 0), max(k, 0))])
    i2 = np.argsort(arr[ind])
    return tuple(np.array(ind).transpose()[i2].transpose()[:, range(min(k, 0), max(k, 0))])
    #as a list of indices:
    #return np.array(ind).transpose()[i2]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.