Find the index of the k smallest values of a numpy array

Question

In order to find the index of the smallest value, we can use argmin:

import numpy as np
A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
print(A.argmin())     # 4 because A[4] = 0.1

But how can I find the indices of the k-smallest values?

I'm looking for something like:

A.argmin(numberofvalues=3)   
# [4, 0, 7]  because A[4] <= A[0] <= A[7] <= all other A[i]

Note: in my use case A has between ~ 10 000 and 100 000 values, and I'm interested for only the indices of the k=10 smallest values. k will never be > 10.

See this question, especially the second answer there, for the best solution to this (it's O(n) - full sorting the entire array is not absolutely necessary). — Alex Riley
– Alex Riley, Commented Dec 11, 2015 at 15:18

unutbu · Accepted Answer · 2019-03-02 01:53:01Z

206

Use np.argpartition. It does not sort the entire array. It only guarantees that the kth element is in sorted position and all smaller elements will be moved before it. Thus the first k elements will be the k-smallest elements.

import numpy as np

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3

idx = np.argpartition(A, k)
print(idx)
# [4 0 7 3 1 2 6 5]

This returns the k-smallest values. Note that these may not be in sorted order.

print(A[idx[:k]])
# [ 0.1  1.   1.5]

To obtain the k-largest values use

idx = np.argpartition(A, -k)
# [4 0 7 3 1 2 6 5]

A[idx[-k:]]
# [  9.  17.  17.]

WARNING: Do not (re)use idx = np.argpartition(A, k); A[idx[-k:]] to obtain the k-largest. That won't always work. For example, these are NOT the 3 largest values in x:

x = np.array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0])
idx = np.argpartition(x, 3)
x[idx[-3:]]
array([ 70,  80, 100])

Here is a comparison against np.argsort, which also works but just sorts the entire array to get the result.

In [2]: x = np.random.randn(100000)

In [3]: %timeit idx0 = np.argsort(x)[:100]
100 loops, best of 3: 8.26 ms per loop

In [4]: %timeit idx1 = np.argpartition(x, 100)[:100]
1000 loops, best of 3: 721 µs per loop

In [5]: np.alltrue(np.sort(np.argsort(x)[:100]) == np.sort(np.argpartition(x, 100)[:100]))
Out[5]: True

edited Mar 2, 2019 at 1:53

answered Dec 11, 2015 at 15:20

unutbu

886k197 gold badges1.9k silver badges1.7k bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

user27182 Over a year ago

Any idea how this handles ties? It seems if you want random tie break the only possible way is to use lexsort which sorts the whole array. stackoverflow.com/a/20199459/1993389 The partition docs say introselect is not stable, but I'm not sure if this means ties are broken at random.

unutbu Over a year ago

@user27182: Per the docs, if a is an array with fields (i.e. a structured array), then you can either specify an order or let the unspecified fields be used to break ties. So if you pour A into the first field of a structured array, then pour random (tie breaking) numbers into a second field, then np.argpartition could be used to select the k smallest (or largest) with random tie breaks.

Leland Hepworth Over a year ago

Keep in mind that the first k-1 elements are not guaranteed to be in order from smallest to largest. If that's something you need, you could use np.argpartition, slice the array using the first k indices, and then use np.argsort on the resulting array.

Cory Kramer · Accepted Answer · 2017-06-06 19:33:25Z

34

You can use numpy.argsort with slicing

>>> import numpy as np
>>> A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
>>> np.argsort(A)[:3]
array([4, 0, 7], dtype=int32)

edited Jun 6, 2017 at 19:33

answered Dec 11, 2015 at 15:05

Cory Kramer

119k19 gold badges176 silver badges233 bronze badges

4 Comments

Basj Over a year ago

Thanks! But wouldn't it be very slow to have to compute all the argsort and then keeping only the k (<=10) first values?

Cory Kramer Over a year ago

Hard to say without knowing the implementation of argsort. Specifically, if it is implemented as a generator, and depending on the actual sorting algorithm it could potentially be lazy, or it could sort the entire set first I'm not sure.

Cory Kramer Over a year ago

From the other comments, it does seem that argsort sorts the entire set, so I would prefer one of the other suggested solutions that use argpartition

twink_ml Over a year ago

This nice thing about this solution (compared to argpartition) is that we can guarantee that the k indices we are looking for are in ascending order.

Jeremiah England · Accepted Answer · 2018-07-25 15:58:28Z

6

For n-dimentional arrays, this function works well. The indecies are returned in a callable form. If you want a list of the indices to be returned, then you need to transpose the array before you make a list.

To retrieve the k largest, simply pass in -k.

def get_indices_of_k_smallest(arr, k):
    idx = np.argpartition(arr.ravel(), k)
    return tuple(np.array(np.unravel_index(idx, arr.shape))[:, range(min(k, 0), max(k, 0))])
    # if you want it in a list of indices . . . 
    # return np.array(np.unravel_index(idx, arr.shape))[:, range(k)].transpose().tolist()

Example:

r = np.random.RandomState(1234)
arr = r.randint(1, 1000, 2 * 4 * 6).reshape(2, 4, 6)

indices = get_indices_of_k_smallest(arr, 4)
indices
# (array([1, 0, 0, 1], dtype=int64),
#  array([3, 2, 0, 1], dtype=int64),
#  array([3, 0, 3, 3], dtype=int64))

arr[indices]
# array([ 4, 31, 54, 77])

%%timeit
get_indices_of_k_smallest(arr, 4)
# 17.1 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

answered Jul 25, 2018 at 15:58

Jeremiah England

6729 silver badges17 bronze badges

1 Comment

emil Over a year ago

I was looking for an answer for the n-dimensional cases, and your's worked as expected! Additionally, it's a fast solution.

Marcelo Villa · Accepted Answer · 2017-12-03 01:25:36Z

0

numpy.partition(your_array, k) is an alternative. No slicing necessary as it gives the values sorted until the kth element.

answered Dec 3, 2017 at 1:25

Marcelo Villa

1,1312 gold badges14 silver badges26 bronze badges

2 Comments

twink_ml Over a year ago

This puts the element at index k (possibly unsorted) in sorted position. Since the sorted position is not necessary index k or k-1, we cannot guarantee that your_array[:k] contains the k smallest elements after numpy.partition.

Alex Gaudio Over a year ago

This is the best answer for values of the array (not indexes) in Oct. 2019. @protagonist I don't understand your comment. Please correct me if wrong, but evidence that this partition function works correctly is to run the following in a loop: y = np.arange(10) ; np.random.shuffle(y) ; y.partition(3) ; assert y[:3+1].max() < y[3+1:].min() ... did the partition function have different behavior in old numpy version or something? Also FYI: k is zero-indexed.

Ganymede_0g · Accepted Answer · 2024-07-26 13:46:58Z

0

As I needed the indices sorted, I adapted the code provided by Jeremiah England above, taking some of my inspiration from this answer as well. Use of this function is exactly as described by Mr. England.

def get_indices_of_k_smallest(arr, k):
    idx = np.argpartition(arr.ravel(), k)
    ind = tuple(np.array(np.unravel_index(idx, arr.shape))[:, range(min(k, 0), max(k, 0))])
    i2 = np.argsort(arr[ind])
    return tuple(np.array(ind).transpose()[i2].transpose()[:, range(min(k, 0), max(k, 0))])
    #as a list of indices:
    #return np.array(ind).transpose()[i2]

answered Jul 26, 2024 at 13:46

Ganymede_0g

114 bronze badges

Collectives™ on Stack Overflow

Find the index of the k smallest values of a numpy array

5 Answers 5

3 Comments

4 Comments

1 Comment

2 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

3 Comments

4 Comments

1 Comment

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related