
For example, given the matrix

array([[ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [ 0,  1,  2,  3,  4,  5],
       [24, 25, 26, 27, 28, 29]])

and top_n=3, it should return

array([[24, 25, 26, 27, 28, 29],
       [18, 19, 20, 21, 22, 23],
       [12, 13, 14, 15, 16, 17]])

This function should return a np.ndarray of shape (top_n, arr.shape[-1]), given the input 2D matrix arr.

Here's what I tried:

def select_rows(arr, top_n):
    """
    This function selects the top_n rows that have the largest sum of entries
    """
    sel_rows = np.argsort(-arr,axis=1)[:top_n]
    
    return sel_rows

I also tried:

sel_rows = (-arr).argsort(axis=-1)[:, :top_n]

to no avail.

  • Negating the array with - is less efficient than slicing the result at the end. For the small sample this isn't an issue, but negating every value in a large array will be somewhat slower, which can be verified with a %%timeit test. Commented Sep 1, 2021 at 5:04
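
Beyond speed, the two spellings can also differ in how they order tied rows. The sketch below uses a made-up `sums` array and passes kind="stable" so that the tie order is deterministic; both choices are assumptions for illustration and do not come from the thread.

```python
import numpy as np

# Hypothetical row sums; rows 1 and 2 tie for the largest value.
sums = np.array([3, 5, 5, 1])
top_n = 2

# Negate, sort ascending, keep the first top_n indices.
by_negation = np.argsort(-sums, kind="stable")[:top_n]

# Sort ascending, then take the last top_n indices in reverse.
by_slicing = np.argsort(sums, kind="stable")[:-top_n-1:-1]

print(by_negation)  # [1 2]
print(by_slicing)   # [2 1]
```

Both pick rows 1 and 2; only the order of the tied rows differs, which matters only if the caller relies on a particular tie-breaking rule.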

3 Answers

5

You can use this simple one-liner:

a[np.argsort(a.sum(axis=1))[:-top_n-1:-1]]

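a.sum(axis=1) sums along axis 1, giving one sum per row

np.argsort(...) returns the indices that would sort those sums in ascending order (the sums form a 1-D array, so the default axis=-1 is its only axis and no axis argument is needed)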

...[:-top_n-1:-1] picks the last top_n indices in reverse order

a[...] then grabs the rows
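
The steps above can be traced on the sample matrix from the question. This is a minimal sketch; the intermediate names `row_sums`, `order`, and `top_idx` are introduced here for illustration only.

```python
import numpy as np

a = np.array([[ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [ 0,  1,  2,  3,  4,  5],
              [24, 25, 26, 27, 28, 29]])
top_n = 3

row_sums = a.sum(axis=1)       # one sum per row: [ 51  87 123  15 159]
order = np.argsort(row_sums)   # row indices, smallest sum first: [3 0 1 2 4]
top_idx = order[:-top_n-1:-1]  # last top_n indices, reversed: [4 2 1]
result = a[top_idx]            # rows 4, 2 and 1, largest sum first

print(result)
```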

%%timeit comparison

# data sample
a = np.random.randint(0, 101, (100000, 1000))

%%timeit
a[np.argsort(a.sum(axis=1))[:-3-1:-1]]
[out]:
9.73 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
a[np.argsort(-a.sum(axis=1))[:3]]
[out]:
9.9 ms ± 303 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
sorted(a, key=lambda x: sum(x))[:-3-1:-1]
[out]:
1.04 s ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
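
%%timeit is an IPython cell magic, so the comparison above only runs in a notebook or IPython shell. A rough equivalent for a plain Python script might look like the sketch below. It uses the standard-library timeit module and a much smaller random array than the sample above (an assumption, to keep the run short), so the absolute numbers will not match the ones reported here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 101, size=(2000, 100))  # smaller than the thread's sample

def by_slicing():
    # Sort ascending, take the last 3 indices in reverse.
    return a[np.argsort(a.sum(axis=1))[:-3-1:-1]]

def by_negation():
    # Negate the sums, sort ascending, take the first 3 indices.
    return a[np.argsort(-a.sum(axis=1))[:3]]

runs = 20
t_slicing = timeit.timeit(by_slicing, number=runs)
t_negation = timeit.timeit(by_negation, number=runs)

print(f"slicing:  {t_slicing / runs * 1e3:.3f} ms per call")
print(f"negation: {t_negation / runs * 1e3:.3f} ms per call")
```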

3

Your code almost works, but you need to compute the sum of each row before sorting. You can try this:

import numpy as np


top_n = 3
arr = np.array([[ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [ 0,  1,  2,  3,  4,  5],
       [24, 25, 26, 27, 28, 29]])

arr[np.argsort(-arr.sum(axis=1))[:top_n]]

It gives:

array([[24, 25, 26, 27, 28, 29],
       [18, 19, 20, 21, 22, 23],
       [12, 13, 14, 15, 16, 17]])
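
Wrapped in the function signature from the question, the same idea might look like the sketch below. The minus sign is there because np.argsort sorts in ascending order: negating the row sums makes the largest sum come first, so the first top_n indices are the rows wanted. This assumes a signed integer or float array; negation is not a safe way to reverse the order of unsigned integers.

```python
import numpy as np

def select_rows(arr, top_n):
    """Return the top_n rows of a 2D array with the largest row sums, largest first."""
    row_sums = arr.sum(axis=1)
    # argsort is ascending, so negate the sums to rank the largest first.
    top_idx = np.argsort(-row_sums)[:top_n]
    return arr[top_idx]

arr = np.array([[ 6,  7,  8,  9, 10, 11],
                [12, 13, 14, 15, 16, 17],
                [18, 19, 20, 21, 22, 23],
                [ 0,  1,  2,  3,  4,  5],
                [24, 25, 26, 27, 28, 29]])

selected = select_rows(arr, 3)
print(selected.shape)  # (3, 6)
```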

1 Comment

The answer should explain that the purpose of the - is to reverse the sort order.

0

Without numpy, you can use the built-in function sorted with its key argument:

sorted(A, key=lambda x: sum(x))[:-top_n-1:-1]
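
As a runnable sketch on a plain list of lists (the name A and the sample data mirror the question), the sorted approach can be written with key=sum and reverse=True. The standard library's heapq.nlargest is shown as an alternative that avoids sorting the whole list; it is a swapped-in technique, not something the answer above proposes.

```python
import heapq

A = [[ 6,  7,  8,  9, 10, 11],
     [12, 13, 14, 15, 16, 17],
     [18, 19, 20, 21, 22, 23],
     [ 0,  1,  2,  3,  4,  5],
     [24, 25, 26, 27, 28, 29]]
top_n = 3

# Sort all rows by their sum, largest first, then keep the first top_n.
by_sorted = sorted(A, key=sum, reverse=True)[:top_n]

# Alternative: keep only the top_n largest rows without a full sort.
by_heap = heapq.nlargest(top_n, A, key=sum)

print(by_sorted)
print(by_heap)
```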

1 Comment

This implementation is highly inefficient and should not be used with numpy arrays. For an array, np.random.randint(0, 101, (100000, 100)), this is 107 times slower than the numpy implementations.
