Sorting by subsampling every nth element in numpy array?

Question

I am trying to sample every nth element to sort an array. My current solution works, but it feels like there should be a solution that does not involve concatenation.

My current implementation is as follows.

arr = np.arange(10)
print(arr)
[0 1 2 3 4 5 6 7 8 9]

# sample every 5th element
res = np.empty(shape=0)
for i in range(5):
    res = np.concatenate([res, arr[i::5]])
    
print(res)
[0. 5. 1. 6. 2. 7. 3. 8. 4. 9.]

Looking for any tips to make this faster/more pythonic. My use case is with an array of ~10,000 values.

For your ~10,000 values, is it still "every 5th", or something else? — Kelly Bundy
– Kelly Bundy, Commented Feb 11, 2023 at 21:16
I see you accepted seralouk's answer. Does that not give you an error? — Kelly Bundy
– Kelly Bundy, Commented Feb 11, 2023 at 21:33
Apologies, I should have tested the solution after fixing the error. Thanks for your comprehensive answer and the speed comparison! I also updated the accepted answer :) — BoomBoxBoy
– BoomBoxBoy, Commented Feb 11, 2023 at 23:13

Brian61354270 · Accepted Answer · 2023-02-11 23:23:20Z

4

Reshape your vector into in a 2D array with N elements per row, and then flatten it column-wise:

import numpy as np

# Pick "subsample stride"
N = 5

# Create a vector with length divisible by N.
arr = np.arange(2 * N)
print(arr)

# Reshape arr into a 2D array with N elements per row and however many
# columns required. 
# Flatten it with "F" ordering for "Fortran style" (column-major).
output = arr.reshape(-1, N).flatten("F")
print(output)

outputs

[0 1 2 3 4 5 6 7 8 9]
[0 5 1 6 2 7 3 8 4 9]

Performance comparison

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: def sol0(arr):
   ...:     """OP's original solution."""
   ...:     res = np.empty(shape=0)
   ...:     for i in range(5):
   ...:         res = np.concatenate([res, arr[i::5]])
   ...:     return res
   ...: 

In [3]: def sol1(arr):  
   ...:     """This answer's solution."""
   ...:     return arr.reshape(-1, 5).flatten("F")
   ...: 

In [4]: def sol2(arr):
   ...:     """@seralouk's solution, with shape error patch"""
   ...:     res = np.empty((5, arr.size//5), order='F')
   ...:     for i in range(5):
   ...:         res[i::5] = arr[i::5]
   ...:     return res.reshape(-1)

In [5]: arr = np.arange(10_000)

In [6]: %timeit sol0(arr)
26.6 µs ± 724 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit sol1(arr)
7.81 µs ± 34 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: %timeit sol2(arr)
36.3 µs ± 841 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

edited Feb 11, 2023 at 23:23

answered Feb 11, 2023 at 20:47

Brian61354270

14.9k7 gold badges37 silver badges64 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

seralouk Over a year ago

this is the slowest possible solution. see my answer above

Brian61354270 Over a year ago

@seralouk I've added some timing measurements to my answer. Are you able to reproduce these results?

Collectives™ on Stack Overflow

Sorting by subsampling every nth element in numpy array?

1 Answer 1

Performance comparison

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Performance comparison

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related