2

I am trying to sample every nth element to sort an array. My current solution works, but it feels like there should be a solution that does not involve concatenation.

My current implementation is as follows.

arr = np.arange(10)
print(arr)
[0 1 2 3 4 5 6 7 8 9]

# sample every 5th element
res = np.empty(shape=0)
for i in range(5):
    res = np.concatenate([res, arr[i::5]])
    
print(res)
[0. 5. 1. 6. 2. 7. 3. 8. 4. 9.]

Looking for any tips to make this faster/more pythonic. My use case is with an array of ~10,000 values.

5
  • 3
    Will the length of arr always be divisible by N? Commented Feb 11, 2023 at 20:45
  • For your ~10,000 values, is it still "every 5th", or something else? Commented Feb 11, 2023 at 21:16
  • Yes, always divisible by 5, and every 5th element. Commented Feb 11, 2023 at 21:28
  • 3
    I see you accepted seralouk's answer. Does that not give you an error? Commented Feb 11, 2023 at 21:33
  • Apologies, I should have tested the solution after fixing the error. Thanks for your comprehensive answer and the speed comparison! I also updated the accepted answer :) Commented Feb 11, 2023 at 23:13

1 Answer 1

4

Reshape your vector into in a 2D array with N elements per row, and then flatten it column-wise:

import numpy as np

# Pick "subsample stride"
N = 5

# Create a vector with length divisible by N.
arr = np.arange(2 * N)
print(arr)

# Reshape arr into a 2D array with N elements per row and however many
# columns required. 
# Flatten it with "F" ordering for "Fortran style" (column-major).
output = arr.reshape(-1, N).flatten("F")
print(output)

outputs

[0 1 2 3 4 5 6 7 8 9]
[0 5 1 6 2 7 3 8 4 9]

Performance comparison

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: def sol0(arr):
   ...:     """OP's original solution."""
   ...:     res = np.empty(shape=0)
   ...:     for i in range(5):
   ...:         res = np.concatenate([res, arr[i::5]])
   ...:     return res
   ...: 

In [3]: def sol1(arr):  
   ...:     """This answer's solution."""
   ...:     return arr.reshape(-1, 5).flatten("F")
   ...: 

In [4]: def sol2(arr):
   ...:     """@seralouk's solution, with shape error patch"""
   ...:     res = np.empty((5, arr.size//5), order='F')
   ...:     for i in range(5):
   ...:         res[i::5] = arr[i::5]
   ...:     return res.reshape(-1)

In [5]: arr = np.arange(10_000)

In [6]: %timeit sol0(arr)
26.6 µs ± 724 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit sol1(arr)
7.81 µs ± 34 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: %timeit sol2(arr)
36.3 µs ± 841 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Sign up to request clarification or add additional context in comments.

2 Comments

this is the slowest possible solution. see my answer above
@seralouk I've added some timing measurements to my answer. Are you able to reproduce these results?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.