Is there a NumPy way to sample an array several elements at a time?

Question

I have an array of data that I want to sample in order to do first-order analysis more quickly. If I want to examine one point at a time, I can do it with slicing:

import numpy as np
array = np.arange(0,20000)
samplespace = 5000
sampledarray = array[::samplespace]

However, I need to analyze regions several elements wide. The only way I can think of doing this is with a for-loop:

import numpy as np
array = np.arange(0,20000)
samplespace = 5000
n = 3
sampledarray = array[0::samplespace]  # first element of every region
for i in range(1,n):
    arraysample_i = array[i::samplespace]  # element at offset i of every region
    # insertion points: just after the i elements already collected for each region
    indices = np.linspace(i,len(sampledarray),len(arraysample_i)).astype(int)
    sampledarray = np.insert(sampledarray,indices,arraysample_i)
print(sampledarray)
# [    0     1     2  5000  5001  5002 10000 10001 10002 15000 15001 15002]

I'm worried about the loop taking a lot of time to run if I do it with a large array and am sampling in multiple dimensions. Is there a simpler, faster way to do this?
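A loop-free sketch of the same selection (not part of the original question; `starts` and `idx` are illustrative names): build every region's start index with `np.arange`, add the offsets `0..n-1` by broadcasting, and index the data once. Each `np.insert` call in the loop returns a new copy of the growing array, whereas here only a small index array is built.

```python
import numpy as np

array = np.arange(0, 20000)
samplespace = 5000
n = 3  # width of each sampled region

# Start index of every region: 0, 5000, 10000, 15000
starts = np.arange(0, len(array), samplespace)
# A column of starts plus a row of offsets broadcasts to a (num_regions, n) grid
idx = starts[:, np.newaxis] + np.arange(n)
# Drop indices past the end of the array; boolean masking also flattens the grid row by row
idx = idx[idx < len(array)]
sampledarray = array[idx]
print(sampledarray)
# [    0     1     2  5000  5001  5002 10000 10001 10002 15000 15001 15002]
```

The same idea extends to more dimensions by building one such index list per axis.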

aminrd · Accepted Answer · 2021-06-09 19:11:24Z

1

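import numpy as np
import itertools

array = np.arange(0,20000)
samplespace = 5000
n = 3  # number of consecutive elements per sample (was undefined in the original snippet)

# For each offset i in 0..n-1, generate i, i+samplespace, i+2*samplespace, ...
# and chain all of those index sequences together
indices = itertools.chain.from_iterable(range(i, len(array), samplespace) for i in range(n))

output = array[list(indices)]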

output:

array([    0,  5000, 10000, 15000,     1,  5001, 10001, 15001,     2,
        5002, 10002, 15002])
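This answer returns the elements grouped by offset (every `i = 0` sample first, then every `i = 1` sample, and so on) rather than block by block as in the question. A minimal sketch, assuming the block-by-block order is what's wanted: collect the chained indices into an integer array and sort them before indexing.

```python
import itertools

import numpy as np

array = np.arange(0, 20000)
samplespace = 5000
n = 3

indices = itertools.chain.from_iterable(
    range(i, len(array), samplespace) for i in range(n)
)
# np.fromiter consumes the iterator into an integer array; sorting keeps each block together
idx = np.sort(np.fromiter(indices, dtype=np.intp))
output = array[idx]
print(output)
```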


Brian · Accepted Answer · 2021-06-11 15:45:46Z

Thanks to aminrd for suggesting itertools; it's a module I hadn't used before, but it ended up giving me what I needed.

Here's what I ended up doing:

import numpy as np
import itertools

array = np.arange(0,20000)
samplespace = 5000
n = 3

# Start index of every sample: 0, samplespace, 2*samplespace, ...
iterate = itertools.count(start=0,step=samplespace)
num = len(array) // samplespace
idx = np.array([next(iterate) for _ in range(num)])

# Add each offset 0..n-1 to every start index, then sort so each block stays together
idxlist = np.zeros(0)
for i in range(n):
    idxlist = np.append(idxlist, idx + i)
idxlist = np.sort(idxlist).astype(int)

sampledarray = array[idxlist]
print(sampledarray)
# [    0     1     2  5000  5001  5002 10000 10001 10002 15000 15001 15002]
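For comparison, a sketch that avoids building index arrays at all, assuming NumPy 1.20 or newer (the version that added `numpy.lib.stride_tricks.sliding_window_view`): take every length-`n` window, keep every `samplespace`-th one, and flatten. The windows are a read-only view of `array`; only the final `ravel()` copies the selected elements.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

array = np.arange(0, 20000)
samplespace = 5000
n = 3

# windows[k] is array[k:k + n]; this is a view, so no data is copied yet
windows = sliding_window_view(array, n)
# Keep only the windows starting at 0, samplespace, 2*samplespace, ...
sampled = windows[::samplespace]
print(sampled.shape)  # (4, 3)
# Copy the selected elements into a 1-D array, block by block
sampledarray = sampled.ravel()
```

Windows that would run past the end of the array are never produced, so a final partial region is silently dropped rather than causing an index error.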

This allowed me to extend it to more dimensions fairly easily, and instead of manipulating my large dataset I only had to manipulate the arrays of indices:

import numpy as np
import itertools
array = np.empty((200,200),dtype=object)
# I know this is a lousy way to define the array, but it works well for illustrative purposes
for i in range(200):
    for j in range(200):
        array[i,j] = (i,j)
samplespace = 50
n = 3

iteratex = itertools.count(start=0,step=samplespace)
iteratey = itertools.count(start=0,step=samplespace)
# Count the samples along each axis separately so non-square arrays also work
numx = array.shape[0] // samplespace
numy = array.shape[1] // samplespace
idxx = np.array([next(iteratex) for _ in range(numx)])
idxy = np.array([next(iteratey) for _ in range(numy)])
idxlistx = np.zeros(0)
idxlisty = np.zeros(0)
for i in range(n):
    idxlistx = np.append(idxlistx, idxx + i)
    idxlisty = np.append(idxlisty, idxy + i)
idxlistx = np.sort(idxlistx).astype(int)
idxlisty = np.sort(idxlisty).astype(int)

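# Indexing twice seems awkward. A single array[idxlistx, idxlisty] would broadcast the two
# index arrays against each other and pick element-wise (row, column) pairs (or fail if their
# lengths differ) instead of the full rows-by-columns grid, so rows and columns are selected separately
sampledarray = array[idxlistx,:]
sampledarray = sampledarray[:,idxlisty]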
print(sampledarray)

Output:

[[(0, 0) (0, 1) (0, 2) (0, 50) (0, 51) (0, 52) (0, 100) (0, 101) (0, 102)
  (0, 150) (0, 151) (0, 152)]
 [(1, 0) (1, 1) (1, 2) (1, 50) (1, 51) (1, 52) (1, 100) (1, 101) (1, 102)
  (1, 150) (1, 151) (1, 152)]
 [(2, 0) (2, 1) (2, 2) (2, 50) (2, 51) (2, 52) (2, 100) (2, 101) (2, 102)
  (2, 150) (2, 151) (2, 152)]
 [(50, 0) (50, 1) (50, 2) (50, 50) (50, 51) (50, 52) (50, 100) (50, 101)
  (50, 102) (50, 150) (50, 151) (50, 152)]
 [(51, 0) (51, 1) (51, 2) (51, 50) (51, 51) (51, 52) (51, 100) (51, 101)
  (51, 102) (51, 150) (51, 151) (51, 152)]
 [(52, 0) (52, 1) (52, 2) (52, 50) (52, 51) (52, 52) (52, 100) (52, 101)
  (52, 102) (52, 150) (52, 151) (52, 152)]
 [(100, 0) (100, 1) (100, 2) (100, 50) (100, 51) (100, 52) (100, 100)
  (100, 101) (100, 102) (100, 150) (100, 151) (100, 152)]
 [(101, 0) (101, 1) (101, 2) (101, 50) (101, 51) (101, 52) (101, 100)
  (101, 101) (101, 102) (101, 150) (101, 151) (101, 152)]
 [(102, 0) (102, 1) (102, 2) (102, 50) (102, 51) (102, 52) (102, 100)
  (102, 101) (102, 102) (102, 150) (102, 151) (102, 152)]
 [(150, 0) (150, 1) (150, 2) (150, 50) (150, 51) (150, 52) (150, 100)
  (150, 101) (150, 102) (150, 150) (150, 151) (150, 152)]
 [(151, 0) (151, 1) (151, 2) (151, 50) (151, 51) (151, 52) (151, 100)
  (151, 101) (151, 102) (151, 150) (151, 151) (151, 152)]
 [(152, 0) (152, 1) (152, 2) (152, 50) (152, 51) (152, 52) (152, 100)
  (152, 101) (152, 102) (152, 150) (152, 151) (152, 152)]]

itamar kanter · Accepted Answer · 2021-06-10 06:29:22Z

0

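import numpy as np
array = np.arange(0,20000)
samplespace = 5000
n = 3
# Row i of the stacked array holds array[i::samplespace]. np.stack needs every slice to have
# the same length, which holds here because len(array) is a multiple of samplespace.
# Flattening in Fortran (column-major) order reads down the columns, giving the samples block by block.
sampledarray = np.stack([array[i::samplespace] for i in range(n)]).flatten(order='F')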

output:

array([    0,     1,     2,  5000,  5001,  5002, 10000, 10001, 10002,
       15000, 15001, 15002])
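When `len(array)` is an exact multiple of `samplespace` (true here, since 20000 = 4 * 5000), a reshape gives the same result without building any index arrays or stacking slices. A sketch under that assumption: each row of the reshaped view is one block of `samplespace` elements, and its first `n` columns are the samples.

```python
import numpy as np

array = np.arange(0, 20000)
samplespace = 5000
n = 3

# Assumes len(array) is divisible by samplespace; reshape raises ValueError otherwise
blocks = array.reshape(-1, samplespace)  # shape (4, 5000), a view of array
sampledarray = blocks[:, :n].ravel()  # first n elements of every block, block by block
print(sampledarray)
```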
