I have an array of data that I want to sample in order to do first-order analysis more quickly. If I want to examine one point at a time, I can do it with slicing:

import numpy as np
array = np.arange(0,20000)
samplespace = 5000
sampledarray = array[::samplespace]

However, I need to analyze regions several elements wide. The only way I can think of doing this is with a for-loop:

import numpy as np
array = np.arange(0,20000)
samplespace = 5000
n = 3
sampledarray = array[0::samplespace]
for i in range(1,n):
    arraysample_i = array[i::samplespace]
    indices = np.linspace(i,len(sampledarray),len(arraysample_i)).astype(int)
    sampledarray = np.insert(sampledarray,indices,arraysample_i)
print(sampledarray)
>>> [ 0 1 2 5000 5001 5002 10000 10001 10002 15000 15001 15002 ]

I'm worried that the loop will be slow when the array is large and I am sampling in multiple dimensions. Is there a simpler, faster way to do this?

3 Answers

1
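import numpy as np
import itertools

array = np.arange(0,20000)
samplespace = 5000
n = 3  # width of each sampled region

# Chain the index ranges i, i+samplespace, i+2*samplespace, ... for each offset i in 0..n-1
indices = itertools.chain.from_iterable(range(i, len(array), samplespace) for i in range(n))

# The indices are grouped by offset (all i=0 first, then i=1, ...), not by region
output = array[list(indices)]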

output:

array([    0,  5000, 10000, 15000,     1,  5001, 10001, 15001,     2,
        5002, 10002, 15002])
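
The output above is grouped by offset rather than by region. If each region should stay contiguous, as in the question's expected output, one option is to swap the loop order so that every start position is followed directly by its neighbours. This is a minimal sketch, assuming n = 3 as in the question; `window_indices` is a hypothetical helper name, not part of NumPy or itertools.

```python
import itertools
import numpy as np

def window_indices(length, samplespace, n):
    """Return an iterator over start, start+1, ..., start+n-1 for each start (hypothetical helper)."""
    starts = range(0, length, samplespace)
    # min(...) clips a window that would run past the end of the array
    return itertools.chain.from_iterable(
        range(s, min(s + n, length)) for s in starts
    )

array = np.arange(0, 20000)
indices = list(window_indices(len(array), samplespace=5000, n=3))
output = array[indices]
print(output)
```

The index list is still built in pure Python, so this mainly helps when the number of sampled points is small compared with the array itself.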

1

Thanks to aminrd for suggesting itertools; I hadn't used that module before, but it ended up giving me what I needed.

Here's what I ended up doing:

import numpy as np
import itertools
array = np.arange(0,20000)
samplespace = 5000
n = 3

iterate = itertools.count(start=0,step=samplespace)
num = int(len(array)/samplespace)
idx = np.array([next(iterate) for _ in range(num)])
idxlist = np.zeros(0)
for i in range(n):
    idxi = np.copy(idx)+i
    idxlist = np.append(idxlist,idxi)
idxlist = np.sort(idxlist).astype(int)

sampledarray = array[idxlist]
print(sampledarray)
>>> [ 0 1 2 5000 5001 5002 10000 10001 10002 15000 15001 15002 ]
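
The same index array can probably be built without the Python loop, the repeated `np.append`, or the final sort by broadcasting the start positions against the offsets. This is a minimal sketch under the same assumptions as above (`samplespace = 5000`, `n = 3`); the names `starts` and `offsets` are introduced here for illustration.

```python
import numpy as np

array = np.arange(0, 20000)
samplespace = 5000
n = 3

starts = np.arange(0, len(array), samplespace)  # first index of each region
offsets = np.arange(n)                          # 0, 1, ..., n-1 within a region

# (num_regions, 1) + (n,) broadcasts to (num_regions, n); ravel flattens row by row,
# so the indices are already in ascending order
idxlist = (starts[:, None] + offsets).ravel()
idxlist = idxlist[idxlist < len(array)]         # drop indices past the end, if any

sampledarray = array[idxlist]
print(sampledarray)
```

Because `starts` and `offsets` are integer arrays, no `astype(int)` conversion is needed afterwards.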

This allowed me to extend it into more dimensions fairly easily, and rather than working with my large dataset I only had to manipulate the array of indices:

import numpy as np
import itertools
array = np.empty((200,200),dtype=object)
# I know this is a lousy way to define the array but it works well for illustrative purposes
for i in range(200):
    for j in range(200):
        array[i,j] = (i,j)
samplespace = 50
n = 3

iteratex = itertools.count(start=0,step=samplespace)
iteratey = itertools.count(start=0,step=samplespace)
num = int(len(array)/samplespace)
idxx = np.array([next(iteratex) for _ in range(num)])
idxy = np.array([next(iteratey) for _ in range(num)])
idxlistx = np.zeros(0)
idxlisty = np.zeros(0)
for i in range(n):
    idxxi = np.copy(idxx)+i
    idxyi = np.copy(idxy)+i
    idxlistx = np.append(idxlistx,idxxi)
    idxlisty = np.append(idxlisty,idxyi)
idxlistx = np.sort(idxlistx).astype(int)
idxlisty = np.sort(idxlisty).astype(int)

# Indexing in two steps selects every combination of the chosen rows and columns;
# array[idxlistx, idxlisty] would instead pair the two index arrays element by element
sampledarray = array[idxlistx,:]
sampledarray = sampledarray[:,idxlisty]
print(sampledarray)

>>>[[(0, 0) (0, 1) (0, 2) (0, 50) (0, 51) (0, 52) (0, 100) (0, 101) (0, 102)
  (0, 150) (0, 151) (0, 152)]
 [(1, 0) (1, 1) (1, 2) (1, 50) (1, 51) (1, 52) (1, 100) (1, 101) (1, 102)
  (1, 150) (1, 151) (1, 152)]
 [(2, 0) (2, 1) (2, 2) (2, 50) (2, 51) (2, 52) (2, 100) (2, 101) (2, 102)
  (2, 150) (2, 151) (2, 152)]
 [(50, 0) (50, 1) (50, 2) (50, 50) (50, 51) (50, 52) (50, 100) (50, 101)
  (50, 102) (50, 150) (50, 151) (50, 152)]
 [(51, 0) (51, 1) (51, 2) (51, 50) (51, 51) (51, 52) (51, 100) (51, 101)
  (51, 102) (51, 150) (51, 151) (51, 152)]
 [(52, 0) (52, 1) (52, 2) (52, 50) (52, 51) (52, 52) (52, 100) (52, 101)
  (52, 102) (52, 150) (52, 151) (52, 152)]
 [(100, 0) (100, 1) (100, 2) (100, 50) (100, 51) (100, 52) (100, 100)
  (100, 101) (100, 102) (100, 150) (100, 151) (100, 152)]
 [(101, 0) (101, 1) (101, 2) (101, 50) (101, 51) (101, 52) (101, 100)
  (101, 101) (101, 102) (101, 150) (101, 151) (101, 152)]
 [(102, 0) (102, 1) (102, 2) (102, 50) (102, 51) (102, 52) (102, 100)
  (102, 101) (102, 102) (102, 150) (102, 151) (102, 152)]
 [(150, 0) (150, 1) (150, 2) (150, 50) (150, 51) (150, 52) (150, 100)
  (150, 101) (150, 102) (150, 150) (150, 151) (150, 152)]
 [(151, 0) (151, 1) (151, 2) (151, 50) (151, 51) (151, 52) (151, 100)
  (151, 101) (151, 102) (151, 150) (151, 151) (151, 152)]
 [(152, 0) (152, 1) (152, 2) (152, 50) (152, 51) (152, 52) (152, 100)
  (152, 101) (152, 102) (152, 150) (152, 151) (152, 152)]]
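
The two-step indexing can be collapsed into one step with `np.ix_`, which builds an open mesh from the row and column index arrays. The sketch below assumes a numeric stand-in array instead of the tuple-filled object array, and `window_index` is a hypothetical helper introduced only for this example.

```python
import numpy as np

# Stand-in numeric data (an assumption; the original used an object array of tuples)
array = np.arange(200 * 200).reshape(200, 200)
samplespace = 50
n = 3

def window_index(length, samplespace, n):
    """Indices start, start+1, ..., start+n-1 for each start (hypothetical helper)."""
    idx = (np.arange(0, length, samplespace)[:, None] + np.arange(n)).ravel()
    return idx[idx < length]

idxlistx = window_index(array.shape[0], samplespace, n)
idxlisty = window_index(array.shape[1], samplespace, n)

# np.ix_ reshapes the index arrays to (12, 1) and (1, 12) so they broadcast to a full grid
sampledarray = array[np.ix_(idxlistx, idxlisty)]
print(sampledarray.shape)  # (12, 12)
```

The row and column index arrays may have different lengths here, since `np.ix_` takes every combination of them rather than pairing them up.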


0
import numpy as np
array = np.arange(0,20000)
samplespace = 5000
n = 3
sampledarray = np.stack([array[i::samplespace] for i in range(n)]).flatten(order='F')

output:

array([    0,     1,     2,  5000,  5001,  5002, 10000, 10001, 10002,
       15000, 15001, 15002])
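
Note that `np.stack` needs every slice `array[i::samplespace]` to have the same length. When `len(array)` is an exact multiple of `samplespace`, a reshape appears to give the same result without building the intermediate slices. This is a minimal sketch under that divisibility assumption; the name `blocks` is introduced here for illustration.

```python
import numpy as np

array = np.arange(0, 20000)
samplespace = 5000
n = 3

# This approach only applies when the array splits evenly into blocks
assert len(array) % samplespace == 0

blocks = array.reshape(-1, samplespace)  # one row per block of samplespace elements
sampledarray = blocks[:, :n].ravel()     # first n elements of each block, row by row
print(sampledarray)
```

The reshape itself is a view, so the only copy made is of the sampled elements in the final `ravel()`.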
