
I would like to select certain columns and rows from a big 2D array. For example, I want to select the first N = 64 columns out of every D = 128 columns; if my big array had shape (384, 384), this would result in a smaller (256, 256) matrix. Essentially, I want to remove redundant data from the big matrix.

My code looks like the one below. The problem is that I don't know how to avoid the explicit indexing (here 4 times in each direction; in general it would be a loop of arbitrary size) in a nice way, without using loops if possible. Also, in this example I start the selection from column 0; in general it can start from an arbitrary column.

import numpy as np

N = 64
D = 128
rows, cols = 384, 384

row_mask = np.zeros(rows, dtype=bool)
col_mask = np.zeros(cols, dtype=bool)

# explicit selection of rows and columns
row_mask[0:N] = True
row_mask[D:D + N] = True
row_mask[D * 2:D * 2 + N] = True
row_mask[-N:] = True
col_mask[0:N] = True
col_mask[D:D + N] = True
col_mask[D * 2:D * 2 + N] = True
col_mask[-N:] = True

# Image has shape (384, 384); image has shape (256, 256)
image = Image[np.ix_(row_mask, col_mask)]

5 Answers


Actually, for this example with relatively large tiles it is way more efficient to use slicing in a for loop than to avoid the for loop by means of the much more expensive fancy indexing:

import numpy as np
from scipy.misc import face
from timeit import timeit

img = face()

def fancy():
    D, N = 128, 64
    r_mask = np.arange(img.shape[0]) % D < N
    c_mask = np.arange(img.shape[1]) % D < N
    return img[r_mask[:, None] & c_mask].reshape(
        np.count_nonzero(r_mask), np.count_nonzero(c_mask), 3)

def loopy():
    di, dj = 64, 64
    DI, DJ = 128, 128
    return np.block([[[img[i:i + di, j:j + dj]] for j in range(0, img.shape[1], DJ)]
                     for i in range(0, img.shape[0], DI)])

(fancy() == loopy()).all()
# True
timeit(loopy, number=100) * 10
# 0.763049490051344
timeit(fancy, number=100) * 10
# 5.845791429746896

3 Comments

I did not expect that, but it makes perfect sense in hindsight. Is there a crossover point? I would imagine for sufficiently many blocks?
Thanks, though this answer has the same problem as the other answer posted by @MadPhysicist: it basically misses the last N elements in each direction.
@MadPhysicist I would imagine there is, yes. Didn't look for it, though.
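One way to probe that crossover empirically, as a sketch rather than a definitive benchmark: shrink the tiles so there are many more blocks (toy sizes N = 2, D = 4 here, and a small synthetic array standing in for face()):

```python
import numpy as np
from timeit import timeit

# Synthetic stand-in for face(); tiny tiles mean many more blocks.
img = np.zeros((64, 64, 3), dtype=np.uint8)
N, D = 2, 4

def fancy():
    r = np.arange(img.shape[0]) % D < N
    c = np.arange(img.shape[1]) % D < N
    return img[r[:, None] & c].reshape(np.count_nonzero(r), np.count_nonzero(c), 3)

def loopy():
    return np.block([[[img[i:i + N, j:j + N]] for j in range(0, img.shape[1], D)]
                     for i in range(0, img.shape[0], D)])

print((fancy() == loopy()).all())          # the two still agree
print("loopy:", timeit(loopy, number=10))  # np.block pays per-block overhead
print("fancy:", timeit(fancy, number=10))
```

With 256 tiny blocks instead of a handful of large ones, the per-block overhead of np.block grows, so this is the regime where one would expect the ranking to flip.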

You can construct a totally general solution with fancy indexing using broadcasted addition and ravelling.

Let's take the one-dimensional case:

import numpy as np

arr = np.random.randint(10, size=973)

S = arr.shape[0]
N = 64
D = 128

# how many D-sized chunks?
nd = int(np.ceil(S / D))
# how many indices to chop from the end, i.e., which part of the last
# chunk doesn't fit in S? Clamped to 0 when the last chunk fits entirely.
nn = max(N - S + (nd - 1) * D, 0)

index = (np.arange(N) + D * np.arange(nd)[:, None]).ravel()[:nd * N - nn]
result = arr[index]
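As a sanity check, the same construction on a toy example (hypothetical sizes S = 10, N = 2, D = 4, small enough to read the indices off directly):

```python
import numpy as np

# Toy sizes so the selected indices are easy to read: S=10, N=2, D=4.
arr = np.arange(10)
S, N, D = arr.shape[0], 2, 4

nd = int(np.ceil(S / D))            # 3 chunks
nn = max(N - S + (nd - 1) * D, 0)   # 0 here: the last chunk fits entirely
index = (np.arange(N) + D * np.arange(nd)[:, None]).ravel()[:nd * N - nn]

print(index)       # [0 1 4 5 8 9]
print(arr[index])  # [0 1 4 5 8 9]
```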

In 2D, this would look like:

arr = np.random.randint(10, size=(1024, 768))

S = np.array(arr.shape)
N = 64
D = 128

nd = np.ceil(S / D).astype(int)
nn = np.maximum(N - S + (nd - 1) * D, 0)

r_index = (np.arange(N) + D * np.arange(nd[0])[:, None]).ravel()[:nd[0] * N - nn[0]]
c_index = (np.arange(N) + D * np.arange(nd[1])[:, None]).ravel()[:nd[1] * N - nn[1]]
result = arr[np.ix_(r_index, c_index)]

You can extend this to N dimensions with just a little bit of broadcasting trickery and a small list comprehension:

arr = np.random.randint(10, size=(128, 200, 64))

S = np.array(arr.shape)
N = 64   # could be an array with a different value for each dimension
D = 128  # same with this

nd = np.ceil(S / D).astype(int)
nn = np.maximum(N - S + (nd - 1) * D, 0)

You will likely end up with a ragged array of indices for the whole thing, so it would be wise to switch to a list:

index = [(np.arange(N) + D * np.arange(ndx)[:, None]).ravel()[:ndx * N - nnx]
         for ndx, nnx in zip(nd, nn)]
result = arr[np.ix_(*index)]

3 Comments

Thanks, actually I would appreciate it if you could provide the generalization to the case where it doesn't start from 0.
@lorniper. You would add the offset to the index (trivial), and compute nd and nn to account for the difference (easy). It would be much simpler to use my other answer though.
Thanks, I'm trying to do this based on your other answers right now.
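A minimal sketch of the offset generalization asked about in the comments, assuming the same chunk pattern but with the first block starting at an arbitrary offset (this uses an in-bounds filter on the index rather than the nn arithmetic, which is one simple way to handle the boundary and not necessarily what the answerer had in mind):

```python
import numpy as np

# Hypothetical toy sizes: keep N=2 of every D=4 elements, starting at offset=1.
arr = np.arange(20)
N, D, offset = 2, 4, 1
S = arr.shape[0]

# Block starts shifted by `offset`, then broadcasted addition and ravelling
# as before; finally drop anything that falls past the end of the array.
starts = np.arange(offset, S, D)
index = (np.arange(N) + starts[:, None]).ravel()
index = index[index < S]

print(arr[index])  # [ 1  2  5  6  9 10 13 14 17 18]
```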

You can add np.arange(N) to each value of [0, D, ...] and then take the union with the [-N:] part.

import numpy as np

N = 64
D = 128
shape = (384, 384)

def picked(size):
    # first N indices of every D-sized chunk, united with the last N indices
    return np.union1d(
        np.arange(size - N, size),
        np.add.outer(np.arange(0, size, D), np.arange(N)).ravel(),
    )

rows = picked(shape[0])
cols = picked(shape[1])
image = Image[np.ix_(rows, cols)]
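A quick standalone check with the numbers from the question shows this does produce the expected 256 indices per axis (np.union1d also deduplicates, so any overlap between the tail and the last chunk is harmless):

```python
import numpy as np

N, D, size = 64, 128, 384
rows = np.union1d(
    np.arange(size - N, size),                                  # the [-N:] tail
    np.add.outer(np.arange(0, size, D), np.arange(N)).ravel(),  # first N of each chunk
)
print(rows.size)  # 256
print(rows[:5])   # [0 1 2 3 4]
```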



Probably the simplest method to avoid loops would be to use the modulo operator:

img = ...
r_mask = np.arange(img.shape[0]) % D < N
c_mask = np.arange(img.shape[1]) % D < N
result = img[r_mask[:, None] & c_mask].reshape(
    np.count_nonzero(r_mask), np.count_nonzero(c_mask))

Or in your original notation:

result = img[np.ix_(r_mask, c_mask)]

Each half of the mask is an array matched with the appropriate dimension of img that sets the first N elements of each D-sized chunk to True and the rest to False. Broadcasting ensures that the two halves are combined into a mask with the same dimensions as img.
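The mask pattern is easiest to see on a toy example (hypothetical sizes N = 2, D = 4 over 10 elements):

```python
import numpy as np

# Keep the first N=2 of every D=4 elements: True, True, False, False, ...
N, D = 2, 4
mask = np.arange(10) % D < N
print(mask.astype(int))  # [1 1 0 0 1 1 0 0 1 1]
```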

This method generalizes fairly well across arbitrary dimensions, although you will have to run a loop in that case:

mask = np.ones(arr.shape, dtype=bool)
dims = np.empty(arr.ndim, dtype=int)
for i, k in enumerate(mask.shape[::-1]):
    m = np.arange(k) % D < N
    # add i trailing singleton axes so m broadcasts against the right dimension
    mask &= np.expand_dims(m, tuple(range(1, i + 1)))
    dims[i] = np.count_nonzero(m)
result = arr[mask].reshape(dims[::-1])

1 Comment

@lorniper. If you want to add the tail, just set r_mask[-N:] = True

Assuming that each row in your table has 384 columns, you can use a for loop:

result = []
for row in table:
    result.append(row[:64] + row[128:192] + row[256:320] + row[320:])

1 Comment

The title literally says "without loop"
