
I would like to select certain columns and rows from a big 2D array. For example, I want to select the first N = 64 columns out of every D = 128 columns; if my big array had shape (384, 384), this would result in a smaller (256, 256) matrix. Essentially, I want to remove redundant data from the big matrix.

My code looks like the one below. The problem is that I don't know how to avoid the explicit indexing (here 4 times in each direction; in general it would be a loop of arbitrary size) in a nice way, without using loops if possible. Also, in this example I start the selection from column 0; in general it can start from an arbitrary column.

import numpy as np

N = 64
D = 128
rows, cols = 384, 384

row_mask = np.zeros(rows, dtype=bool)
col_mask = np.zeros(cols, dtype=bool)

# explicit selection of rows and columns
row_mask[0:N] = True
row_mask[D:D + N] = True
row_mask[D * 2:D * 2 + N] = True
row_mask[-N:] = True
col_mask[0:N] = True
col_mask[D:D + N] = True
col_mask[D * 2:D * 2 + N] = True
col_mask[-N:] = True

# Image has shape (384, 384); image has shape (256, 256)
image = Image[np.ix_(row_mask, col_mask)]

5 Answers


Actually, for this example with relatively large tiles it is way more efficient to use slicing in a for loop than to avoid the for loop by means of the much more expensive fancy indexing:

import numpy as np
from scipy.misc import face
from timeit import timeit

img = face()

def fancy():
    D, N = 128, 64
    r_mask = np.arange(img.shape[0]) % D < N
    c_mask = np.arange(img.shape[1]) % D < N
    return img[r_mask[:, None] & c_mask].reshape(
        np.count_nonzero(r_mask), np.count_nonzero(c_mask), 3)

def loopy():
    di, dj = 64, 64
    DI, DJ = 128, 128
    return np.block([[[img[i:i + di, j:j + dj]] for j in range(0, img.shape[1], DJ)]
                     for i in range(0, img.shape[0], DI)])

(fancy() == loopy()).all()
# True
timeit(loopy, number=100) * 10
# 0.763049490051344
timeit(fancy, number=100) * 10
# 5.845791429746896

3 Comments

I did not expect that, but it makes perfect sense in hindsight. Is there a crossover point? I would imagine for sufficiently many blocks?
Thanks, though this answer has the same problem as the other answer posted by @MadPhysicist: it basically misses the last N elements in each direction.
@MadPhysicist I would imagine there is, yes. Didn't look for it, though.
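One way to probe that crossover empirically, as a sketch rather than a definitive benchmark: shrink the tiles so there are many more blocks (toy sizes N = 2, D = 4 here, and a small synthetic array standing in for face()):

```python
import numpy as np
from timeit import timeit

# Synthetic stand-in for face(); tiny tiles mean many more blocks.
img = np.zeros((64, 64, 3), dtype=np.uint8)
N, D = 2, 4

def fancy():
    r = np.arange(img.shape[0]) % D < N
    c = np.arange(img.shape[1]) % D < N
    return img[r[:, None] & c].reshape(np.count_nonzero(r), np.count_nonzero(c), 3)

def loopy():
    return np.block([[[img[i:i + N, j:j + N]] for j in range(0, img.shape[1], D)]
                     for i in range(0, img.shape[0], D)])

print((fancy() == loopy()).all())          # the two still agree
print("loopy:", timeit(loopy, number=10))  # np.block pays per-block overhead
print("fancy:", timeit(fancy, number=10))
```

With 256 tiny blocks instead of a handful of large ones, the per-block overhead of np.block grows, so this is the regime where one would expect the ranking to flip.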

You can construct a totally general solution with fancy indexing using broadcasted addition and ravelling.

Let's take the one-dimensional case:

import numpy as np

arr = np.random.randint(10, size=973)

S = arr.shape[0]
N = 64
D = 128

# how many D-sized chunks?
nd = int(np.ceil(S / D))
# how many indices to chop from the end, i.e., which part of the last
# chunk doesn't fit in S? Clamped to 0 when the last chunk fits entirely.
nn = max(N - S + (nd - 1) * D, 0)

index = (np.arange(N) + D * np.arange(nd)[:, None]).ravel()[:nd * N - nn]
result = arr[index]
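As a sanity check, the same construction on a toy example (hypothetical sizes S = 10, N = 2, D = 4, small enough to read the indices off directly):

```python
import numpy as np

# Toy sizes so the selected indices are easy to read: S=10, N=2, D=4.
arr = np.arange(10)
S, N, D = arr.shape[0], 2, 4

nd = int(np.ceil(S / D))            # 3 chunks
nn = max(N - S + (nd - 1) * D, 0)   # 0 here: the last chunk fits entirely
index = (np.arange(N) + D * np.arange(nd)[:, None]).ravel()[:nd * N - nn]

print(index)       # [0 1 4 5 8 9]
print(arr[index])  # [0 1 4 5 8 9]
```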

In 2D, this would look like:

arr = np.random.randint(10, size=(1024, 768))

S = np.array(arr.shape)
N = 64
D = 128

nd = np.ceil(S / D).astype(int)
nn = np.maximum(N - S + (nd - 1) * D, 0)

r_index = (np.arange(N) + D * np.arange(nd[0])[:, None]).ravel()[:nd[0] * N - nn[0]]
c_index = (np.arange(N) + D * np.arange(nd[1])[:, None]).ravel()[:nd[1] * N - nn[1]]
result = arr[np.ix_(r_index, c_index)]

You can extend this to N dimensions with just a little bit of broadcasting trickery and a small list comprehension:

arr = np.random.randint(10, size=(128, 200, 64))

S = np.array(arr.shape)
N = 64   # could be an array with a different value for each dimension
D = 128  # same with this

nd = np.ceil(S / D).astype(int)
nn = np.maximum(N - S + (nd - 1) * D, 0)

You will likely end up with a ragged array of indices for the whole thing, so it would be wise to switch to a list:

index = [(np.arange(N) + D * np.arange(ndx)[:, None]).ravel()[:ndx * N - nnx]
         for ndx, nnx in zip(nd, nn)]
result = arr[np.ix_(*index)]

3 Comments

Thanks, actually I would appreciate it if you could provide the generalization to the case where it doesn't start from 0.
@lorniper. You would add the offset to the index (trivial), and compute nd and nn to account for the difference (easy). It would be much simpler to use my other answer though.
Thanks, I'm trying to do this based on your other answers right now.
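A minimal sketch of the offset generalization asked about in the comments, assuming the same chunk pattern but with the first block starting at an arbitrary offset (this uses an in-bounds filter on the index rather than the nn arithmetic, which is one simple way to handle the boundary and not necessarily what the answerer had in mind):

```python
import numpy as np

# Hypothetical toy sizes: keep N=2 of every D=4 elements, starting at offset=1.
arr = np.arange(20)
N, D, offset = 2, 4, 1
S = arr.shape[0]

# Block starts shifted by `offset`, then broadcasted addition and ravelling
# as before; finally drop anything that falls past the end of the array.
starts = np.arange(offset, S, D)
index = (np.arange(N) + starts[:, None]).ravel()
index = index[index < S]

print(arr[index])  # [ 1  2  5  6  9 10 13 14 17 18]
```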

You can add np.arange(N) to each value of [0, D, ...] and then take the union with the [-N:] part.

import numpy as np

N = 64
D = 128
shape = (384, 384)

def picked(size):
    # first N indices of every D-sized chunk, united with the last N indices
    return np.union1d(
        np.arange(size - N, size),
        np.add.outer(np.arange(0, size, D), np.arange(N)).ravel(),
    )

rows = picked(shape[0])
cols = picked(shape[1])
image = Image[np.ix_(rows, cols)]
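A quick standalone check with the numbers from the question shows this does produce the expected 256 indices per axis (np.union1d also deduplicates, so any overlap between the tail and the last chunk is harmless):

```python
import numpy as np

N, D, size = 64, 128, 384
rows = np.union1d(
    np.arange(size - N, size),                                  # the [-N:] tail
    np.add.outer(np.arange(0, size, D), np.arange(N)).ravel(),  # first N of each chunk
)
print(rows.size)  # 256
print(rows[:5])   # [0 1 2 3 4]
```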



Probably the simplest method to avoid loops would be to use the modulo operator:

img = ...
r_mask = np.arange(img.shape[0]) % D < N
c_mask = np.arange(img.shape[1]) % D < N
result = img[r_mask[:, None] & c_mask].reshape(
    np.count_nonzero(r_mask), np.count_nonzero(c_mask))

Or in your original notation:

result = img[np.ix_(r_mask, c_mask)]

Each half of the mask is an array matched with the appropriate dimension of img that sets the first N elements of each D-sized chunk to True and the rest to False. Broadcasting ensures that the two halves are combined into a mask with the same dimensions as img.
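The mask pattern is easiest to see on a toy example (hypothetical sizes N = 2, D = 4 over 10 elements):

```python
import numpy as np

# Keep the first N=2 of every D=4 elements: True, True, False, False, ...
N, D = 2, 4
mask = np.arange(10) % D < N
print(mask.astype(int))  # [1 1 0 0 1 1 0 0 1 1]
```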

This method generalizes fairly well across arbitrary dimensions, although you will have to run a loop in that case:

mask = np.ones(arr.shape, dtype=bool)
dims = np.empty(arr.ndim, dtype=int)
for i, k in enumerate(mask.shape[::-1]):
    m = np.arange(k) % D < N
    # add i trailing singleton axes so m broadcasts against the right dimension
    mask &= np.expand_dims(m, tuple(range(1, i + 1)))
    dims[i] = np.count_nonzero(m)
result = arr[mask].reshape(dims[::-1])

1 Comment

@lorniper. If you want to add the tail, just set r_mask[-N:] = True

Assuming that each row in your table has 384 columns, you can use a for loop:

result = []
for row in table:
    result.append(row[:64] + row[128:192] + row[256:320] + row[320:])

1 Comment

The title literally says "without loop"
