If I know the shape of a numpy array like (1000, 50), and I have an arbitrary selection expressed as an IndexExpression, let's say np.s_[:200, :], how can I evaluate the shape of the sliced array (in this example (200, 50)), without actually constructing the array, applying the slice and checking the resulting shape?

Worth noting that IndexExpressions are often slices, but not necessarily, for example:

>>> np.s_[None, 1:2:3, ..., 0]
(None, slice(1, 2, 3), Ellipsis, 0)

Finally, a bit of context for this: I am actually dealing with an h5py Dataset where I can query the shape before actually allocating an array or reading anything from file. I'm going to read a subset of the data (selected with an IndexExpression) and want to know in advance what the shape is going to be.

  • With a modest understanding of numpy indexing, you should be able to deduce what the shape will be. But I don't know of any numpy tool that will do it for you. To test your deductions you could construct a smaller 'dummy' array (e.g. (100,10)), index it, and extrapolate the result to the real target size. Commented Oct 9 at 0:38
  • Yes, I can do it in any given specific instance, but I'm looking for a general programmatic solution. It's definitely computable, and I'm a bit surprised that I'm coming up empty-handed looking for an implementation. Commented Oct 9 at 10:07
  • It looks like the answer is trying to parse the elements of the indexing tuple. The idea is valid, but I haven't studied it. numpy probably does much the same but in compiled code. But there's little need to return just the final shape. Commented Oct 9 at 20:49
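
The dummy-array idea from the first comment can be made independent of the target size, at least as a sketch. `np.broadcast_to` returns a zero-stride, read-only view of a single element, so a stand-in array with the full target shape costs only a few bytes. Basic indexing (slices, integers, `None`, `...`) on that view returns another view, so reading `.shape` allocates nothing proportional to the data. The helper name `shape_via_dummy` is made up for this example. Two caveats: array-valued (advanced) indices would still allocate a result of the output size, and h5py supports only a subset of NumPy's fancy indexing, so agreement with h5py for such indices is an assumption to verify.

```python
import numpy as np

def shape_via_dummy(shape, index):
    """Shape of arr[index] for any array with the given shape (hypothetical helper)."""
    # One stored element, broadcast with zero strides to the requested shape.
    dummy = np.broadcast_to(np.empty((), dtype=np.uint8), shape)
    # NumPy applies its own indexing rules; basic indexing returns a view.
    return dummy[index].shape

print(shape_via_dummy((1000, 50), np.s_[:200, :]))              # (200, 50)
print(shape_via_dummy((1000, 50), np.s_[None, 1:2:3, ..., 0]))  # (1, 1)
```

Because NumPy itself evaluates the index here, this can also serve as a reference when checking a pure-Python shape calculation.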

1 Answer

Maybe the following will be helpful:

import numpy as np
from numbers import Integral
from itertools import zip_longest

def _range_len(start, stop, step):
    #here it works like len(range(start, stop, step))
    if step == 0:
        raise ValueError("slice step cannot be zero")
    n = (stop - start)
    if (n > 0 and step > 0) or (n < 0 and step < 0):
        return 1 + (abs(n) - 1) // abs(step)
    return 0

def _broadcast_shape(shapes):
    if not shapes:
        return ()
    out = []
    rev = [s[::-1] for s in shapes]
    for dims in zip_longest(*rev, fillvalue=1):
        dim = 1
        for d in dims:
            if dim == 1:
                #a length-1 axis stretches to match the other one (including to length 0)
                dim = d
            elif d != 1 and d != dim:
                raise IndexError(f"index arrays could not be broadcast together: {shapes}")
        out.append(dim)
    return tuple(reversed(out))

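def _as_tuple_index(index):
    return index if isinstance(index, tuple) else (index,)

def _is_int(obj):
    #bool subclasses int, but NumPy treats True/False as 0-d boolean masks
    return isinstance(obj, (Integral, np.integer)) and not isinstance(obj, bool)

def _axes_consumed(obj):
    #number of array axes that one element of the index tuple consumes
    if obj is None or obj is Ellipsis:
        return 0
    if isinstance(obj, slice) or _is_int(obj):
        return 1
    arr = np.asarray(obj)
    #a boolean mask consumes one axis per mask dimension
    return arr.ndim if arr.dtype == np.bool_ else 1

def _expand_ellipsis(index, ndim):
    idx = _as_tuple_index(index)
    if sum(i is Ellipsis for i in idx) > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")
    consumes = sum(_axes_consumed(i) for i in idx)
    if consumes > ndim:
        raise IndexError(f"too many indices for array: array is {ndim}-dimensional, but {consumes} were indexed")
    fill = [slice(None)] * (ndim - consumes)
    if not any(i is Ellipsis for i in idx):
        #no ellipsis: pad with trailing full slices to cover remaining axes
        return idx + tuple(fill)
    out = []
    for i in idx:
        if i is Ellipsis:
            #an ellipsis that fills no axes is kept as a marker, since it
            #appears to still separate advanced indices in NumPy
            out.extend(fill or [Ellipsis])
        else:
            out.append(i)
    return tuple(out)

def shape_after_index(shape, index):
    """Return the shape of arr[index] for an array with arr.shape == shape.

    Values inside index arrays are not bounds-checked.
    """
    shape = tuple(shape)
    idx = _expand_ellipsis(index, len(shape))

    axis = 0
    basic_before = []   #slice/newaxis dims before the first advanced index
    basic_after = []    #slice/newaxis dims after the first advanced index
    advanced_shapes = []
    state = 0           #0: no advanced index yet, 1: inside an advanced run, 2: basic index after one
    separated = False   #True if a slice, newaxis or ellipsis splits the advanced indices

    for obj in idx:
        if obj is None or obj is Ellipsis or isinstance(obj, slice):
            dims = basic_after if state else basic_before
            if obj is None:  #newaxis
                dims.append(1)
            elif obj is not Ellipsis:
                start, stop, step = obj.indices(shape[axis])
                dims.append(_range_len(start, stop, step))
                axis += 1
            if state == 1:
                state = 2
            continue

        #integers and index arrays are broadcast together as one advanced block
        if state == 2:
            separated = True
        state = 1

        if _is_int(obj):  #integer index -> drop axis (acts as a 0-d advanced index)
            n = shape[axis]
            if not -n <= obj < n:
                raise IndexError(f"index {obj} is out of bounds for axis {axis} with size {n}")
            advanced_shapes.append(())
            axis += 1
            continue

        #advanced indexing
        arr = np.asarray(obj)
        if arr.dtype == np.bool_:
            #boolean mask: spans arr.ndim axes, contributes one axis of size == number of Trues
            if arr.shape != shape[axis:axis + arr.ndim]:
                raise IndexError(f"boolean index of shape {arr.shape} did not match indexed array at axis {axis}")
            advanced_shapes.append((int(arr.sum()),))
            axis += arr.ndim
        else:
            advanced_shapes.append(arr.shape)
            axis += 1

    bshape = _broadcast_shape(advanced_shapes)
    if separated:
        #advanced indices split by a basic index: the broadcast dims move to the front
        return bshape + tuple(basic_before) + tuple(basic_after)
    return tuple(basic_before) + bshape + tuple(basic_after)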

Some example usage:

shape_after_index((1000, 50), np.s_[:200, :])
# (200, 50)
shape_after_index((1000, 50), np.s_[None, 1:2:3, ..., 0])
# (1, 1)
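
One way to build confidence in a helper like this is to compare it with NumPy on a small array, where allocating is cheap. The following is a minimal sketch, not part of the original answer: `agrees_with_numpy` is a hypothetical name, `shape_fn` stands for any function with the `(shape, index)` signature used above, and `slices_only` is a deliberately limited stand-in so the snippet runs on its own.

```python
import numpy as np

def agrees_with_numpy(shape_fn, shape, index):
    """True if shape_fn(shape, index) matches what NumPy does for a real array."""
    try:
        expected = np.empty(shape)[index].shape
    except IndexError:
        # NumPy rejects the index, so shape_fn is expected to raise IndexError too.
        try:
            shape_fn(shape, index)
        except IndexError:
            return True
        return False
    return shape_fn(shape, index) == expected

# Stand-in for the function under test: handles a tuple of slices only.
def slices_only(shape, index):
    return tuple(len(range(*s.indices(n))) for s, n in zip(index, shape))

print(agrees_with_numpy(slices_only, (6, 4), np.s_[1:5:2, ::-1]))  # True
```

Running such a check over a handful of small shapes and index tuples, including array and mask indices, is a cheap way to find the cases where a hand-written rule set diverges from NumPy.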

2 Comments

Could you add some more details about how this works?
I suspect dtype boolean arrays with ndim > 1 aren't being handled correctly, I also suspect the exact rules around combined advanced+basic indexing may not be completely correct. In general, the explanation is all here: numpy.org/doc/stable/user/basics.indexing.html#
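
The two cases raised in this comment are easy to see on a small array. A short sketch using only NumPy; the array and the mask are made up for illustration:

```python
import numpy as np

a = np.zeros((3, 4, 5))

# A 2-D boolean mask consumes two axes and yields one axis of length mask.sum().
mask = np.zeros((3, 4), dtype=bool)
mask[0, 1] = mask[2, 3] = True
print(a[mask].shape)               # (2, 5)

# Adjacent advanced indices: the broadcast dims stay where the indices were.
print(a[:, [0, 1], [0, 1]].shape)  # (3, 2)

# Advanced indices separated by a slice: the broadcast dims move to the front.
print(a[[0, 1], :, [0, 1]].shape)  # (2, 4)

# An integer combined with an index array counts as an advanced index,
# so this case is "separated" as well.
print(a[0, :, [0, 1]].shape)       # (2, 4)
```

Any shape calculator that treats integers purely as "drop this axis" or counts a mask as a single axis will disagree with NumPy on these inputs.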
