5

For a numpy array X, the location of its element X[k[0], ..., k[d-1]] is offset from the location of X[0,..., 0] by k[0]*s[0] + ... + k[d-1]*s[d-1], where (s[0],...,s[d-1]) is the tuple representing X.strides.

As far as I understand nothing in numpy array specs requires that distinct indexes of array X correspond to distinct addresses in memory, the simplest instance of this being a zero value of the stride, e.g. see advanced NumPy section of scipy lectures.

Does the numpy have a built-in predicate to test if the strides and the shape are such that distinct indexes map to distinct memory addresses?

If not, how does one write one, preferably so as to avoid sorting of the strides?

1
  • That advanced numpy is an interesting link. I haven't seen it before, though I seem to have stumbled across most of those ideas in one way or other. I just used as_strided to fetch a diagonal in other SO question. Commented Sep 16, 2016 at 22:39

2 Answers 2

5

edit: It took me a bit to figure what you are asking about. With striding tricks it's possible to index the same element in a databuffer in different ways, and broadcasting actually does this under the covers. Normally we don't worry about it because it is either hidden or intentional.

Recreating in the strided mapping and looking for duplicates may be the only way to test this. I'm not aware of any existing function that checks it.

==================

I'm not quite sure what you concerned with. But let me illustrate how shape and strides work

Define a 3x4 array:

In [453]: X=np.arange(12).reshape(3,4)
In [454]: X.shape
Out[454]: (3, 4)
In [455]: X.strides
Out[455]: (16, 4)

Index an item

In [456]: X[1,2]
Out[456]: 6

I can get it's index in a flattened version of the array (e.g. the original arange) with ravel_multi_index:

In [457]: np.ravel_multi_index((1,2),X.shape)
Out[457]: 6

I can also get this location using strides - keeping mind that strides are in bytes (here 4 bytes per item)

In [458]: 1*16+2*4
Out[458]: 24
In [459]: (1*16+2*4)/4
Out[459]: 6.0

All these numbers are relative to the start of the data buffer. We can get the data buffer address from X.data or X.__array_interface__['data'], but usually don't need to.

So this strides tells us that to go from entry to the next, step 4 bytes, and to go from one row to the next step 16. 6 is located at one row down, 2 over, or 24 bytes into the buffer.

In the as_strided example of your link, strides=(1*2, 0) produces repeated indexing of specific values.

With my X:

In [460]: y=np.lib.stride_tricks.as_strided(X,strides=(16,0), shape=(3,4))
In [461]: y
Out[461]: 
array([[0, 0, 0, 0],
       [4, 4, 4, 4],
       [8, 8, 8, 8]])

y is a 3x4 that repeatedly indexes the 1st column of X.

Changing one item in y ends up changing one value in X but a whole row in y:

In [462]: y[1,2]=10
In [463]: y
Out[463]: 
array([[ 0,  0,  0,  0],
       [10, 10, 10, 10],
       [ 8,  8,  8,  8]])
In [464]: X
Out[464]: 
array([[ 0,  1,  2,  3],
       [10,  5,  6,  7],
       [ 8,  9, 10, 11]])

as_strided can produce some weird effects if you aren't careful.

OK, maybe I've figured out what's bothering you - can I identify a situation like this where two different indexing tuples end up pointing to the same location in the data buffer? Not that I'm aware of. That y strides contains a 0 is a pretty good indicator.

as_stridedis often used to create overlapping windows:

In [465]: y=np.lib.stride_tricks.as_strided(X,strides=(8,4), shape=(3,4))
In [466]: y
Out[466]: 
array([[ 0,  1,  2,  3],
       [ 2,  3, 10,  5],
       [10,  5,  6,  7]])
In [467]: y[1,2]=20
In [469]: y
Out[469]: 
array([[ 0,  1,  2,  3],
       [ 2,  3, 20,  5],
       [20,  5,  6,  7]])

Again changing 1 item in y ends up changing 2 values in y, but only 1 in X.

Ordinary array creation and indexing does not have this duplicate indexing issue. Broadcasting may do something like, under the cover, where a (4,) array is changed to (1,4) and then to (3,4), effectively replicating rows. I think there's another stride_tricks function that does this explicitly.

In [475]: x,y=np.lib.stride_tricks.broadcast_arrays(X,np.array([.1,.2,.3,.4]))
In [476]: x
Out[476]: 
array([[ 0,  1,  2,  3],
       [20,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [477]: y
Out[477]: 
array([[ 0.1,  0.2,  0.3,  0.4],
       [ 0.1,  0.2,  0.3,  0.4],
       [ 0.1,  0.2,  0.3,  0.4]])
In [478]: y.strides
Out[478]: (0, 8)

In any case, in normal array use we don't have to worry about this ambiguity. We get it only with intentional actions, not accidental ones.

==============

How about this for a test:

def dupstrides(x):
    uniq={sum(s*j for s,j in zip(x.strides,i)) for i in np.ndindex(x.shape)}
    print(uniq)
    print(len(uniq))
    print(x.size)
    return len(uniq)<x.size

In [508]: dupstrides(X)
{0, 32, 4, 36, 8, 40, 12, 44, 16, 20, 24, 28}
12
12
Out[508]: False
In [509]: dupstrides(y)
{0, 4, 8, 12, 16, 20, 24, 28}
8
12
Out[509]: True
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you for the explanation, and the answer. The motivation for asking this question comes from trying to figure out when it is safe to perform an operation on an array in place.
4

It turns out this test is already implemented in numpy, see mem_overlap.c:842.

The test is exposed as numpy.core.multiarray_tests.internal_overlap(x).

Example:

>>> import numpy as np
>>> from numpy.core.multiarray_tests import internal_overlap
>>> from numpy.lib.stride_tricks import as_strided

Now, create a contiguous array, and use as_strided to create an array with internal overlapping, and confirm this with the testing:

>>> x = np.arange(3*4, dtype=np.float64).reshape((3,4))
>>> y = as_strided(x, shape=(5,4), strides=(16, 8))
>>> y
array([[  0.,   1.,   2.,   3.],
       [  2.,   3.,   4.,   5.],
       [  4.,   5.,   6.,   7.],
       [  6.,   7.,   8.,   9.],
       [  8.,   9.,  10.,  11.]])
>>> internal_overlap(x)
False
>>> internal_overlap(y)
True

The function is optimized to quickly returns False for Fortran- or C- contiguous arrays.

2 Comments

What happened to this function in later versions of numpy?
The function still exists internally. See github.com/numpy/numpy/blob/…. Not sure whether it is still accessible via the public API in any way.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.