Is there a way to check if NumPy arrays share the same data?

Question

My impression is that in NumPy, two arrays can share the same memory. Take the following example:

import numpy as np
a = np.arange(27)
b = a.reshape((3, 3, 3))
a[0] = 5000
print(b[0, 0, 0])  # 5000

# Some tests:
a.data is b.data  # False
a.data == b.data  # True

c = np.arange(27)
c[0] = 5000
a.data == c.data  # True (same data, but not the same memory storage): a false positive

So clearly b didn't make a copy of a; it just created some new metadata and attached it to the same memory buffer that a is using. Is there a way to check if two arrays reference the same memory buffer?

My first impression was to use a.data is b.data, but that returns False. I can do a.data == b.data, which returns True, but I don't think that checks whether a and b share the same memory buffer; it only checks that the block of memory referenced by a and the one referenced by b contain the same bytes.

Here is the most relevant previously asked question: stackoverflow.com/questions/10747748/…
– Robert Kern, Commented Jul 2, 2012 at 9:27
@RobertKern -- Thanks. I had actually seen that post, but since I couldn't find documentation for numpy.may_share_memory (other than the built-in help), I thought there might be something else -- e.g. numpy.uses_same_memory_exactly. (My use case is slightly less general than the other one, so I thought there might be a more definitive answer.) Anyway, having seen your name on a few numpy mailing lists, I'm guessing that the answer is "there is no such function".
– mgilson, Commented Jul 2, 2012 at 13:29
numpy.may_share_memory() is missing from the reference manual only because of an accident of the manual's organization. It's the right thing to use. Unfortunately, there is no uses_same_memory_exactly() function at the moment. Implementing such a function requires solving a bounded linear Diophantine equation, an NP-hard problem. The problem size is usually not too large, but just writing down the algorithm is annoying, so it hasn't been done yet. If we do, it will be incorporated into numpy.may_share_memory(), so that's what I recommend using.
– Robert Kern, Commented Jul 2, 2012 at 14:30
@RobertKern -- Thanks for the input. I'll be sure to use np.may_share_memory(). I use this mostly for debugging/optimization to make sure that I don't gratuitously allocate arrays by accident. Thanks again.
– mgilson, Commented Jul 2, 2012 at 14:35

Brionius · Accepted Answer · 2022-02-06 20:00:54Z

40

You can use the base attribute to check if an array shares the memory with another array:

>>> import numpy as np
>>> a = np.arange(27)
>>> b = a.reshape((3,3,3))
>>> b.base is a
True
>>> a.base is b
False

Not sure if that solves your problem. The base attribute will be None if the array owns its own memory. Note that an array's base will be another array, even if it is a subset:

>>> c = a[2:]
>>> c.base is a
True
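
Because a view's base can itself be a view, one way to make the base test symmetric is to follow the .base chain to the array that owns the memory and compare owners. The following is only a sketch: root_base is a hypothetical helper, not a NumPy function, and it cannot detect arrays that were created independently over the same underlying buffer.

```python
import numpy as np

def root_base(arr):
    """Follow the .base chain until reaching an array whose base is not an ndarray.

    Hypothetical helper for illustration; not part of NumPy.
    """
    while isinstance(arr.base, np.ndarray):
        arr = arr.base
    return arr

a = np.arange(27)
b = a.reshape((3, 3, 3))
c = b[1:, :, 0]          # a view of a view

# Both views resolve to the same owning array, whichever order they are compared in.
print(root_base(b) is root_base(c))
print(root_base(a) is a)
```

Two arrays with the same root owner may still cover disjoint parts of it, so this only answers "same buffer", not "overlapping elements".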

edited Feb 6, 2022 at 20:00

Brionius

answered Jul 2, 2012 at 1:30

jterrace

6 Comments

mgilson Over a year ago

This is probably good enough for my purposes. It's unfortunate that it isn't a 2-way street though. I'll wait and see if anything else better pops up. In the meantime, thanks. (+1)

user545424 Over a year ago

You could do a.base is b or b.base is a.

Robert Kern Over a year ago

This is unreliable. Each array may have chains of .base attributes, e.g. a.base.base is b may be true. Arrays can also be constructed to point at the same memory without sharing the same .base objects.
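
A minimal sketch of the second point, assuming a plain bytearray as the shared buffer (the variable names are illustrative): two arrays created separately with np.frombuffer point at the same memory, yet neither is the other's base.

```python
import numpy as np

buf = bytearray(16)                      # one underlying block of memory
x = np.frombuffer(buf, dtype=np.uint8)   # first array over the buffer
y = np.frombuffer(buf, dtype=np.uint8)   # second, independently created array

x[0] = 7                                 # a write through x is visible through y

print(x.base is y, y.base is x)          # neither array is the other's base
print(int(y[0]))                         # the value written through x
print(np.shares_memory(x, y))            # the memory check still detects the overlap
```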

mgilson Over a year ago

@user545424 -- The best I could do with this is a.base is b or b.base is a or a.base is b.base, but that seems clunky at best.

Wang Over a year ago

@jterrace Never trust the base test. Try this: m=matrix(b); then m.base is a will return False, whereas m.base.base is a will return True. So one should always rely on may_share_memory.

b-butler · Answer · 2020-05-14 14:36:15Z

13

To solve the problem exactly, you can use np.shares_memory:

import numpy as np

a=np.arange(27)
b=a.reshape((3,3,3))

# Checks exactly by default
np.shares_memory(a, b)

# Checks bounds only
np.may_share_memory(a, b)

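Both np.may_share_memory and np.shares_memory take an optional max_work argument that controls how much effort is spent ruling out false positives. By default, np.may_share_memory only compares memory bounds, so it can return True for arrays that do not actually overlap, while np.shares_memory solves the problem exactly. The exact problem is NP-complete, so always finding the correct answer can be computationally expensive.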
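
To illustrate the difference between the two checks, here is a small sketch using two interleaved views of the same array (the names evens and odds are illustrative). Their address ranges overlap, but they have no element in common.

```python
import numpy as np

a = np.arange(10)
evens = a[::2]    # elements 0, 2, 4, 6, 8
odds = a[1::2]    # elements 1, 3, 5, 7, 9

# The address ranges of the two views interleave, so the bounds check
# cannot rule out an overlap.
bounds_check = np.may_share_memory(evens, odds)

# The exact check establishes that no element is actually shared.
exact_check = np.shares_memory(evens, odds)

print(bounds_check)  # True
print(exact_check)   # False
```

For debugging accidental copies, the cheap bounds check is usually enough; the exact check matters when interleaved or strided views must be told apart.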

answered May 14, 2020 at 14:36

b-butler

user545424 · Answer · 2012-07-04 00:30:55Z

10

I think jterrace's answer is probably the best way to go, but here is another possibility.

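import numpy as np

try:
    # NumPy >= 2.0 moved byte_bounds out of the top-level namespace.
    from numpy.lib.array_utils import byte_bounds
except ImportError:
    byte_bounds = np.byte_bounds

def byte_offset(a):
    """Returns a 1-d array of the byte offset of every element in `a`.
    Note that these will not in general be in order."""
    stride_offset = np.ix_(*map(range, a.shape))
    element_offset = sum(i*s for i, s in zip(stride_offset, a.strides))
    element_offset = np.asarray(element_offset).ravel()
    # Expand each element offset into the offsets of its individual bytes.
    return np.concatenate([element_offset + x for x in range(a.itemsize)])

def share_memory(a, b):
    """Returns the number of shared bytes between arrays `a` and `b`."""
    a_low, a_high = byte_bounds(a)
    b_low, b_high = byte_bounds(b)

    beg, end = max(a_low, b_low), min(a_high, b_high)

    if end - beg > 0:
        # Memory bounds overlap, so compare the addresses of individual bytes.
        # Assumes non-negative strides, so that the first element sits at the low bound.
        amem = a_low + byte_offset(a)
        bmem = b_low + byte_offset(b)

        return np.intersect1d(amem, bmem).size
    else:
        return 0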

Example:

>>> a = np.arange(10)
>>> b = a.reshape((5,2))
>>> c = a[::2]
>>> d = a[1::2]
>>> e = a[0:1]
>>> f = a[0:1]
>>> f = f.reshape(())
>>> share_memory(a,b)
80
>>> share_memory(a,c)
40
>>> share_memory(a,d)
40
>>> share_memory(c,d)
0
>>> share_memory(a,e)
8
>>> share_memory(a,f)
8

Here is a plot showing the time for each share_memory(a,a[::2]) call as a function of the number of elements in a on my computer.

[Figure: share_memory function]

edited Jul 4, 2012 at 0:30

answered Jul 2, 2012 at 3:16

user545424

3 Comments

Robert Kern Over a year ago

One can have views that share memory even with different itemsizes. For example, I may get an array as a float32 with interleaved real and imaginary components and view it as a complex64 array. A more reliable implementation is in numpy.may_share_memory().
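
A short sketch of the case described in this comment, with illustrative variable names: a float32 array of interleaved real and imaginary parts is reinterpreted as complex64, so the two arrays have different item sizes and shapes but use the same memory.

```python
import numpy as np

# Interleaved (real, imag) pairs stored as float32: 4 bytes per item.
interleaved = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)

# Reinterpret the same bytes as complex64: 8 bytes per item, half as many items.
as_complex = interleaved.view(np.complex64)

print(interleaved.itemsize, as_complex.itemsize)   # 4 8
print(as_complex.shape)                            # (2,)
print(np.shares_memory(interleaved, as_complex))   # True

# A write through one array is visible through the other.
interleaved[0] = 100.0
print(as_complex[0].real)                          # 100.0
```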

user545424 Over a year ago

@RobertKern: Good point. I've updated my answer. Do you see any potential issues with this solution?

user545424 Over a year ago

I think I've finally got it right. share_memory() requires memory on the order of sum of the sizes of each array, but it's pretty quick.

Nir Friedman · Answer · 2015-03-10 14:58:39Z

7

Just do:

a = np.arange(27)
a.__array_interface__['data']

The second line returns a tuple whose first entry is the memory address of the array's first element and whose second entry indicates whether the array is read-only. Combined with the shape, strides, and data type, this lets you work out the exact span of memory addresses that the array covers, and therefore whether one array is a subset of another.
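
As a rough sketch of that idea, the helper below computes the address span of an array. address_span is a hypothetical name, and the sketch assumes C-contiguous arrays, where the span is simply the start address plus nbytes; strided arrays would need the strides taken into account.

```python
import numpy as np

def address_span(arr):
    """Return (start, end) byte addresses of a C-contiguous array.

    Hypothetical helper for illustration; not part of NumPy.
    """
    if not arr.flags['C_CONTIGUOUS']:
        raise ValueError("this sketch only handles C-contiguous arrays")
    start = arr.__array_interface__['data'][0]
    return start, start + arr.nbytes

a = np.arange(27)
tail = a[2:]                      # a contiguous slice of a

a_start, a_end = address_span(a)
t_start, t_end = address_span(tail)

# The slice starts two items into a's buffer...
print(t_start - a_start == 2 * a.itemsize)        # True
# ...and lies entirely within a's address span.
print(a_start <= t_start and t_end <= a_end)      # True
```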

answered Mar 10, 2015 at 14:58

Nir Friedman
