I want to create a subclass of the numpy array, and then make an array of objects of that type. When I index into that array, I want the item to still be an instance of the subclass. Instead, it is a plain numpy ndarray.

Here is a test that fails:

import numpy as np


class ImageWrapper(np.ndarray):

    def __new__(cls, image_data):
        assert image_data.ndim in (2, 3)
        return image_data.view(cls)

    @property
    def n_colours(self):
        return 1 if self.ndim == 2 else self.shape[2]


n_frames = 10
frames = [ImageWrapper(np.random.randint(255, size=(20, 15, 3)).astype('uint8')) for _ in range(n_frames)]
video = np.array(frames)
assert video[0].n_colours == 3

This gives me: AttributeError: 'numpy.ndarray' object has no attribute 'n_colours'

How can I make this work?

Things tried already:

  • Setting subok=True when constructing the video: this only works when constructing an array from a single instance of the subclass, not from a list of them.
  • Setting dtype=object or dtype=ImageWrapper: neither works.
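
To make the subok observation concrete, here is a minimal sketch. It repeats a trimmed copy of ImageWrapper so it runs on its own; the zero-filled frame and the names single and stacked are placeholders for illustration, not part of the original test.

```python
import numpy as np


class ImageWrapper(np.ndarray):
    """Trimmed copy of the subclass from the question."""

    def __new__(cls, image_data):
        assert image_data.ndim in (2, 3)
        return image_data.view(cls)


# Placeholder frame; the question uses random data instead.
frame = ImageWrapper(np.zeros((20, 15, 3), dtype='uint8'))

# A single subclass instance is passed through when subok=True.
single = np.array(frame, subok=True)
print(type(single).__name__)

# A list is not an ndarray subclass, so its elements are stacked
# into a new base-class array regardless of subok.
stacked = np.array([frame, frame], subok=True)
print(type(stacked).__name__, stacked.shape)
```

The second call is the situation in the failing test: the input is a list, so there is no subclass instance for subok to pass through.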

I recognize that I could just make video a list, but it would be preferable to keep it as a numpy array for other reasons.

  • The problem is that when you call array on a list of 3D arrays, you get a 4D array, not a 1D array full of 3D arrays. Obviously that 4D array can't be an ImageWrapper, so it's an ndarray, so any slice of it is an ndarray as well, no matter where the data originally came from. The question is, why do you want it to be an array? A 1D array of object doesn't lose all of the benefits of numpy over native lists, but it loses a lot of them, and if you can tell us the "other reasons" in your last sentence, it might help. Commented Jul 22, 2014 at 23:21
  • Also, is there a reason your design has to mandate that a 4D array can't be an ImageWrapper? Your n_colours would have to return an array of N-3 dimensions instead of a scalar if N>3 (or just raise an exception), but otherwise, what would be the problem? Because that would make things a lot simpler… Commented Jul 22, 2014 at 23:23
  • "Other reasons" is not a great reason; it's just that this is part of an interface where arrays are the expected data type. In this case, an array of objects, as suggested by Jaime, will do. Commented Jul 23, 2014 at 18:09
  • But would a 4D array subclass not also work? It seems to me that if you have code that wants an array, it's going to want to broadcast n_colours and crop and downsample and so on over that array. A 4D array gives you all of that for free (at least on the caller side; you have to be careful on the implementation side, as Bi Rico shows); with an array of objects that are arrays, you have to manually wrap the unbound method in a ufunc to do anything. (Unless you're planning to just iterate the objects, in which case, why use an array?) Commented Jul 23, 2014 at 18:41
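
The two points raised in these comments can be sketched as follows, under some assumptions: the frames are zero-filled placeholders, and get_n_colours is a hypothetical helper name. The first part shows that stacking yields one 4D base-class array; the second shows what "manually wrap the method in a ufunc" might look like for a 1D object array, using np.frompyfunc.

```python
import numpy as np


class ImageWrapper(np.ndarray):
    """Same subclass as in the question."""

    def __new__(cls, image_data):
        assert image_data.ndim in (2, 3)
        return image_data.view(cls)

    @property
    def n_colours(self):
        return 1 if self.ndim == 2 else self.shape[2]


frames = [ImageWrapper(np.zeros((20, 15, 3), dtype='uint8')) for _ in range(4)]

# Stacking a list of 3D arrays gives a single 4D plain ndarray.
stacked = np.array(frames)
print(type(stacked).__name__, stacked.shape)

# A 1D object array keeps each frame as its own ImageWrapper.
video = np.empty(len(frames), dtype=object)
for i, frame in enumerate(frames):
    video[i] = frame

# Hypothetical helper: apply the property to every element.
# frompyfunc returns an object-dtype array of the results.
get_n_colours = np.frompyfunc(lambda frame: frame.n_colours, 1, 1)
colours = get_n_colours(video)
print(colours)
```

This is still a Python-level loop over the elements, so it buys the array interface rather than vectorised speed.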

2 Answers

3

Whatever you are trying to achieve, there is probably a better way of doing it than subclassing ndarray. That said, you can make your array of dtype object, although you have to be careful when creating it. This works:

>>> video = np.empty((len(frames),), dtype=object)
>>> video[:] = frames
>>> video[0].n_colours
3

But this doesn't:

>>> video = np.array(frames, dtype=object)
>>> video[0].n_colours
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'n_colours'
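
A possible reason for the difference, sketched below under assumptions: np.array with dtype=object still descends into the same-shaped frames and builds a 4D object array, so indexing it returns a plain ndarray slice. Filling a preallocated 1D object array stores each frame as a single element. The helper name as_object_array and the zero-filled frames are made up for illustration.

```python
import numpy as np


class ImageWrapper(np.ndarray):
    """Same subclass as in the question."""

    def __new__(cls, image_data):
        assert image_data.ndim in (2, 3)
        return image_data.view(cls)

    @property
    def n_colours(self):
        return 1 if self.ndim == 2 else self.shape[2]


def as_object_array(items):
    """Hypothetical helper: put each item into one slot of a 1D object array."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item
    return out


frames = [ImageWrapper(np.zeros((20, 15, 3), dtype='uint8')) for _ in range(3)]

# dtype=object does not stop np.array from unpacking the frames.
print(np.array(frames, dtype=object).shape)

video = as_object_array(frames)
print(video.shape, type(video[0]).__name__, video[0].n_colours)
```

The explicit loop does the same job as the slice assignment above, one element at a time.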

2 Comments

That works for me, thanks. The real reason for this is we have a bizarre image format (YUV), and we want to have methods bound to it to do things like cropping and downsampling. The reason for subclassing rather than wrapping is more questionable - mainly it's because we're working in python but want to pretend we're in Java, and our interface demands that the data passed between modules is of array type.
@Peter: YUV isn't all that bizarre or uncommon. Also, it seems like it would be better for you if ImageWrapper could hold an array of >3D, in which case calling methods on it would effectively broadcast the call over each 3D subarray, rather than making you iterate over each 3D subarray manually. (That's what I was suggesting in my comment, and I'm pretty sure what Bi Rico was thinking in his answer.) Conversely, if you really want to force yourself to iterate the arrays, you're actually better off with a list of them than an array.
2

numpy.array just isn't sophisticated enough to handle this case. subok=True tells the function to pass through subclasses, but you're not passing it a subclass of ndarray, you're passing it a list (which happens to be populated with instances of an ndarray subclass). You can get something like what you expect by doing this:

import numpy as np


class ImageWrapper(np.ndarray):

    def __new__(cls, image_data):
        assert 2 <= image_data.ndim <= 4
        return image_data.view(cls)

    @property
    def n_colours(self):
        return 1 if self.ndim == 2 else self.shape[-1]


n_frames = 10
frame_shape = (20, 15, 3)
video = ImageWrapper(np.empty((n_frames,) + frame_shape, dtype='uint8'))
for i in range(n_frames):
    video[i] = np.random.randint(255, size=frame_shape)
assert video[0].n_colours == 3

Notice I had to update ImageWrapper to allow 4d arrays as input.
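
One caveat with this approach, shown as a rough sketch with zero-filled placeholder data: n_colours only looks at ndim and the last axis, so any slice that drops an axis can be misread. The names frame, row and grey are illustrative.

```python
import numpy as np


class ImageWrapper(np.ndarray):
    """The 4D-tolerant version from this answer."""

    def __new__(cls, image_data):
        assert 2 <= image_data.ndim <= 4
        return image_data.view(cls)

    @property
    def n_colours(self):
        return 1 if self.ndim == 2 else self.shape[-1]


video = ImageWrapper(np.zeros((10, 20, 15, 3), dtype='uint8'))

# Indexing a subclass instance returns a view of the same subclass.
frame = video[0]
print(type(frame).__name__, frame.n_colours)

# A single row of RGB pixels has shape (15, 3). It is 2D, so it is
# treated as a greyscale image and n_colours reports 1.
row = video[0, 0]
print(row.shape, row.n_colours)

# A stack of greyscale frames is 3D, so its width is read as the
# colour count.
grey = ImageWrapper(np.zeros((10, 20, 15), dtype='uint8'))
print(grey.n_colours)
```

If those cases matter, the subclass would need to track which axes are frames, rows, columns and channels rather than inferring them from ndim.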
