
I have an np.ndarray that looks like this:

print(x)
[[1 3 None None None None]
 [0 2 3 4 None None]
 [1 5 4 None None None]
 [1 6 0 4 None None]
 [7 6 5 1 3 2]
 [4 7 2 8 None None]
 [7 4 3 None None None]
 [4 6 8 5 None None]
 [7 5 None None None None]]

I am supplying it to a cython function defined as follows:

cpdef y(int[:,::1] x):
...

This raises the error: ValueError: Buffer dtype mismatch, expected 'int' but got Python object

This is probably happening because of the presence of Nones in the array, since modifying them to 0s removes the error. But the presence of None should not be posing a problem, as written here: Cython Extension Types

So, what is going on? Is there a quick solution to this?

  • For a NumPy array such as a = np.array([1, None]), a.dtype is object. That's the problem: int[:,::1] expects an int buffer but got an object buffer. Commented Apr 21, 2019 at 1:43
  • That documentation reads as though the entire variable can be None, whereas you have an array made up of a mix of int and None values, which is not valid. Commented Apr 21, 2019 at 1:44
  • So, is there an easy correction for this? Can I typecast the Nones to be integers? Or do I just have to convert the Nones to some int value? Commented Apr 21, 2019 at 15:52

2 Answers

The dtype of a NumPy array such as np.array([1, None]) is object. int[:,::1] expects a buffer of int but gets a buffer of object, which is what the error says.
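
A quick way to confirm this, as a minimal sketch (the array here is a shortened stand-in for the one in the question, not the original data):

```python
import numpy as np

# Shortened stand-in for the array in the question.
x = np.array([[1, 3, None, None],
              [0, 2, 3, 4]])

# Mixing ints with None makes NumPy fall back to generic Python objects.
print(x.dtype)

# An all-integer array gets a native integer dtype instead.
ints = np.array([[1, 3, 0, 0],
                 [0, 2, 3, 4]], dtype=np.intc)
print(ints.dtype.kind)
```

Checking x.dtype before calling the Cython function is usually the fastest way to spot this kind of mismatch.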

How to correct this depends on the context. Specifically, what does None mean in your data?

  1. You can set the Nones to 0, then convert the array to an int array:
a = np.array([[1, None]])
a[a == None] = 0         # elementwise comparison on the object array
a = a.astype(np.intc)    # np.intc matches C int; the np.int alias was removed in NumPy 1.24
f(a) # then deal with 0
  2. Or you can change the Cython function signature to f(double[:, ::1]):
a = np.array([[1, None]])
a = a.astype(np.float64)
# a will be np.array([[1.0, nan]]),
# then deal with nan...
f(a)
  3. Or you can change the Cython function signature to f(object[:, ::1]) (this may not be what you intend).

So, it depends on the context.
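
As a rough sketch of the first option: the array in the question already contains real 0 values, so 0 may be ambiguous as a replacement. The helper name to_c_int and the -1 sentinel below are assumptions for illustration only; use whatever value can never occur in your data.

```python
import numpy as np

SENTINEL = -1  # hypothetical marker; pick any value that never occurs in real data

def to_c_int(obj_array, fill=SENTINEL):
    """Replace None with `fill` and return a C-contiguous array of C ints."""
    out = obj_array.copy()
    out[out == None] = fill  # elementwise comparison on an object array
    return np.ascontiguousarray(out.astype(np.intc))

# Shortened stand-in for the array in the question.
x = np.array([[1, 3, None, None],
              [0, 2, 3, 4]])
x_int = to_c_int(x)
print(x_int)
```

The result has the dtype and memory layout that an int[:, ::1] memoryview expects, and the Cython code can then skip entries equal to the sentinel.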


It's possible that NumPy's ma module (for masked arrays) does what you want:

x = np.ma.array([[1, 3, 0, 0, 0, 0],
                 [0, 2, 3, 4, 0, 0]],
                dtype=np.intc,  # matches C int; the np.int alias was removed in NumPy 1.24
                mask=[[0, 0, 1, 1, 1, 1],
                      [0, 0, 0, 0, 1, 1]]) # True is "masked out"

In Cython you'd split it into the data and the mask:

import numpy as np
from libc.stdint cimport int8_t

def y(x):
    cdef int[:,::1] x_data = x.data
    cdef int8_t[:,::1] x_mask = x.mask.view(dtype=np.int8)

I've viewed it as an int8 since Cython doesn't deal well with dtype=np.bool.
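
To see what the two pieces look like before they reach Cython, here is a pure-Python sketch of the same split. The smaller array and the row_sums computation are only illustrative stand-ins for whatever the typed loop would do.

```python
import numpy as np

x = np.ma.array([[1, 3, 0, 0],
                 [0, 2, 3, 4]],
                dtype=np.intc,
                mask=[[0, 0, 1, 1],
                      [0, 0, 0, 0]])  # True is "masked out"

x_data = x.data                      # plain ndarray of C ints
x_mask = x.mask.view(dtype=np.int8)  # same memory as the bool mask, seen as int8

# Sum only the unmasked entries of each row, as a typed loop might.
row_sums = [int(x_data[i][x_mask[i] == 0].sum()) for i in range(x_data.shape[0])]
print(row_sums)
```

The int8 view is not a copy, so it costs nothing beyond creating a new array header.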


You could also think about creating your own data structure. For example, it looks like it's always the end of the row that is None, so you could create a 2D array of ints and a 1D array of row lengths (also ints). You'd then ignore anything beyond the row length.
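
A minimal sketch of that layout, assuming the rows start out as ragged Python lists. The helper name pack_rows and the pad value are made up for illustration.

```python
import numpy as np

def pack_rows(rows, pad=0):
    """Pack ragged rows into a padded 2D int array plus a 1D array of row lengths."""
    lengths = np.array([len(r) for r in rows], dtype=np.intc)
    data = np.full((len(rows), int(lengths.max())), pad, dtype=np.intc)
    for i, r in enumerate(rows):
        data[i, :len(r)] = r
    return data, lengths

rows = [[1, 3], [0, 2, 3, 4], [1, 5, 4]]
data, lengths = pack_rows(rows)

# Only the first lengths[i] entries of row i are meaningful.
for i in range(data.shape[0]):
    print(data[i, :lengths[i]])
```

Both arrays can then be passed to Cython as int[:, ::1] and int[::1], and the pad value never needs to be inspected.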


It's probably worth emphasising why you can't store None in an int array. To get the speed and space efficiency of an int array, NumPy allocates only the space needed to store the numbers themselves. Storing None would mean allocating a little extra space for every number to say "actually this one is a different type", and every operation would need a check before it for "is this number actually a number?". As you can imagine, that rapidly becomes inefficient.
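
A small sketch of the difference, using throwaway arrays (the names packed and boxed are just for illustration):

```python
import numpy as np

packed = np.array([1, 2, 3], dtype=np.intc)  # raw C ints, stored back to back
boxed = np.array([1, None, 3])               # dtype=object: references to Python objects

print(packed.dtype, packed.itemsize)  # bytes per raw integer
print(boxed.dtype, boxed.itemsize)    # bytes per object reference, not per value

# An int slot has no spare bit pattern that could mean "missing".
try:
    packed[1] = None
    stored = True
except (TypeError, ValueError):
    stored = False
print(stored)
```

The object array accepts None only because each slot holds a reference to an arbitrary Python object, which is exactly the layout a typed int memoryview cannot read.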
