Declaring a numpy array with cython strangely generates a lot of overhead

Question

I'm rewriting some of my Python code using Cython.

Following the suggestions in the documentation I started substituting my python arrays with the optimized cython definition.

In particular, the following is supposed to be the 'best' way of declaring a numpy array:

# cython: profile=True
# cython: boundscheck=False
# cython: wraparound=False

import numpy as np
cimport numpy as np

cpdef test():

    cdef np.ndarray[np.int_t, ndim=1] seeds_idx = np.empty(10, dtype=np.int)

    pass

However, the html file generated by profiling the code above via cython -a my_file.pyx shows the following:

+10:     cdef np.ndarray[np.int_t, ndim=1] seeds_idx = np.empty(10, dtype=np.int)
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 10, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_3 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_t_4 = __Pyx_PyObject_GetAttrStr(__pyx_t_3, __pyx_n_s_int); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 10, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  if (PyDict_SetItem(__pyx_t_1, __pyx_n_s_dtype, __pyx_t_4) < 0) __PYX_ERR(0, 10, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  __pyx_t_4 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_tuple_, __pyx_t_1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 10, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  if (!(likely(((__pyx_t_4) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_4, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 10, __pyx_L1_error)
  __pyx_t_5 = ((PyArrayObject *)__pyx_t_4);
  {
    __Pyx_BufFmt_StackElem __pyx_stack[1];
    if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer, (PyObject*)__pyx_t_5, &__Pyx_TypeInfo_nn___pyx_t_5numpy_int_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {
      __pyx_v_seeds_idx = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer.buf = NULL;
      __PYX_ERR(0, 10, __pyx_L1_error)
    } else {__pyx_pybuffernd_seeds_idx.diminfo[0].strides = __pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_seeds_idx.diminfo[0].shape = __pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer.shape[0];
    }
  }
  __pyx_t_5 = 0;
  __pyx_v_seeds_idx = ((PyArrayObject *)__pyx_t_4);
  __pyx_t_4 = 0;
/* … */
  __pyx_tuple_ = PyTuple_Pack(1, __pyx_int_10); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 10, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple_);
  __Pyx_GIVEREF(__pyx_tuple_);

This was obtained on Python 2.7 with cython 0.24 and numpy 1.10.4.

On the other hand, the very simple declaration seeds_idx = np.empty(10) results in:

+10:     seeds_idx = np.empty(10)
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 10, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_tuple_, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_v_seeds_idx = __pyx_t_1;
  __pyx_t_1 = 0;
/* … */
  __pyx_tuple_ = PyTuple_Pack(1, __pyx_int_10); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 10, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple_);
  __Pyx_GIVEREF(__pyx_tuple_);

What is going wrong here (if any)? Thanks!

There is nothing wrong, numpy arrays are complex (yet very efficient) data structures. You can try using typed memoryviews instead anyway, they are usually faster and can be easily converted to numpy arrays. — Imanol Luengo
– Imanol Luengo, Commented May 24, 2016 at 8:45
The other point worth making is that there is overhead in assigning the array. Using the array of quick but assigning it can be a little slower so try not to do it unnecessarily. — DavidW
– DavidW, Commented May 24, 2016 at 10:28
I see, so there is a small overhead during declaration but a much faster assigning/accessing/etc. — Gioker
– Gioker, Commented May 25, 2016 at 8:32

Community · Accepted Answer · 2017-05-23 12:31:45Z

3

As the comment stated, there is nothing wrong going on here so no need to worry. Also, remember that you are checking the code generated for a simple assignment, any differences won't affect performance.

A small errata though, in the second case seeds_idx = np.empty(10) should be changed to seeds_idx = np.empty(10, dtype=np.int) to match the first.

If you add that, then the dictionary that is created for storing the arguments of the function call (np.empty) is also added:

__pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);

the lookup for np.int:

__pyx_t_3 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__pyx_t_4 = __Pyx_PyObject_GetAttrStr(__pyx_t_3, __pyx_n_s_int); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 10, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_4);
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;

and the setting of the arguments in the newly created dictionary is done:

if (PyDict_SetItem(__pyx_t_1, __pyx_n_s_dtype, __pyx_t_4) < 0) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;

Other than these, the only difference between them is the following:

if (!(likely(((__pyx_t_4) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_4, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 10, __pyx_L1_error)
  __pyx_t_5 = ((PyArrayObject *)__pyx_t_4);
  {
    __Pyx_BufFmt_StackElem __pyx_stack[1];
    if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer, (PyObject*)__pyx_t_5, &__Pyx_TypeInfo_nn___pyx_t_5numpy_int_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {
      __pyx_v_seeds_idx = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer.buf = NULL;
      __PYX_ERR(0, 10, __pyx_L1_error)
    } else {__pyx_pybuffernd_seeds_idx.diminfo[0].strides = __pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_seeds_idx.diminfo[0].shape = __pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer.shape[0];
    }
  }

Which, as stated in the documentation you linked is most likely performed in order to have fast access to the data buffer.

The best alternative, by far, is using typed memoryviews. These are the native way and most likely the easiest way to work with arrays in cython. Their performance is usually on par with numpy arrays and if not you can always switch between them easily.

edited May 23, 2017 at 12:31

CommunityBot

11 silver badge

answered May 24, 2016 at 10:06

Dimitris Fasarakis Hilliard

162k35 gold badges282 silver badges265 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Gioker Over a year ago

Then my question is, what is the advantage on using a numpy array this way over declaring a dictionary directly?

Dimitris Fasarakis Hilliard Over a year ago

I was a bit vague on the dictionary creation; a dictionary is created whenever you want to pass keyword args to a function, so in the case of the np.empty(10, dtype=np.int) a dictionary is created to store the arguments. In the case of np.empty(10) a dictionary isn't created. It is always advisable to declare the type of the array so Cython can optimize the generated c code based on it. I updated my answer to include some links on typed memoryviews which are the advisable way to do things in Cython.

Gioker Over a year ago

Oh ok, just for the keywords...still a lot of overhead I suppose. On the matter of the typed memoryview I posted a new question here. Thanks Jim!

Collectives™ on Stack Overflow

Declaring a numpy array with cython strangely generates a lot of overhead

1 Answer 1

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related