Thank you all in advance.

I am wondering what the right way is to #include the numpy headers, and how to use Cython and C++ together to parse numpy arrays. Below is my attempt:

// cpp_parser.h 
#ifndef _FUNC_H_
#define _FUNC_H_

#include <Python.h>
#include <numpy/arrayobject.h>

void parse_ndarray(PyObject *);

#endif

I know this might be wrong; I also tried other options, but none of them work.

// cpp_parser.cpp
#include "cpp_parser.h"
#include <iostream>

using namespace std;

void parse_ndarray(PyObject *obj) {
    if (PyArray_Check(obj)) { // this throws seg fault
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}

The PyArray_Check call causes a segmentation fault. PyArray_CheckExact doesn't crash, but it is not exactly what I want.

# parser.pxd
cdef extern from "cpp_parser.h": 
    cdef void parse_ndarray(object)

and the implementation file is:

# parser.pyx
import numpy as np
cimport numpy as np

def py_parse_array(object x):
    assert isinstance(x, np.ndarray)
    parse_ndarray(x)

The setup.py script is

# setup.py
from distutils.core import setup, Extension
from Cython.Build import cythonize

import numpy as np

ext = Extension(
    name='parser',
    sources=['parser.pyx', 'cpp_parser.cpp'],
    language='c++',
    include_dirs=[np.get_include()],
    extra_compile_args=['-fPIC'],
)

setup(
    name='parser',
    ext_modules=cythonize([ext])
    )

And finally the test script:

# run_test.py
import numpy as np
from parser import py_parse_array

x = np.arange(10)
py_parse_array(x)

I have created a git repo with all the scripts above: https://github.com/giantwhale/study_cython_numpy/

  • Don't Python functions need to be pure C? I mean, did you try the extern "C" magic before the function declaration and definition? Commented Oct 31, 2017 at 2:27
  • @geckos I doubt that is the reason. I would expect Cython to handle this automatically when provided with language='C++'. I am saying this because I also wrote a memoryview version and it works. Commented Oct 31, 2017 at 2:32

2 Answers

Quick Fix (read on for more details and a more sophisticated approach):

You need to initialize the variable PyArray_API in every cpp file that uses the numpy C API, by calling import_array():

//a trick to ensure that import_array() is called when the *.so is loaded;
//it runs only once
int init_numpy(){
     import_array(); // PyError if not successful
     return 0;
}

const static int numpy_initialized =  init_numpy();

void parse_ndarray(PyObject *obj) { // would be called every time
    if (PyArray_Check(obj)) {
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}

One could also use _import_array, which returns a negative number on failure, to implement custom error handling. See here for definition of import_array.

Warning: As pointed out by @isra60, _import_array()/import_array() can only be called once Python is initialized, i.e. after Py_Initialize() has been called. This is always the case for an extension module, but not necessarily when the Python interpreter is embedded, because numpy_initialized is initialized before main() starts. In that case, the initialization trick should not be used; call init_numpy() explicitly after Py_Initialize() instead.
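The ordering problem can be illustrated with a small self-contained model. It uses neither Python nor numpy: runtime_ready, start_runtime(), init_api() and embedded_startup() are hypothetical stand-ins (for "Py_Initialize() has been called", Py_Initialize(), init_numpy() and the start of an embedding program's main()). The model simply returns -1 where a real embedded program would fail in whatever way the interpreter fails.

```cpp
#include <cassert>

// Hypothetical stand-in for "Py_Initialize() has already been called".
// Constant-initialized, so it is false before any dynamic initializer runs.
static bool runtime_ready = false;
static bool api_loaded = false;

// Stand-in for Py_Initialize().
static void start_runtime() { runtime_ready = true; }

// Stand-in for init_numpy(): succeeds only once the runtime is up.
static int init_api() {
    if (!runtime_ready) {
        return -1;  // the model reports failure; it does not crash
    }
    api_loaded = true;
    return 0;
}

// The "initialization trick": this initializer runs before main(),
// i.e. before an embedding program had any chance to start the runtime.
static const int result_at_load = init_api();

// What an embedding program would do instead: start the runtime first,
// then initialize the API explicitly.
static int embedded_startup() {
    start_runtime();
    assert(runtime_ready);
    return init_api();
}
```

In this model result_at_load records the failed early attempt, while embedded_startup() succeeds because it runs after the runtime has been started.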


Sophisticated solution:

NB: For why setting PyArray_API is needed, see this SO-answer: it allows symbol resolution to be postponed until run time, so numpy's shared object isn't needed at link time and need not be on the dynamic library path (Python's system path is enough).

The quick fix works, but if more than one cpp file uses numpy, each of them ends up with its own separately initialized instance of PyArray_API.

This can be avoided if PyArray_API is declared extern rather than defined static in all but one translation unit. For those translation units the NO_IMPORT_ARRAY macro must be defined before numpy/arrayobject.h is included.

We still need one translation unit in which the symbol is actually defined. For this translation unit the macro NO_IMPORT_ARRAY must not be defined.

However, without the macro PY_ARRAY_UNIQUE_SYMBOL we would only get a static symbol, which is not visible to other translation units, so linking would fail. The reason for this design: if two libraries each defined a global PyArray_API, the symbol would be defined multiple times and the linker would fail, i.e. the two libraries could not be used together.

Thus, by defining PY_ARRAY_UNIQUE_SYMBOL as, say, MY_FANCY_LIB_PyArray_API prior to every include of numpy/arrayobject.h, we get our own name for PyArray_API, which does not clash with other libraries.
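As a rough single-file model of how the two macros interact (a sketch, not numpy's header): FAKE_UNIQUE_SYMBOL and FAKE_NO_IMPORT are hypothetical macros playing the roles of PY_ARRAY_UNIQUE_SYMBOL and NO_IMPORT_ARRAY, and fake_table stands in for the table that the import function would fetch.

```cpp
#include <cassert>

// Hypothetical macros modeled on PY_ARRAY_UNIQUE_SYMBOL and NO_IMPORT_ARRAY.
#define FAKE_UNIQUE_SYMBOL MY_LIB_FakeApi
// #define FAKE_NO_IMPORT   // would be defined in every "usual" translation unit

#if defined(FAKE_NO_IMPORT)
// Usual translation units only declare the symbol ...
extern const int *FAKE_UNIQUE_SYMBOL;
#else
// ... and exactly one translation unit defines it, with external linkage.
const int *FAKE_UNIQUE_SYMBOL = nullptr;
#endif

// Stand-in for the table that the import function would fetch.
static const int fake_table[] = {10, 20, 30};

// Stand-in for import_array(): fills in the shared pointer.
int init_fake_api() {
    FAKE_UNIQUE_SYMBOL = fake_table;
    return 0;
}

// Code that includes the header reaches the table through the renamed symbol;
// the macro expands to MY_LIB_FakeApi, so both spellings name one object.
int read_slot(int i) {
    assert(MY_LIB_FakeApi != nullptr);
    return FAKE_UNIQUE_SYMBOL[i];
}
```

Because the pointer has a library-specific name and external linkage, a second library using the same scheme with a different name would not collide with it at link time.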

Putting it all together:

A: use_numpy.h - your header for including numpy-functionality i.e. numpy/arrayobject.h

//use_numpy.h

//your fancy name for the dedicated PyArray_API-symbol
#define PY_ARRAY_UNIQUE_SYMBOL MY_PyArray_API 

//INIT_NUMPY_ARRAY_CPP is defined only in the translation unit
//that initializes the API (see init_numpy_api.cpp)
#ifndef INIT_NUMPY_ARRAY_CPP 
    #define NO_IMPORT_ARRAY //for usual translation units
#endif

//now, everything is setup, just include the numpy-arrays:
#include <numpy/arrayobject.h>

B: init_numpy_api.cpp - the translation unit that initializes the global MY_PyArray_API:

//init_numpy_api.cpp

//first make clear that this is where MY_PyArray_API is initialized
#define INIT_NUMPY_ARRAY_CPP

//now include arrayobject.h, which defines
//void **MY_PyArray_API
#include "use_numpy.h"

//now the old trick with initialization:
int init_numpy(){
     import_array();// PyError if not successful
     return 0;
}
const static int numpy_initialized =  init_numpy();

C: just include use_numpy.h wherever you need numpy; it will declare extern void **MY_PyArray_API:

//example
#include "use_numpy.h"

...
PyArray_Check(obj); // works, no segmentation fault

Warning: Do not forget that, for the initialization trick to work, Py_Initialize() must already have been called.


Why do you need it (kept for historical reasons):

When I build your extension with debug symbols:

extra_compile_args=['-fPIC', '-O0', '-g'],
extra_link_args=['-O0', '-g'],

and run it with gdb:

 gdb --args python run_test.py
 (gdb) run
  --- Segmentation fault
 (gdb) disass

I can see the following:

   0x00007ffff1d2a6d9 <+20>:    mov    0x203260(%rip),%rax       
       # 0x7ffff1f2d940 <_ZL11PyArray_API>
   0x00007ffff1d2a6e0 <+27>:    add    $0x10,%rax
=> 0x00007ffff1d2a6e4 <+31>:    mov    (%rax),%rax
   ...
   (gdb) print $rax
   $1 = 16

Keep in mind that PyArray_Check is only a macro for:

#define PyArray_Check(op) PyObject_TypeCheck(op, &PyArray_Type)

It seems that &PyArray_Type somehow uses a part of PyArray_API that is not initialized (it has the value 0).

Let's take a look at cpp_parser.cpp after the preprocessor has run (compiled with the -E flag):

 static void **PyArray_API= __null
 ...
 static int
_import_array(void)
{
  PyArray_API = (void **)PyCapsule_GetPointer(c_api,...

So PyArray_API is static and is initialized via _import_array(void). That also explains the warning I get during the build, that _import_array() is defined but not used: we never initialized PyArray_API.

Because PyArray_API is a static variable, it must be initialized in every compilation unit, i.e. in every cpp file.

So we just need to do that; import_array() seems to be the official way.
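The mechanism can be mimicked without numpy in a few lines. This is only a sketch under stated assumptions: FakeApi, api_table, fake_import_array() and fake_array_check() are hypothetical stand-ins for the numpy function table, PyArray_API, _import_array() and PyArray_Check.

```cpp
#include <cassert>

// Hypothetical stand-in for the function table numpy hands out; not numpy code.
struct FakeApi {
    bool (*is_array)(int type_tag);
};

static bool fake_is_array(int type_tag) { return type_tag == 42; }

// Mirrors `static void **PyArray_API`: every translation unit that includes
// the header gets its own copy, and each copy starts out null.
static const FakeApi *api_table = nullptr;

static bool api_ready() { return api_table != nullptr; }

// Plays the role of _import_array(): 0 on success, negative on failure.
static int fake_import_array() {
    static const FakeApi the_api = {&fake_is_array};
    api_table = &the_api;
    return 0;
}

// Plays the role of PyArray_Check: it reads through the table, so it must
// not be called before fake_import_array() has run in this translation unit.
static bool fake_array_check(int type_tag) {
    assert(api_ready());
    return api_table->is_array(type_tag);
}
```

Calling fake_array_check() while api_table is still null would dereference a null pointer, which corresponds to the crash seen in the gdb session; after fake_import_array() has run, the check works.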

1 Comment

I also need to add Py_Initialize(); before _import_array();

Since you use Cython, the numpy APIs are already available through the Cython includes. It's straightforward in a Jupyter notebook.

cimport numpy as np
from numpy cimport PyArray_Check

np.import_array()  # Attention!

def parse_ndarray(object ndarr):
    if PyArray_Check(ndarr):
        print("PyArray_Check Passed")
    else:
        print("PyArray_Check Failed")

I believe np.import_array() is the key here, since you call into the numpy API. Comment it out and try again: the crash appears here as well.

import numpy as np
from array import array
ndarr = np.arange(3)
pyarr = array('i', range(3))
parse_ndarray(ndarr)
parse_ndarray(pyarr)
parse_ndarray("Trick or treat!")

Output:

PyArray_Check Passed
PyArray_Check Failed
PyArray_Check Failed

1 Comment

Thank you so much for the reply. It is good to know we can actually do this in Cython. However, I am really looking for ways to work with pure C++, because I have some critical pieces that have to be implemented in C++. I have added a comment at the end of my original post.
