I have done this successfully for large data mining tasks in radio astronomy. See https://github.com/ewanbarr/sigpyproc.git for an example.
The thing to note is that the C libraries I built in these cases are accessed via ctypes and not as native Python extensions.
So, for example:
Python: test.py
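import ctypes as C
import numpy as np
from numpy.ctypeslib import as_ctypes

# Load the shared library built from test.c (it must be on the loader's search path)
lib = C.CDLL("libmytest.so")

def set_N_threads(nthreads):
    # omp_set_num_threads comes from the OpenMP runtime that libmytest.so links against
    lib.omp_set_num_threads(nthreads)

def do_some_task(input_array):
    # Work on a float32 copy so the data matches the float* arguments of the C function
    input_array = input_array.astype("float32")
    output_array = np.empty_like(input_array)
    lib.do_some_omp_task(as_ctypes(input_array),
                         as_ctypes(output_array),
                         C.c_size_t(input_array.size))
    return output_array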
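The wrapper above passes whatever it is given straight to C, so a wrong dtype or a strided view would only show up as garbage output or a crash. One possible safeguard, sketched here rather than taken from my own code, is to declare the argument types with numpy.ctypeslib.ndpointer so that ctypes rejects unsuitable arrays before the call. The names float_array, declare_signature and check_array are hypothetical helpers made up for this illustration; only do_some_omp_task comes from the example.

```python
import ctypes as C
import numpy as np
from numpy.ctypeslib import ndpointer

# Pointer type that only accepts C-contiguous, one-dimensional float32 arrays.
# (float_array is a hypothetical name used for this sketch.)
float_array = ndpointer(dtype=np.float32, ndim=1, flags="C_CONTIGUOUS")


def declare_signature(lib):
    """Attach argument and return types to do_some_omp_task on a loaded library."""
    func = lib.do_some_omp_task
    func.argtypes = [float_array, float_array, C.c_size_t]
    func.restype = None  # the C function is declared void
    return func


def check_array(arr):
    """Return True if ctypes would accept arr as a float_array argument."""
    try:
        float_array.from_param(arr)
    except TypeError:
        return False
    return True
```

With the signature declared, the arrays can be passed directly, e.g. task = declare_signature(lib) followed by task(input_array, output_array, input_array.size), and a float64 or non-contiguous array raises an error instead of being misread by the C code.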
C: test.c
#include <stddef.h>  /* size_t */
#include <omp.h>

void do_some_omp_task(float* input_array,
                      float* output_array,
                      size_t size)
{
    size_t ii;  /* unsigned loop index requires OpenMP 3.0 or later */
    #pragma omp parallel for default(shared) private(ii)
    for (ii = 0; ii < size; ii++) {
        /* do something using ii and the arrays */
    }
}
Compile:
gcc -O3 -fopenmp -fPIC -c test.c -o test.o
gcc -shared -fopenmp -o libmytest.so test.o -lm
To answer your question: I have had no problems with this kind of setup, and the speed improvements achievable have been impressive (although the above example wouldn't really benefit from OpenMP).