
I am considering writing a C extension for a Python component that I have. I would like to use OpenMP (OMP) to take as much advantage as I can of the machines that will eventually run the combined Python+C solution.

Has anyone tried something similar? Are there any particular pitfalls that could make such a solution fail?

Thanks in advance!

2 Answers


I have done this successfully for large data mining tasks in radio astronomy. See https://github.com/ewanbarr/sigpyproc.git for an example.

The thing to note is that the C libraries I built in these cases are accessed via ctypes and not as native Python extensions.

So, for example:

Python: test.py

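import ctypes as C
import numpy as np
from numpy.ctypeslib import as_ctypes

# Load the shared library built from test.c (it must be on the loader's search path)
lib = C.CDLL("libmytest.so")

def set_N_threads(nthreads):
    # omp_set_num_threads is resolved through the OpenMP runtime that libmytest.so links against
    lib.omp_set_num_threads(C.c_int(nthreads))

def do_some_task(input_array):
    # The C function expects float32 buffers
    input_array = input_array.astype("float32")
    output_array = np.empty_like(input_array)
    lib.do_some_omp_task(as_ctypes(input_array),
                         as_ctypes(output_array),
                         C.c_size_t(input_array.size))
    return output_array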

C: test.c

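#include <stddef.h>  /* size_t */
#include <omp.h>

void do_some_omp_task(float* input_array,
                      float* output_array,
                      size_t size)
{
   long ii;  /* signed loop index, so the comparison below is well defined */
#pragma omp parallel for default(shared) private(ii)
   for (ii = 0; ii < (long)size; ii++)
       /* placeholder: do something using ii and the arrays */
       output_array[ii] = 2.0f * input_array[ii];
}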

Compile:

gcc -O3 -fopenmp -fPIC -c test.c -o test.o
gcc -shared -fopenmp -o libmytest.so test.o -lm

To answer your question: I have had no problems with this kind of setup, and the speed improvements achievable have been impressive (although the above example wouldn't really benefit from OMP).
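As a possible refinement of the ctypes call in test.py, the argument types can be declared once so that ctypes rejects arrays with the wrong dtype or layout before they reach C. The sketch below is not part of the original setup: it assumes a POSIX system where ctypes.CDLL(None) exposes the C library, and it uses memcpy purely as a stand-in for do_some_omp_task so that it runs without building libmytest.so. The helper name copy_float32 is hypothetical.

```python
import ctypes
import numpy as np
from numpy.ctypeslib import ndpointer

# Assumption: POSIX system; CDLL(None) gives access to the C library.
# memcpy is only a stand-in for a function such as do_some_omp_task.
libc = ctypes.CDLL(None)

# Declare the signature once: two contiguous float32 arrays and a byte count.
libc.memcpy.argtypes = [
    ndpointer(dtype=np.float32, flags=("C_CONTIGUOUS", "WRITEABLE")),  # destination
    ndpointer(dtype=np.float32, flags="C_CONTIGUOUS"),                 # source
    ctypes.c_size_t,                                                   # number of bytes
]
libc.memcpy.restype = ctypes.c_void_p


def copy_float32(values):
    """Copy `values` into a new float32 array through the C call (hypothetical helper)."""
    src = np.ascontiguousarray(values, dtype=np.float32)
    dst = np.empty_like(src)
    libc.memcpy(dst, src, src.nbytes)
    return dst
```

With argtypes declared this way, NumPy arrays are passed directly, and passing, say, a float64 array raises ctypes.ArgumentError instead of silently handing C a buffer of the wrong type.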


2 Comments

Very promising! Thank you very much! I was especially concerned about the thread safety of function calls and how that could affect the Python side of things. But from what I see in your example, the interface doesn't leave much room for things to go wrong at the thread level. I'll look into it right away! Thanks again.
When using ctypes, as long as the library is not called using PyDLL, the Global Interpreter Lock is released, so the C functions and their threads act independently of Python.
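The GIL release described in the comment above can be observed with a small experiment. This is only a sketch under stated assumptions: a POSIX system where ctypes.CDLL(None) exposes the C library, with usleep standing in for a long-running C function. The helper names blocking_c_call and run_in_threads are hypothetical.

```python
import ctypes
import threading
import time

# Assumption: POSIX system; CDLL(None) gives access to the C library.
# CDLL (unlike PyDLL) releases the GIL for the duration of each foreign call.
libc = ctypes.CDLL(None)
libc.usleep.argtypes = [ctypes.c_uint]
libc.usleep.restype = ctypes.c_int


def blocking_c_call():
    # Stand-in for a long-running C function: block for 0.3 s inside C.
    libc.usleep(300_000)


def run_in_threads(n_threads):
    """Run blocking_c_call in n_threads Python threads; return elapsed seconds."""
    threads = [threading.Thread(target=blocking_c_call) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # If the GIL were held during the calls, four threads would need about 1.2 s.
    print(f"4 threads took {run_in_threads(4):.2f} s")
```

If the calls overlap, the total time stays close to a single 0.3 s call rather than growing with the number of threads, which suggests that the C code runs without holding the GIL.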

Cython provides cython.parallel.prange(), which parallelizes a loop using OpenMP.

To do it manually, call PyEval_InitThreads during your extension module's initialization (since Python 3.7 the interpreter does this itself, so the call is only needed on older versions). Release the GIL while forking non-Python threads, e.g., by wrapping the OMP parallel section in Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS. If a thread needs access to Python objects, acquire and release the GIL around that access with state = PyGILState_Ensure() and PyGILState_Release(state). Here's an example (convoluted to trigger an error in the threading module on shutdown). To avoid any issues, import threading during your module initialization.

1 Comment

The solution from @ebarr came in quite handy, so I took that approach. However, this definitely seems to be very useful as well. Since I like to do things in various ways, I'll keep this in mind! Thank you very much for your time.
