
I am writing a script to perform an FFT using the GPU/CUDA-based cuFFT library. cuFFT requires the input data to be of type cufftComplex, but my input data is in the numpy.complex64 format. I am using the Python C API to send the data from Python to C. How can I convert between the two formats? Currently my code looks like this:

#include <python2.7/Python.h>
#include <numpy/arrayobject.h>
#include <cufft.h>


void compute_BP(PyObject* inputData, PyObject* outputData, int Nfft)
{
   cufftHandle plan;
   cufftPlan1d(&plan, Nfft, CUFFT_C2C, CUFFT_INVERSE);
   cufftExecC2C(plan, inputData, outputData, CUFFT_INVERSE);
   ...
}

When compiling I get the following error:

Error: argument of type "PyObject *" is incompatible with parameter of type "cufftComplex".

4 Comments

  • cufft is not callable from a __global__ function (there is no cufft device API), so I doubt your code looks like this. If it does look like this, it won't work regardless of your concerns about data format. Commented May 22, 2018 at 22:30
  • @RobertRovella: Removing global Commented May 22, 2018 at 22:40
  • std::complex<double> layout in C++ should exactly match cufftDoubleComplex, and std::complex<float> layout in C++ should exactly match cufftComplex. With that information, your question effectively becomes how to convert python (numpy) complex types to C++ and vice-versa (a quick layout check from Python is sketched after these comments). For that, this question provides a suitable answer. Commented May 23, 2018 at 1:47
  • Your code/thinking here is broken beyond what has already been pointed out. You can't just blindly cast an arbitrary Python object pointer to a C array pointer, and you can't pass a host C array pointer to cuFFT. You must use the Python buffer interface to access the array memory and then the appropriate CUDA APIs to allocate and copy that memory to the device. Commented May 23, 2018 at 2:28
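For reference, the layout point raised in these comments can be checked directly from Python. This is a minimal sketch, not part of the original thread: numpy.complex64 stores each element as two packed float32 values (real, imaginary), which is the same in-memory layout cufftComplex uses, so no element-wise conversion is needed.

import numpy as np

# complex64 = two packed float32 values per element, matching cufftComplex/float2
x = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)
print(x.itemsize)           # 8 bytes: two float32 per element
print(x.view(np.float32))   # [ 1.  2.  3. -4.] -> interleaved real/imag pairs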

1 Answer


Borrowing from my answer here, here is a worked example of how you could use ctypes in Python to run a function from the cuFFT library in a Python script, using numpy data:

$ cat mylib.cpp
#include <cufft.h>
#include <stdio.h>
#include <assert.h>
#include <cuda_runtime_api.h>

// N complex64 values from numpy share the layout of cufftComplex (two packed floats),
// so the host buffers can be used directly for the device copies.
extern "C"
void fft(void *input, void *output, size_t N){

  cufftHandle plan;
  cufftComplex *d_in, *d_out;
  size_t ds = N*sizeof(cufftComplex);
  // allocate device buffers for the input and output data
  cudaMalloc((void **)&d_in,  ds);
  cudaMalloc((void **)&d_out, ds);
  // create a 1D C2C plan for a single transform of length N
  cufftResult res = cufftPlan1d(&plan, N, CUFFT_C2C, 1);
  assert(res == CUFFT_SUCCESS);
  // copy host data to the device, run the forward FFT, copy the result back
  cudaMemcpy(d_in, input, ds, cudaMemcpyHostToDevice);
  res = cufftExecC2C(plan, d_in, d_out, CUFFT_FORWARD);
  assert(res == CUFFT_SUCCESS);
  cudaMemcpy(output, d_out, ds, cudaMemcpyDeviceToHost);
  printf("%s\n", cudaGetErrorString(cudaGetLastError()));
  printf("from shared object:\n");
  for (size_t i = 0; i < N; i++)
    printf("%.1f + j%.1f, ", ((cufftComplex *)output)[i].x, ((cufftComplex *)output)[i].y);
  printf("\n");
}

$ cat t8.py
import ctypes
import os
import sys
import numpy as np

mylib = ctypes.cdll.LoadLibrary('libmylib.so')

N = 4
# complex64 has the same in-memory layout as cufftComplex, so the raw
# array buffers can be handed straight to the shared library
mydata = np.ones((N), dtype = np.complex64)
myresult = np.zeros((N), dtype = np.complex64)
mylib.fft(ctypes.c_void_p(mydata.ctypes.data), ctypes.c_void_p(myresult.ctypes.data), ctypes.c_size_t(N))
print(myresult)

$ g++ -fPIC -I/usr/local/cuda/include --shared mylib.cpp -L/usr/local/cuda/lib64 -lcufft -lcudart -o libmylib.so
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`pwd` python t8.py
no error
from shared object:
4.0 + j0.0, 0.0 + j0.0, 0.0 + j0.0, 0.0 + j0.0,
[4.+0.j 0.+0.j 0.+0.j 0.+0.j]
$

1 Comment

This sidesteps the whole need to use the buffer interface by using the baked-in ctypes interface of numpy arrays, which is a pretty common approach.
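As a small variation on this point (a sketch only, assuming the same libmylib.so built in the answer above), the numpy ctypes interface can also be used with declared argument types, which lets ctypes check the pointer and size arguments at call time:

import ctypes
import numpy as np

mylib = ctypes.cdll.LoadLibrary('libmylib.so')

# Declare the expected signature once so ctypes validates the arguments.
mylib.fft.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]
mylib.fft.restype = None

x = np.ones(4, dtype=np.complex64)    # must be contiguous complex64
y = np.zeros_like(x)
mylib.fft(x.ctypes.data_as(ctypes.c_void_p),
          y.ctypes.data_as(ctypes.c_void_p),
          ctypes.c_size_t(x.size))
print(y)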
