0

I'm trying to implement a code previously written in CUDA using OpenCL to run on Altera FPGA. I'm having problem reading back data that are supposed to be in the buffer. I use the same structure as CUDA version, only thing different is cudaMalloc can allocate memory for all types of pointer while for clCreateBuffer I have to use cl_mem. My code looks like this:

cl_mem d_buffer=clCreateBuffer(...);
 //CUDA version:
 //float* d_buffer;
 //cudaMalloc((void **)&d_buffer, MemSz);

clEnqueueWriteBuffer(queue, d_buffer, ..., h_data, );
 //cudaMemcpy(d_buffer, h_Data, MemSz, cudaMemcpyHostToDevice);

#define d_buffer(index1, index2, index3) &d_buffer + index1/index2*index3
 //#define d_buffer(index1, index2, index3) d_buffer + index1/index2*index3

cl_mem* d_data=d_buffer(1,2,3);

clEnqueueReadBuffer(queue, *d_data,...)// Error reading d_data

I tried clEnqueueMapBuffer or CL_MEM_ALLOC_HOST_PTR for the clCreateBuffer, it doesn't work either.

1 Answer 1

1

cl_mem is an opaque object. You should not perform pointer arithmetic on it; attempting to do so will result in very nasty bugs.

I'm not familiar with how CUDA handles buffer allocation, but the implication of your commented out code is that CUDA buffers are always Host-Visible. This is very strictly not the case in OpenCL. OpenCL allows you to "Map" a buffer to host-visible memory, but it won't be implicitly visible to the host. If you intend to read an arbitrary index of the buffer, you need to either map it first or copy it to host data.

float * h_data = new float[1000];
cl_mem d_buffer=clCreateBuffer(...);

clEnqueueWriteBuffer(queue, d_buffer, true, 0, 1000 * sizeof(float), h_data, 0, nullptr, nullptr);
//======OR======
//float * d_data = static_cast<float*>(clEnqueueMapBuffer(queue, d_buffer, true, CL_MAP_WRITE, 0, 1000 * sizeof(float), 0, nullptr, nullptr, nullptr));
//std::copy(h_data, h_data + 1000, d_data);
//clEnqueueUnmapMemObject(queue, d_buffer, d_data, 0, nullptr, nullptr);
//clEnqueueBarrier(queue);

//Do work with buffer, probably in OpenCL Kernel...

float result;
size_t index = 1 / 2 * 3; //This is what you wrote in the original post
clEnqueueReadBuffer(queue, d_buffer, true, index * sizeof(float), 1 * sizeof(float), &result, 0, nullptr, nullptr);
//======OR======
//float * result_ptr = static_cast<float*>(clEnqueueMapBuffer(queue, d_buffer, true, CL_MAP_READ, index * sizeof(float), 1 * sizeof(float), 0, nullptr, nullptr, nullptr));
//result = *result_ptr;
//clEnqueueUnmapMemObject(queue, d_buffer, result_ptr, 0, nullptr, nullptr);
//clEnqueueBarrier(queue);

std::cout << "Result was " << result << std::endl;
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.