I am relatively new to OpenCL. I am using the OpenCL 1.2 C++ wrapper. Say I have the following problem: I have three integer values a, b, and c, all declared on the host, together with two more variables:
int a = 1;
int b = 2;
int c = 3;
int help;
int d;
with d being my result and help being a help variable.
I want to calculate d = (a + b)*c. To do this, I now have two kernels called 'add' and 'multiply'.
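To make the intended data flow concrete, here is a host-only sketch in plain C++ (no OpenCL involved). The functions `add` and `multiply` are hypothetical stand-ins for the two kernels, whose real OpenCL C source is not shown here. The parameter order is only assumed to mirror the setArg calls below, and `compute_d` is an illustrative helper name. The point is that the intermediate result help is written by the first stage and read by the second from the same storage:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the 'add' kernel: help[i] = a[i] + b[i].
// The argument order (A, B, Help) is an assumption.
void add(const std::vector<int>& a, const std::vector<int>& b,
         std::vector<int>& help) {
    for (std::size_t i = 0; i < help.size(); ++i) {
        help[i] = a[i] + b[i];
    }
}

// Hypothetical stand-in for the 'multiply' kernel: d[i] = c[i] * help[i].
// The argument order (C, Help, D) is an assumption.
void multiply(const std::vector<int>& c, const std::vector<int>& help,
              std::vector<int>& d) {
    for (std::size_t i = 0; i < d.size(); ++i) {
        d[i] = c[i] * help[i];
    }
}

// Illustrative helper: runs both stages and returns d = (a + b) * c.
// 'help' plays the role of bufferHelp and is never copied in between.
int compute_d(int a, int b, int c) {
    std::vector<int> bufA{a}, bufB{b}, bufC{c};
    std::vector<int> bufHelp(1), bufD(1);
    add(bufA, bufB, bufHelp);       // stage 1 writes help
    multiply(bufC, bufHelp, bufD);  // stage 2 reads the same help storage
    return bufD[0];
}
```

With the values above, `compute_d(a, b, c)` evaluates (1 + 2) * 3.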
Currently, I am doing this in the following way (please don't be confused by my pointer-oriented style of programming). First, I create my buffers:
bufferA = new cl::Buffer(*context, CL_MEM_READ_ONLY, buffer_length);
bufferB = new cl::Buffer(*context, CL_MEM_READ_ONLY, buffer_length);
bufferC = new cl::Buffer(*context, CL_MEM_READ_ONLY, buffer_length);
bufferHelp = new cl::Buffer(*context, CL_MEM_READ_WRITE, buffer_length);
bufferD = new cl::Buffer(*context, CL_MEM_WRITE_ONLY, buffer_length);
Then, I set my kernel arguments for the addition kernel
add->setArg(0, *bufferA);
add->setArg(1, *bufferB);
add->setArg(2, *bufferHelp);
and for the multiplication kernel
multiply->setArg(0, *bufferC);
multiply->setArg(1, *bufferHelp);
multiply->setArg(2, *bufferD);
Then I enqueue my data for the addition
queueAdd->enqueueWriteBuffer(*bufferA, CL_TRUE, 0, datasize, &a);
queueAdd->enqueueWriteBuffer(*bufferB, CL_TRUE, 0, datasize, &b);
queueAdd->enqueueNDRangeKernel(*add, cl::NullRange, global[0], local[0]);
queueAdd->enqueueReadBuffer(*bufferHelp, CL_TRUE, 0, datasize, &help);
and for the multiplication
queueMult->enqueueWriteBuffer(*bufferC, CL_TRUE, 0, datasize, &c);
queueMult->enqueueWriteBuffer(*bufferHelp, CL_TRUE, 0, datasize, &help);
queueMult->enqueueNDRangeKernel(*multiply, cl::NullRange, global[0], local[0]);
queueMult->enqueueReadBuffer(*bufferD, CL_TRUE, 0, datasize, &d);
This works fine. However, I do not want to copy the value of help back to the host and then onto the device again. To avoid this, I thought of three possibilities:
- A global variable for help on the device side. With this, both kernels could access the value of help at any time.
- The add kernel calling the multiply kernel at runtime. We would then pass the value of c into the add kernel, which would hand both help and c over to the multiply kernel as soon as the addition has finished.
- Simply passing the value of help over to the multiplication kernel. What I am looking for here is something like the pipe object available in OpenCL 2.0. Does anybody know of something similar for OpenCL 1.2?
I would be very thankful if somebody could propose the smoothest way to solve my problem!
Thanks in advance!