I'm using the System.GPU.OpenCL module by Luis Cabellos to control an OpenCL kernel.
Everything is working well, but to speed things up I am trying to cache some global memory in a local buffer. I have just noticed that it seems to be impossible to pass a local buffer using the current definition of clSetKernelArg, but perhaps someone can enlighten me?
The definition is,
    clSetKernelArg :: Storable a => CLKernel -> CLuint -> a -> IO ()
    clSetKernelArg krn idx val = with val $ \pval -> do
      whenSuccess (raw_clSetKernelArg krn idx (fromIntegral . sizeOf $ val) (castPtr pval))
        $ return ()
where the raw function is defined as,
    foreign import CALLCONV "clSetKernelArg" raw_clSetKernelArg ::
      CLKernel -> CLuint -> CSize -> Ptr () -> IO CLint
The high-level clSetKernelArg therefore conveniently figures out the size of the value and takes a pointer to it. This is perfect for global memory, but for local memory the way to use clSetKernelArg is to pass the size of the desired allocation in the CSize argument and a null data pointer. Of course, simply passing nullPtr as the value doesn't work here, since the wrapper would take nullPtr's own size and address rather than passing NULL through. So how can I circumvent this problem? I would call raw_clSetKernelArg directly, but it seems it is not exported by the module.
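Concretely, what I believe I would need to call, going by the OpenCL spec's treatment of __local arguments, is something like the following (a sketch only: it assumes raw_clSetKernelArg were in scope, and localFloats is just an example size):

```haskell
import Foreign.Ptr      (nullPtr)
import Foreign.Storable (sizeOf)

-- Sketch: set kernel argument 2 to a __local buffer of localFloats
-- floats, by passing the allocation size and a NULL data pointer.
setLocalArg krn = raw_clSetKernelArg krn 2
    (fromIntegral (localFloats * sizeOf (undefined :: Float)))
    nullPtr
  where localFloats = 256
```

but there seems to be no way to express this through the exported wrapper.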
Thanks.
The high-level clSetKernelArg interface doesn't let you specify your own pointer, so there is no way to pass the NULL pointer that a local-memory argument requires.
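One workaround is to re-declare the raw FFI binding in your own module: a foreign import can be repeated anywhere, and it links against the same clSetKernelArg symbol the library uses. Below is a sketch, not something the module provides; clSetKernelArgLocal is a name I'm inventing, and the import path for CLKernel/CLuint/CLint is an assumption, so adjust it to wherever the package actually exports them.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module LocalArg (clSetKernelArgLocal) where

import Foreign.Ptr     (Ptr, nullPtr)
import Foreign.C.Types (CSize)
-- Assumption: the types live here; adjust to the package's real layout.
import System.GPU.OpenCL.Types (CLKernel, CLuint, CLint)

-- Our own copy of the raw binding; same C symbol the library binds.
foreign import ccall "clSetKernelArg" raw_clSetKernelArg
  :: CLKernel -> CLuint -> CSize -> Ptr () -> IO CLint

-- Set argument idx of krn to a __local buffer of szBytes bytes.
-- Per the OpenCL spec, a local-memory argument is set by passing the
-- size of the desired allocation and a NULL arg_value pointer.
clSetKernelArgLocal :: CLKernel -> CLuint -> CSize -> IO ()
clSetKernelArgLocal krn idx szBytes = do
  err <- raw_clSetKernelArg krn idx szBytes nullPtr
  if err == 0  -- CL_SUCCESS
    then return ()
    else ioError (userError ("clSetKernelArg failed: " ++ show err))
```

Then, for example, clSetKernelArgLocal krn 2 (fromIntegral (256 * 4)) would request a 256-float local buffer for argument 2.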