I ran the following CUDA kernel to understand the texture fetching mechanism, but it looks like the A[i][j] entry of the array has to be fetched with tex2D(tex_ref, j, i). Isn't this counter-intuitive? Am I missing anything here?
texture<float, cudaTextureType2D, cudaReadModeElementType> texRef;
surface<void, cudaSurfaceType2D> surfRef;

__global__ void transformKernel(float device_array[3][3],
                                int width, int height,
                                float theta)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;

    // Direct global-memory read: first index is the row, second the column.
    printf("\n Array : %u %u = %f", x, y, device_array[x][y]);

    // Texture fetch with the same (x, y) pair.
    printf("\n Texture : %u %u = %f", x, y, tex2D(texRef, x, y));

    // Surface read: the x coordinate is given in bytes, so it is scaled by the element size.
    float temp;
    surf2Dread(&temp, surfRef, x * sizeof(float), y);
    printf("\n Surface : %u %u = %f", x, y, temp);
}
The array is populated by setting host_array[i][j] = (i - j); it was copied to device_array, and the same data was bound to both texRef and surfRef.
The output looks like:
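For completeness, the host-side setup is roughly the following (a sketch: the cuArray name and the copy sizes are illustrative, error checking is omitted, and this uses the legacy texture-reference API, which newer CUDA releases deprecate):

```cuda
int width = 3, height = 3;
float host_array[3][3];
for (int i = 0; i < height; ++i)
    for (int j = 0; j < width; ++j)
        host_array[i][j] = (float)(i - j);

// A cudaArray that supports both texture fetches and surface load/store.
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
cudaArray *cuArray;
cudaMallocArray(&cuArray, &desc, width, height, cudaArraySurfaceLoadStore);

// Copy the row-major host data in: height rows of width*sizeof(float) bytes each.
cudaMemcpy2DToArray(cuArray, 0, 0, host_array,
                    width * sizeof(float),   // source pitch
                    width * sizeof(float),   // bytes per row
                    height, cudaMemcpyHostToDevice);

// Bind the same array to the texture reference and the surface reference.
cudaBindTextureToArray(texRef, cuArray, desc);
cudaBindSurfaceToArray(surfRef, cuArray);
```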
Array : 0 0 = 0.000000
Array : 1 0 = 1.000000
Array : 0 1 = -1.000000
Array : 1 1 = 0.000000
Texture : 0 0 = 0.000000
Texture : 1 0 = -1.000000
Texture : 0 1 = 1.000000
Texture : 1 1 = 0.000000
Reads from the surface return the same values as the texture fetches.