I am trying to implement the Hausdorff distance in OpenCL, and the following kernel forms the basis for it (or so I presume, since I have not implemented it completely yet). Can I get some suggestions on how to optimize this kernel? Specifically, how can I remove the for loop in the kernel function that calls the helper function?
OpenCL kernel and its helper function:
// Each work-item i writes c_1[i] = a_1 - b_1[i].
void helper( int a_1, __global int* b_1, __global int* c_1 ){
    int i = get_global_id(0);
    c_1[i] = a_1 - b_1[i];
}

__kernel void test_call( __global int* a,   //input buffer of size [100000, 1]
                         __global int* b,   //input buffer of size [100000, 1]
                         __global int* c ){ //output buffer of size [100000, 1]
    for ( int iter = 0 ; iter < 100000 ; iter++ ){
        helper ( a[iter], b, c );
        // once array c is obtained by calling the above function,
        // it will be used in further processing that will take place inside
        // this for loop itself
    }
}
Essentially, what I am trying to do here is subtract each element of the input buffer 'b' from each element of the input buffer 'a'. The complexity is O(n^2).
By the way, even this naive implementation produces results within 2.5 seconds. A serial implementation takes a few minutes to complete.
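To make the per-work-item computation concrete, here is a minimal plain-C sketch of one way the loop could be kept while dropping the global write to c. It is an illustration under assumptions, not a definitive implementation. The "further processing" is not specified above, so the sketch assumes it is the absolute difference followed by a running minimum, which is the inner step of a directed Hausdorff distance on 1-D integer data. The names `nearest_gap` and `directed_hausdorff_1d` are hypothetical. The outer loop over `i` stands in for the NDRange; in an actual kernel `i` would come from `get_global_id(0)`, the pointers would be `__global`, and the final maximum would need a separate reduction.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Emulates one work-item. Assumption: the unspecified "further processing"
   is |a[iter] - b[i]| followed by a running minimum. Values are assumed
   small enough that the subtraction does not overflow. */
static int nearest_gap(size_t i, const int *a, size_t n_a, const int *b)
{
    assert(n_a > 0);
    const int bi = b[i];      /* read b[i] once instead of once per iteration */
    int best = INT_MAX;       /* accumulator stays private to the work-item */
    for (size_t iter = 0; iter < n_a; ++iter) {
        int d = abs(a[iter] - bi);
        if (d < best) {
            best = d;
        }
    }
    return best;
}

/* Hypothetical host-side stand-in for the NDRange plus a max reduction:
   returns the maximum over b of the minimum over a of |a - b|. */
int directed_hausdorff_1d(const int *a, size_t n_a, const int *b, size_t n_b)
{
    int worst = 0;
    for (size_t i = 0; i < n_b; ++i) {
        int d = nearest_gap(i, a, n_a, b);
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```

With this arrangement the loop over a remains, since each work-item still has to visit every element of a, but nothing is written to global memory until the per-item result is known.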