
I'm new to OpenCL and would like to learn more about it.

I have tried to find information about how to build 'complex functions' with OpenCL. By 'complex functions' I mean functions that can be parallelized and that contain an inner function which can itself be parallelized. I have seen links like:

Here is my question, illustrated with an example:

// A and B are int vectors
// M and N have different values (M != N)
for (int i = 0; i <= M - 2; i++) {
  for (int j = i + 1; j <= M - 1; j++) {
    distance = calculate_distance(A[i], B[j]);
    // more sequential instructions
  }
}

calculate_distance concatenates both vectors and contains a loop:

for (int i = 0; i <= N - 1; i++) {
  // some sequential instructions
}

Can this whole fragment be parallelized? If so, how? (That is the reason for the title, "kernel inside kernel".)

Note: I'm using the Intel(R) SDK for OpenCL - Offline Compiler 2012 (Windows) to check my kernels.

Thanks in advance

  • What value ranges do you expect for M and N? Do you know anything else about the data? Can you provide more information about calculate_distance()? Commented Jan 3, 2013 at 14:50
  • Both are lower than 10, but different. As for calculate_distance, it returns an integer equal to the number of 1's in the concatenation of A with B. Commented Jan 3, 2013 at 16:02
  • By number of 1's, do you mean the total bits in the two ints you pass to calculate_distance? What operation do you execute N-1 times? Is this an algorithm I can look up for more info somewhere? Commented Jan 3, 2013 at 16:47
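Based on the description in the comments, calculate_distance could be sketched in C as below. This assumes (the question does not say so) that each element of A and B is an N-bit vector stored in an unsigned int; the loop over N keeps the same shape as in the question.

```c
#include <assert.h>

/* Hypothetical reading of the comments: each element of A and B is an
   N-bit vector stored in an unsigned int, and the "distance" is the number
   of 1 bits in the concatenation of the two vectors. */
static int calculate_distance(unsigned a, unsigned b, int n)
{
    assert(n >= 0 && n <= 32);           /* assumes a 32-bit unsigned int */
    int ones = 0;
    for (int i = 0; i <= n - 1; i++) {   /* same loop shape as the question */
        ones += (int)((a >> i) & 1u);    /* bit i of the first half */
        ones += (int)((b >> i) & 1u);    /* bit i of the second half */
    }
    return ones;
}
```

For example, `calculate_distance(0x5u, 0x7u, 3)` counts two 1 bits in `101` and three in `111`, giving 5. Because the count for the concatenation is just the count for A[i] plus the count for B[j], every bit could be examined independently; with N < 10, though, this inner loop is probably cheap enough to stay sequential inside a single work-item.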

2 Answers


In order to write parallel code, you need to pay much more attention to data flow. What does your input data look like? What does your output data look like? How do you transform a piece of input data into output data?

As for your question(s):

  • It's not possible to decide whether the example you provided is parallelizable because the data flow is not apparent.
  • You can call functions from your kernel code; they will be inlined into the kernel.

Hint:

Also check Converting C/C++ for loops into CUDA - it's CUDA not OpenCL, but the principles are alike.
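One common way to map the question's nested loops onto OpenCL, assuming each (i, j) iteration is independent of the others (the question doesn't say whether the "more sequential instructions" depend on earlier iterations), is to give every (i, j) pair its own work-item in a 1D NDRange. The sketch below emulates that in plain C: `pair_from_index`, `distance_work_item` and `compute_all_distances` are hypothetical names, `gid` stands in for `get_global_id(0)`, and calculate_distance assumes each element is an N-bit vector stored in an unsigned int. It also shows the helper-function point above: the work-item body simply calls calculate_distance.

```c
#include <assert.h>

/* Assumption: elements are N-bit vectors in unsigned ints, and the distance
   counts the 1 bits of both vectors (hypothetical reading of the question). */
static int calculate_distance(unsigned a, unsigned b, int n)
{
    int ones = 0;
    for (int i = 0; i <= n - 1; i++)
        ones += (int)(((a >> i) & 1u) + ((b >> i) & 1u));
    return ones;
}

/* Map a flat index k in [0, m*(m-1)/2) to the k-th pair (i, j) with
   0 <= i < j <= m-1, in the same order as the original nested loops. */
static void pair_from_index(int k, int m, int *i, int *j)
{
    assert(k >= 0 && k < m * (m - 1) / 2);
    int row = 0;
    int row_len = m - 1;          /* row i holds m-1-i pairs: j = i+1 .. m-1 */
    while (k >= row_len) {
        k -= row_len;
        row++;
        row_len--;
    }
    *i = row;
    *j = row + 1 + k;
}

/* Body of one work-item. Each work-item writes only its own output slot,
   so no synchronization between work-items is needed. */
static void distance_work_item(int gid, const unsigned *a, const unsigned *b,
                               int m, int n, int *out)
{
    int i, j;
    pair_from_index(gid, m, &i, &j);
    out[gid] = calculate_distance(a[i], b[j], n);
}

/* Host-side emulation of a 1D NDRange with one work-item per pair; on a
   device these iterations would run concurrently. Returns the pair count. */
static int compute_all_distances(const unsigned *a, const unsigned *b,
                                 int m, int n, int *out)
{
    int pairs = m * (m - 1) / 2;
    for (int gid = 0; gid < pairs; gid++)
        distance_work_item(gid, a, b, m, n, out);
    return pairs;
}
```

With M < 10 there are at most 36 pairs, so the global work size is tiny; whether offloading such a small problem pays off is worth measuring.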

If your output data is just a single value (e.g. the maximum distance), you might want to look at reduction kernels and understand how they work.
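A reduction combines many per-work-item results into one value in log2(work-group size) steps. The sketch below emulates, sequentially in plain C, a work-group tree reduction computing a maximum; `tree_reduce_max` and `max_of` are hypothetical names, and `scratch` stands in for `__local` memory. In a real OpenCL kernel each step would be separated by `barrier(CLK_LOCAL_MEM_FENCE)`.

```c
#include <assert.h>
#include <limits.h>

/* Sequential emulation of a work-group tree reduction computing a maximum.
   scratch stands in for __local memory with one slot per work-item;
   local_size must be a power of two. */
static int tree_reduce_max(int *scratch, int local_size)
{
    assert(local_size > 0 && (local_size & (local_size - 1)) == 0);
    for (int stride = local_size / 2; stride > 0; stride /= 2) {
        /* Work-items with lid < stride are active in this step. They only
           read slots >= stride, which nobody writes in this step, so running
           them one after another gives the same result as running them in parallel. */
        for (int lid = 0; lid < stride; lid++) {
            if (scratch[lid + stride] > scratch[lid])
                scratch[lid] = scratch[lid + stride];
        }
        /* In an OpenCL kernel: barrier(CLK_LOCAL_MEM_FENCE); */
    }
    return scratch[0];   /* work-item 0 ends up holding the result */
}

/* Load up to local_size values, padding unused slots with INT_MIN
   (the identity element for max), then reduce. */
static int max_of(const int *values, int count, int *scratch, int local_size)
{
    assert(count >= 0 && count <= local_size);
    for (int lid = 0; lid < local_size; lid++)
        scratch[lid] = lid < count ? values[lid] : INT_MIN;
    return tree_reduce_max(scratch, local_size);
}
```

When the values are spread over several work-groups, each group would typically write its partial maximum to global memory, and a second pass (or the host) would combine those partial results.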


1 Comment

First of all, sorry for the delay. Thank you for your answer; you were right. I had made a mistake with the input/output data.

Make your function re-entrant.
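To illustrate what "re-entrant" means, here is a plain C sketch with hypothetical function names: the first version keeps its running count in static storage shared by every caller, so two concurrent calls could overwrite each other's count; the second keeps all state in parameters and locals, so any number of calls can run at once. Note that OpenCL C before version 2.0 only allows program-scope variables in the `__constant` address space, so inside a kernel the more typical hazard is several work-items writing the same `__global` location.

```c
#include <assert.h>

/* NOT re-entrant: the running count lives in static storage shared by every
   caller, so two concurrent calls could overwrite each other's count. */
static int shared_count;

static int count_ones_shared(unsigned v, int n)
{
    assert(n >= 0 && n <= 32);
    shared_count = 0;
    for (int i = 0; i < n; i++)
        shared_count += (int)((v >> i) & 1u);
    return shared_count;
}

/* Re-entrant: all state lives in parameters and locals, so concurrent calls
   (for example one per work-item) cannot interfere with each other. */
static int count_ones_reentrant(unsigned v, int n)
{
    assert(n >= 0 && n <= 32);
    int count = 0;
    for (int i = 0; i < n; i++)
        count += (int)((v >> i) & 1u);
    return count;
}
```

Called one at a time, both functions return the same value; they differ only when calls overlap.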
