Parallelizing functions: is a kernel inside a kernel possible? (gcc, OpenCL)

Question

I'm new to OpenCL and would like to learn more about it.

I have tried to find information about how to build "complex functions" with OpenCL. By "complex functions" I mean functions that can be parallelized and that call another function which can also be parallelized. I have seen links like:

Here is my question, illustrated with an example:

// A and B are int vectors
// M and N have different values (M != N)
for (int i = 0; i <= M - 2; i++) {
  for (int j = i + 1; j <= M - 1; j++) {
    int distance = calculate_distance(A[i], B[j]);
    // more sequential instructions
  }
}

calculate_distance concatenates its two inputs and contains a loop of its own:

for (int i = 0; i <= N - 1; i++) {
  // some sequential instructions
}

Can this whole fragment be parallelized? If so, how? (This is the reason for "kernel inside kernel" in the title.)

Note: I'm using the Intel(R) SDK for OpenCL - Offline Compiler 2012 (Windows) to check my kernels.

Thanks in advance

What value ranges do you expect for M and N? Do you know anything else about the data? Can you provide more information about calculate_distance()?
– mfa, Commented Jan 3, 2013 at 14:50
Both values are lower than 10, but they differ. As for calculate_distance, it returns an integer equal to the number of 1s in the concatenation of A and B.
– Fran, Commented Jan 3, 2013 at 16:02
By the number of 1s, do you mean the total number of set bits in the two ints you pass to calculate_distance? What operation do you execute N-1 times? Is this an algorithm I can look up somewhere for more information?
– mfa, Commented Jan 3, 2013 at 16:47

Community · Accepted Answer · 2017-05-23 12:04:33Z

2

In order to write parallel code, you need to pay much more attention to data flow. What does your input data look like? What does your output data look like? How do you transform a piece of input data into output data?

As for your question(s):

It's not possible to decide whether the example you provided is parallelizable because the data flow is not apparent.
You can call functions from your kernel code; they will be inlined into the kernel.

Hint:

Also see "Converting C/C++ for loops into CUDA"; it is about CUDA rather than OpenCL, but the principles are the same.

If your output data is just a single value (e.g. the maximum distance), you might want to look at reduction kernels and understand how they work.

edited May 23, 2017 at 12:04

CommunityBot


answered Jan 3, 2013 at 14:50

edgar.holleis




Fran Over a year ago

First of all, sorry for the delay. Thank you for your answer; you were right. I had made a mistake with the input/output data.

manav m-n · Accepted Answer · 2013-01-03 14:17:03Z

-1

Make your function re-entrant.

edited Jan 3, 2013 at 14:17

answered Jan 3, 2013 at 14:08

manav m-n

