I've written a simple serial 1D convolution function (below). I'm also experimenting with GPU convolution implementations. This is mostly for my own curiosity; I'm trying to learn the performance tradeoffs among various non-FFT implementation strategies.
Avoiding branching will be important for my GPU convolution experiments, since branching is expensive on Nvidia GPUs. One of my friends mentioned that there's a way to implement the code below without if/else statements, but he couldn't remember how it works.
How can I do a correct 1D convolution implementation without using any if/else statements?
Here's my basic 1D serial code in C++:
vector<int> myConv1d(vector<int> vec, vector<int> kernel)
{
int paddedLength = vec.size() + kernel.size() - 1;
vector<int> convolved(paddedLength); //zeros
reverse(kernel.begin(), kernel.end()); //flip the kernel (if we don't flip it, then we have correlation instead of convolution)
for(int outputIdx=0; outputIdx<paddedLength; outputIdx++) //index into 'convolved' vector
{
int vecIdx = outputIdx - kernel.size() + 1; //aligns with leftmost element of kernel
for(int kernelIdx=0; kernelIdx<kernel.size(); kernelIdx++)
{
if( (vecIdx+kernelIdx) >= 0 && (vecIdx+kernelIdx) < vec.size() ) //TODO: FIND A WAY TO REMOVE THIS
{
convolved[outputIdx] += kernel[kernelIdx]*vec[vecIdx+kernelIdx];
}
}
}
return convolved;
}
A couple of quick notes:
- I did find some related posts, but I didn't quite understand the strategy avoiding conditional statements.
- I've also written a 2D convolution implementation, and I'm hoping to apply the results of this SO post to the 2D version as well.
- This is NOT homework. It's marginally related to one of our research projects, but it's mostly for the sake of learning.