2

I've written a simple serial 1D convolution function (below). I'm also experimenting with GPU convolution implementations. This is mostly for my own curiosity; I'm trying to learn the performance tradeoffs among various non-FFT implementation strategies.

Avoiding branching will be important for my GPU convolution experiments, since branching is expensive on Nvidia GPUs. One of my friends mentioned that there's a way to implement the code below without if/else statements, but he couldn't remember how it works.

How can I do a correct 1D convolution implementation without using any if/else statements?

Here's my basic 1D serial code in C++:

vector<int> myConv1d(vector<int> vec, vector<int> kernel)
{
    int paddedLength = vec.size() + kernel.size() - 1;
    vector<int> convolved(paddedLength); //zeros
    reverse(kernel.begin(), kernel.end()); //flip the kernel (if we don't flip it, then we have correlation instead of convolution)
    for(int outputIdx=0; outputIdx<paddedLength; outputIdx++) //index into 'convolved' vector
    {
        int vecIdx = outputIdx - kernel.size() + 1; //aligns with leftmost element of kernel
        for(int kernelIdx=0; kernelIdx<kernel.size(); kernelIdx++)
        {
            if( (vecIdx+kernelIdx) >= 0  &&  (vecIdx+kernelIdx) < vec.size() ) //TODO: FIND A WAY TO REMOVE THIS
            {
                convolved[outputIdx] += kernel[kernelIdx]*vec[vecIdx+kernelIdx];
            }
        }
    }
    return convolved;
}

A couple of quick notes:

  • I did find some related posts, but I didn't quite understand the strategy avoiding conditional statements.
  • I've also written a 2D convolution implementation, and I'm hoping to apply the results of this SO post to the 2D version as well.
  • This is NOT homework. It's marginally related to one of our research projects, but it's mostly for the sake of learning.
1
  • How are you planning on implementing this on the GPU? In a pixel shader or with CUDA? If using texture lookups in a pixel shader, can you not just have zeros at the bounds of the texture used for the data and set the texture addressing mode to clamp? Or even forget the zeros and use clamped addressing? Commented Sep 25, 2012 at 7:21

2 Answers 2

3

Why don't you do something like this?

int lowerBound = std::max( 0, -vecIdx );
int upperBound = std::min( kernel.size(), vec.size() - vecIdx );
for( int kernelIdx = lowerBound; kernelIdx < upperBound; kernelIdx++ )

Sorry If I didn't understand the question.

Sign up to request clarification or add additional context in comments.

2 Comments

Waaait, a minute...I found a case where this breaks. In matlab, conv([1 2 3 4 5] , [2 4 6]) = [ 2 8 20 32 44 44 30 ]. This matches my code from the original question. Using the above min/max idea, I get myConv1d([1 2 3 4 5] , [2 4 6]) = [ 6 16 28 40 52 28 10 ]. I think there's a mistake...perhaps an off-by-one?
It looks like you forgot to reverse the kernel in min/max version.
1

Either zero-extend or border-extend the source vector to avoid the checks. If the source vector V is of size L and the kernel of size K, pad it by prepending and appending K-1 elements.

Let L = 5 and K = 3, you should end up with the padded vector

p p v v v v v q q

where the vs are the vector elements, the ps and qs the padding. Keep in mind that GPU toolkits should allow to clamp elements outside of a source vector to either 0 or to the border value - effectively making the above padding useless.

1 Comment

I haven't done shader programming before, but I'm pretty familiar with CUDA. This is probably a silly question: what do you mean by "clamp?"

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.