0

I wrote this code to find the highest temperature pixel in a thermal image. I also need to know the coordinates of the pixel in the image.

void _findMax(uint16_t* image, int sz, sPixelData* returnPixel)
{
    int temp = 0;
    uint16_t max = image[0];

    for(int i = 1; i < sz; i++)
    {
        if(max < image[i])
        {
            max=image[i];
            //temp = i;
        }
    }

    returnPixel->temperature = image[temp];

    //returnPixel->x_location = temp % IMAGE_HORIZONTAL_SIZE;
    //returnPixel->y_location = temp / IMAGE_HORIZONTAL_SIZE;
}

With the three lines commented out the function executes in about 2ms. With the lines uncommented it takes about 35ms to execute the function.

This seems very excessive seeing as the divide and modulus are only performed once after the loop.

Any suggestions on how to speed this up?

Or why it takes so long to execute compared to the divide on modulus not include?

This is executing on an ARM A9 processor running Linux.

The compiler I'm using is ARM v8 32-Bit Linux gcc compiler.

I'm using optimize -O3 and the following compile options: -march=armv7-a+neon -mcpu=cortex-a9 -mfpu=neon-fp16 -ftree-vectorize.

8
  • If you don't update temp, it's always 0 and the function only executes returnPixel->temperature = image[0]. The compiler correctly identifies that the loop is not needed and removes it. Commented Nov 4, 2020 at 16:11
  • Yes I am aware of that, my question is, why does the function take so much longer to execute when in theory, just one extra divide and modulus after the loop has executed. The loop only has the extra temp = i;, so that should only cause it to take about twice as long surely? Commented Nov 4, 2020 at 16:15
  • No, these are two completely different programs: one with a loop (see godbolt.org/z/555qsM) and the other one consisting of three assembler instructions only (see godbolt.org/z/bGcYb1) Commented Nov 4, 2020 at 16:19
  • Why don't you use neon? Commented Nov 4, 2020 at 16:22
  • Not an answer, using the register modifier will shorten runtime if that is your real concern. Commented Nov 4, 2020 at 16:25

1 Answer 1

1

Your code is flawed.
Since temp is simply 0, the complier will generate machine codes that just executes returnPixel->temperature = image[0]; which gets finished in no time. There is nothing odd here.

You should modify the line to: returnPixel->temperature = max;

You could boost the performance significantly by utilizing neon. But that's another problem.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks I missed that, this is code after taking suggestions from other engineers at work, and I didn't notice that. The original code had returnPixel->temperature being set directly. Thanks.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.