1

I have funcion, which is called very frequently. This function has two nested for loops inside. Each of the for loops iterates from 0 to 900. The code looks like this:

 for (int j = 0; j < width; j++)
            {
                for (int k = 0; k < height; k++)
                {
                    switch (Dim2[j * width + k])
                    {
                        case 0:
                            cwA = Dim0[j * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            ccwA = Dim0[((j == (width - 1)) ? 0 : (j + 1)) * width + k];
                            oppA = Dim0[((j == (width - 1)) ? 0 : (j + 1)) * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            cwB = Dim3[j * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            ccwB = Dim3[((j == (width - 1)) ? 0 : (j + 1)) * width + k];
                            oppB = Dim3[((j == (width - 1)) ? 0 : (j + 1)) * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            break;

                        case 1:
                            cwA = Dim0[((j == (width - 1)) ? 0 : (j + 1)) * width + k];
                            ccwA = Dim0[j * width + ((k == 0) ? (height - 1) : (k - 1))];
                            oppA = Dim0[((j == (width - 1)) ? 0 : (j + 1)) * width + ((k == 0) ? (height - 1) : (k - 1))];
                            cwB = Dim3[((j == (width - 1)) ? 0 : (j + 1)) * width + k];
                            ccwB = Dim3[j * width + ((k == 0) ? (height - 1) : (k - 1))];
                            oppB = Dim3[((j == (width - 1)) ? 0 : (j + 1)) * width + ((k == 0) ? (height - 1) : (k - 1))];
                            break;

                        case 2:
                            cwA = Dim0[((j == 0) ? (width - 1) : (j - 1)) * width + k];
                            ccwA = Dim0[j * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            oppA = Dim0[((j == 0) ? (width - 1) : (j - 1)) * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            cwB = Dim3[((j == 0) ? (width - 1) : (j - 1)) * width + k];
                            ccwB = Dim3[j * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            oppB = Dim3[((j == 0) ? (width - 1) : (j - 1)) * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            break;

                        case 3:
                            cwA = Dim0[j * width + ((k == 0) ? (height - 1) : (k - 1))];
                            ccwA = Dim0[((j == 0) ? (width - 1) : (j - 1)) * width + k];
                            oppA = Dim0[((j == 0) ? (width - 1) : (j - 1)) * width + ((k == 0) ? (height - 1) : (k - 1))];
                            cwB = Dim3[j * width + ((k == 0) ? (height - 1) : (k - 1))];
                            ccwB = Dim3[((j == 0) ? (width - 1) : (j - 1)) * width + k];
                            oppB = Dim3[((j == 0) ? (width - 1) : (j - 1)) * width + ((k == 0) ? (height - 1) : (k - 1))];
                            break;
                    }
                    woll = (((oppB + ccwB) + cwB) + Dim3[j * width + k]) > 0;
                    collision = ((Dim0[j * width + k] == oppA) && (cwA == ccwA)) && (Dim0[j * width + k] != cwA);
                    Dim6[j * width + k] = (short)(3 - Dim2[j * width + k]);
                    if (woll || collision)
                    {
                        Dim4[j * width + k] = Dim0[j * width + k];
                    }
                    else
                    {
                        Dim4[j * width + k] = _phase ? cwA : ccwA;
                    }
                }
            }

it takes around 0.1 second to execute these for loops, which is too slow. I've replaced two-dimentional arrays with 1 dimentional, this significantly improved performance. Are there any other performance improvements for the code? Will it work faster if I migrate it to c++? Should I use any other language for arrays manipulation? What would you suggest?
Thanks in advance,
Sam

6
  • Maybe also try to explain what you like to accomplish. There might be standard libraries (e.g. OpenCV) that are highly optimised to do your task Commented May 21, 2012 at 16:01
  • Maybe pretty obvious but: Turn on optimizations and switch to release, and see if it is fast enough. Commented May 21, 2012 at 16:07
  • Did you try this in a Release build or just a Debug build? Commented May 21, 2012 at 16:07
  • Also, make sure you traverse the arrays according to their layout in memory. So try to enumerate columns first, then rows. I can see how this is might be difficult to extract, though ;) Commented May 21, 2012 at 16:11
  • @skarmats: .net optimization is turned on, changing configuration to Release makes no difference (execution time is the same). Commented May 21, 2012 at 17:41

7 Answers 7

2

Refactor things like height - 1, j + 1, width - 1, j * width into variables so they're only calculated once. It will help a little. In fact, you could add to this list:

(j == (width - 1)) ? 0 : (j + 1)
Sign up to request clarification or add additional context in comments.

10 Comments

+1 and maybe get rid of the tenary operations or at least calculate them once.
@dowhilefor - indeed, added one of them in an edit already, but there is quite a list...
He should calculate them outside the loops.
Well, except the ones that are based on loop variables. So the ternary condition I've mentioned should be calculated inside the outer loop but outside the inner one; things like height - 1 outside both, absolutely.
also maybe an ifelse is faster than a switch, but that is just a guess related to a long long time optimization session back on an old arm cpu. Maybe compilers these days are smart enough.
|
1

Will it work faster if I migrate it to c++?

If by C++ native is referred, It should.
Why
1. Garbage collector is not there
2. Memory realignment is not there
3. CLR is not there

However optimization may be there in managed code by CLR, equivalent native code should be faster. That is the precise reason most of the BCL CPU intensive logic is in native code(decorated by MethodImplOptions.InternalCall).

3 Comments

I have found that you can get speeds very close to native C++ in this type of code by using C#'s unsafe contexts.
That is right. Reason to use unsafe, and reason to use native are same. Both serves same purpose. When I see snippet of code has performance concern, i got unsafe way. If snippet it pretty large, i prefer C++ native library, and P/Invoke
Yep, I have to try unsafe code with pointers usage, thx guys.
1

Can you use unsafe contexts in this project? You should be able to significantly improve performance by using pointers rather than indexing the array as each time you read from the array you will no longer incur .Net's array bounds checking etc.

6 Comments

Not necessarily: if the majority of the project is in C# then you get the best of both worlds, i.e. everything can be written in the same language without managed to unmanaged transitions, and you can still use fast pointer arithmetic.
OK, agreed, but anyway, it's nasty.
@MihaiTodor: not at all! Unsafe is there for a reason in C# and it's exactly for this sort of thing. You still retain C#'s garbage collection etc etc but bypass .Net's array bounds checking.
I don't want to open this can of worms around here, since it's not relevant. I was just trying to point out that it usually looks nasty and one can get it wrong in so many ways, that end up biting you sooner or later :)
@MihaiTodor: Yes, fair enough! Though if he were to move to C++ he'd still have those problems, plus a whole bunch of other potential problems on top :)
|
0

I'm not a C# expert, but I would try to put all calculations that are 'static' and inside your loop (your inline conditionals, and multiplications) outside the loop.

Comments

0

If you need a significant speedup, maybe you should consider using multiple threads, at least for the outer loop. Also, make sure that you are not using overflow checks.

2 Comments

multithreading is not applicable in this case, as far as all operations inside outer loop depend on the result of the inner loop.
It's kind of hard to see that, given the above code :) You're probably mixing the indexes all over the place...
0

Another solution, especially for modern computers with multiple cores might be to change the outer for loop into a call to Parallel.For.

You should make the other optimizations suggested here first, though.

3 Comments

How can outer loop be runned in parallel? Parallel execution is not possible, because inner loop uses iterator from the outer loop.
Simply put, each iteration of the outer loop can run independently. The inner loop will still have access to the outer loop's state. At least that appears to be the case as far as I can see from your code.
@Groky: I think the issue with this approach is that Dim4 and Dim6 need to be computed sequentially, but I'm also having a hard time figuring out what's going on, given that code.
0

You can remove j == 0 and j == (width - 1) test completely by having 3 copies of the inner loop. You can do the same thing with k, if you peel the first and last iterations off of the loop. Of course if you do both you'd have 9 copies of the inner code, which isn't really nice, and I wouldn't particularly recommend that - removing the conditionals depending on k should have a bigger effect, and/because you can move the conditionals depending on j to outside the inner loop anyway.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.