Need advice on improving performance of C# code

Question

I have a function which is called very frequently. This function has two nested for loops inside. Each of the for loops iterates from 0 to 900. The code looks like this:

            for (int j = 0; j < width; j++)
            {
                for (int k = 0; k < height; k++)
                {
                    switch (Dim2[j * width + k])
                    {
                        case 0:
                            cwA = Dim0[j * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            ccwA = Dim0[((j == (width - 1)) ? 0 : (j + 1)) * width + k];
                            oppA = Dim0[((j == (width - 1)) ? 0 : (j + 1)) * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            cwB = Dim3[j * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            ccwB = Dim3[((j == (width - 1)) ? 0 : (j + 1)) * width + k];
                            oppB = Dim3[((j == (width - 1)) ? 0 : (j + 1)) * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            break;

                        case 1:
                            cwA = Dim0[((j == (width - 1)) ? 0 : (j + 1)) * width + k];
                            ccwA = Dim0[j * width + ((k == 0) ? (height - 1) : (k - 1))];
                            oppA = Dim0[((j == (width - 1)) ? 0 : (j + 1)) * width + ((k == 0) ? (height - 1) : (k - 1))];
                            cwB = Dim3[((j == (width - 1)) ? 0 : (j + 1)) * width + k];
                            ccwB = Dim3[j * width + ((k == 0) ? (height - 1) : (k - 1))];
                            oppB = Dim3[((j == (width - 1)) ? 0 : (j + 1)) * width + ((k == 0) ? (height - 1) : (k - 1))];
                            break;

                        case 2:
                            cwA = Dim0[((j == 0) ? (width - 1) : (j - 1)) * width + k];
                            ccwA = Dim0[j * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            oppA = Dim0[((j == 0) ? (width - 1) : (j - 1)) * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            cwB = Dim3[((j == 0) ? (width - 1) : (j - 1)) * width + k];
                            ccwB = Dim3[j * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            oppB = Dim3[((j == 0) ? (width - 1) : (j - 1)) * width + ((k == (height - 1)) ? 0 : (k + 1))];
                            break;

                        case 3:
                            cwA = Dim0[j * width + ((k == 0) ? (height - 1) : (k - 1))];
                            ccwA = Dim0[((j == 0) ? (width - 1) : (j - 1)) * width + k];
                            oppA = Dim0[((j == 0) ? (width - 1) : (j - 1)) * width + ((k == 0) ? (height - 1) : (k - 1))];
                            cwB = Dim3[j * width + ((k == 0) ? (height - 1) : (k - 1))];
                            ccwB = Dim3[((j == 0) ? (width - 1) : (j - 1)) * width + k];
                            oppB = Dim3[((j == 0) ? (width - 1) : (j - 1)) * width + ((k == 0) ? (height - 1) : (k - 1))];
                            break;
                    }
                    woll = (((oppB + ccwB) + cwB) + Dim3[j * width + k]) > 0;
                    collision = ((Dim0[j * width + k] == oppA) && (cwA == ccwA)) && (Dim0[j * width + k] != cwA);
                    Dim6[j * width + k] = (short)(3 - Dim2[j * width + k]);
                    if (woll || collision)
                    {
                        Dim4[j * width + k] = Dim0[j * width + k];
                    }
                    else
                    {
                        Dim4[j * width + k] = _phase ? cwA : ccwA;
                    }
                }
            }

It takes around 0.1 seconds to execute these for loops, which is too slow. I've replaced the two-dimensional arrays with one-dimensional ones, which significantly improved performance. Are there any other performance improvements for the code? Will it work faster if I migrate it to C++? Should I use any other language for array manipulation? What would you suggest?
Thanks in advance,
Sam

Maybe also try to explain what you would like to accomplish. There might be standard libraries (e.g. OpenCV) that are highly optimised for your task.
– Rob Audenaerde, Commented May 21, 2012 at 16:01
Maybe pretty obvious, but: turn on optimizations, switch to Release, and see if it is fast enough.
– dowhilefor, Commented May 21, 2012 at 16:07
Also, make sure you traverse the arrays according to their layout in memory. So try to enumerate columns first, then rows. I can see how this might be difficult to extract, though ;)
– skarmats, Commented May 21, 2012 at 16:11
@skarmats: .NET optimization is turned on; changing the configuration to Release makes no difference (execution time is the same).
– Semen Shekhovtsov, Commented May 21, 2012 at 17:41

David M · Accepted Answer · 2012-05-21 15:59:08Z

2

Refactor things like height - 1, j + 1, width - 1, j * width into variables so they're only calculated once. It will help a little. In fact, you could add to this list:

(j == (width - 1)) ? 0 : (j + 1)
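A minimal sketch of that refactoring, assuming a square grid (width == height, as in the question) stored in a one-dimensional array. The class and method names are made up for illustration, and only the case 0 `oppA` lookup is shown; the first method recomputes everything per cell, the second hoists what it can.

```csharp
using System;

public static class HoistDemo
{
    // Hypothetical helper: writes the case-0 "oppA" neighbour of every cell into dst,
    // using the question's indexing unchanged (everything recomputed per cell).
    public static void OppNaive(short[] src, short[] dst, int width, int height)
    {
        for (int j = 0; j < width; j++)
            for (int k = 0; k < height; k++)
                dst[j * width + k] =
                    src[((j == (width - 1)) ? 0 : (j + 1)) * width + ((k == (height - 1)) ? 0 : (k + 1))];
    }

    // Same result with the repeated subexpressions hoisted into variables.
    public static void OppHoisted(short[] src, short[] dst, int width, int height)
    {
        int lastJ = width - 1;    // loop-invariant: computed once, outside both loops
        int lastK = height - 1;

        for (int j = 0; j < width; j++)
        {
            int row = j * width;                                 // depends on j only
            int nextRow = ((j == lastJ) ? 0 : (j + 1)) * width;  // wrapped j + 1, once per row

            for (int k = 0; k < height; k++)
            {
                int nextK = (k == lastK) ? 0 : (k + 1);          // depends on k, once per cell
                dst[row + k] = src[nextRow + nextK];
            }
        }
    }
}
```

The JIT may already hoist some of these on its own, so it is worth measuring before and after rather than assuming a gain.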

answered May 21, 2012 at 15:59


10 Comments

dowhilefor Over a year ago

+1, and maybe get rid of the ternary operations or at least calculate them once.

David M Over a year ago

@dowhilefor - indeed, added one of them in an edit already, but there is quite a list...

Mihai Todor Over a year ago

He should calculate them outside the loops.

David M Over a year ago

Well, except the ones that are based on loop variables. So the ternary condition I've mentioned should be calculated inside the outer loop but outside the inner one; things like height - 1 outside both, absolutely.

dowhilefor Over a year ago

Also, maybe an if/else is faster than a switch, but that is just a guess based on an optimization session long ago on an old ARM CPU. Maybe compilers these days are smart enough.


Tilak · Accepted Answer · 2012-05-21 16:05:01Z

1

Will it work faster if I migrate it to c++?

If you mean native C++, it should.
Why:
1. There is no garbage collector
2. There is no memory realignment
3. There is no CLR

Whatever optimizations the CLR applies to managed code, equivalent native code should be faster. That is precisely why most of the CPU-intensive logic in the BCL is implemented in native code (decorated with MethodImplOptions.InternalCall).

edited May 21, 2012 at 16:05

answered May 21, 2012 at 15:59


3 Comments

Grokys Over a year ago

I have found that you can get speeds very close to native C++ in this type of code by using C#'s unsafe contexts.

Tilak Over a year ago

That is right. The reasons to use unsafe and to use native code are the same; both serve the same purpose. When a snippet of code has a performance concern, I go the unsafe way. If the snippet is pretty large, I prefer a native C++ library and P/Invoke.

Semen Shekhovtsov Over a year ago

Yep, I have to try unsafe code with pointers usage, thx guys.

Grokys · Accepted Answer · 2012-05-21 16:15:53Z

1

Can you use unsafe contexts in this project? You should be able to significantly improve performance by using pointers rather than indexing the array as each time you read from the array you will no longer incur .Net's array bounds checking etc.
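As a rough sketch of what that could look like, assuming a square grid of `size` x `size` cells and a project compiled with unsafe code enabled (`AllowUnsafeBlocks` or `/unsafe`). The class and method names are hypothetical, and the kernel is reduced to a single wrapped neighbour lookup rather than the full switch from the question.

```csharp
using System;

public static class UnsafeDemo
{
    // Hypothetical helper: for every cell, copies its wrapped k + 1 neighbour from src to dst.
    public static unsafe void CopyNextColumn(short[] src, short[] dst, int size)
    {
        // Pointer access is not bounds-checked, so validate the lengths once up front.
        if (src.Length < size * size || dst.Length < size * size)
            throw new ArgumentException("arrays are smaller than size * size");

        int last = size - 1;

        // fixed pins both arrays so the garbage collector cannot move them
        // while raw pointers into them are in use.
        fixed (short* pSrc = src, pDst = dst)
        {
            for (int j = 0; j < size; j++)
            {
                short* srcRow = pSrc + j * size;   // start of row j
                short* dstRow = pDst + j * size;

                for (int k = 0; k < size; k++)
                    dstRow[k] = srcRow[(k == last) ? 0 : (k + 1)];
            }
        }
    }
}
```

The trade-off is that an index mistake here silently corrupts memory instead of throwing, which is why the length check sits in front of the fixed block.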

edited May 21, 2012 at 16:15

answered May 21, 2012 at 15:59


6 Comments

Grokys Over a year ago

Not necessarily: if the majority of the project is in C# then you get the best of both worlds, i.e. everything can be written in the same language without managed to unmanaged transitions, and you can still use fast pointer arithmetic.

Mihai Todor Over a year ago

OK, agreed, but anyway, it's nasty.

Grokys Over a year ago

@MihaiTodor: not at all! Unsafe is there for a reason in C# and it's exactly for this sort of thing. You still retain C#'s garbage collection etc etc but bypass .Net's array bounds checking.

Mihai Todor Over a year ago

I don't want to open this can of worms around here, since it's not relevant. I was just trying to point out that it usually looks nasty and one can get it wrong in so many ways, that end up biting you sooner or later :)

Grokys Over a year ago

@MihaiTodor: Yes, fair enough! Though if he were to move to C++ he'd still have those problems, plus a whole bunch of other potential problems on top :)


Rob Audenaerde · Accepted Answer · 2012-05-21 15:59:49Z

0

I'm not a C# expert, but I would try to put all calculations that are 'static' and inside your loop (your inline conditionals, and multiplications) outside the loop.

answered May 21, 2012 at 15:59


Mihai Todor · Accepted Answer · 2012-05-21 16:07:03Z

0

If you need a significant speedup, maybe you should consider using multiple threads, at least for the outer loop. Also, make sure that you are not using overflow checks.

answered May 21, 2012 at 16:07


2 Comments

Semen Shekhovtsov Over a year ago

Multithreading is not applicable in this case, since all operations inside the outer loop depend on the result of the inner loop.

Mihai Todor Over a year ago

It's kind of hard to see that, given the above code :) You're probably mixing the indexes all over the place...

Grokys · Accepted Answer · 2012-05-21 16:18:51Z

0

Another solution, especially for modern computers with multiple cores, might be to change the outer for loop into a call to Parallel.For.

You should make the other optimizations suggested here first, though.
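A minimal sketch of the idea, under two assumptions that the question's code does not confirm: the output arrays are distinct from the input arrays, and every temporary (the equivalents of `cwA`, `ccwA` and so on) is declared inside the loop body rather than shared. The names below are hypothetical and the kernel is reduced to one wrapped neighbour lookup on a square grid.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelDemo
{
    // Hypothetical helper: for every cell, copies its wrapped j + 1 neighbour from src to dst.
    // src is only read and each dst element is written by exactly one iteration,
    // so the rows can be processed in any order and on any thread.
    public static void CopyNextRow(short[] src, short[] dst, int size)
    {
        int last = size - 1;

        Parallel.For(0, size, j =>
        {
            // Variables declared inside the lambda are private to this iteration.
            int row = j * size;
            int nextRow = ((j == last) ? 0 : (j + 1)) * size;

            for (int k = 0; k < size; k++)
                dst[row + k] = src[nextRow + k];
        });
    }
}
```

If the temporaries stay as fields or as locals declared outside the lambda, the iterations race on them and the results become unpredictable.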

answered May 21, 2012 at 16:18


3 Comments

Semen Shekhovtsov Over a year ago

How can the outer loop be run in parallel? Parallel execution is not possible, because the inner loop uses the iterator from the outer loop.

Grokys Over a year ago

Simply put, each iteration of the outer loop can run independently. The inner loop will still have access to the outer loop's state. At least that appears to be the case as far as I can see from your code.

Mihai Todor Over a year ago

@Groky: I think the issue with this approach is that Dim4 and Dim6 need to be computed sequentially, but I'm also having a hard time figuring out what's going on, given that code.

user555045 · Accepted Answer · 2012-05-21 16:22:50Z

0

You can remove the j == 0 and j == (width - 1) tests completely by having 3 copies of the inner loop. You can do the same thing with k, if you peel the first and last iterations off of the loop. Of course if you do both you'd have 9 copies of the inner code, which isn't really nice, and I wouldn't particularly recommend that - removing the conditionals depending on k should have a bigger effect, and/because you can move the conditionals depending on j to outside the inner loop anyway.
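Peeling the k loop could look roughly like this. It is a sketch on a simplified kernel (each cell receives the sum of its two wrapped neighbours along k), assuming a square grid of at least 2 x 2 cells; the class and method names are invented for the example.

```csharp
using System;

public static class PeelDemo
{
    // Hypothetical helper: dst[cell] = previous neighbour + next neighbour along k, with wrap-around.
    public static void SumNeighbours(short[] src, int[] dst, int size)
    {
        if (size < 2) throw new ArgumentException("size must be at least 2");

        int last = size - 1;

        for (int j = 0; j < size; j++)
        {
            int row = j * size;

            // Peeled first iteration (k == 0): the previous neighbour wraps to the end of the row.
            dst[row] = src[row + last] + src[row + 1];

            // Interior iterations: no wrap-around can occur, so the body has no conditionals.
            for (int k = 1; k < last; k++)
                dst[row + k] = src[row + k - 1] + src[row + k + 1];

            // Peeled last iteration (k == last): the next neighbour wraps to the start of the row.
            dst[row + last] = src[row + last - 1] + src[row];
        }
    }
}
```

With 900 cells per row, 898 of them go through the branch-free interior loop, which is where the saving would come from.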

answered May 21, 2012 at 16:22
