
I have some large arrays of 2D data elements. The A and B dimensions aren't equally sized:

A is between 5 and 20

B is between 1000 and 100000

Initialization time is no problem, as these are only going to be lookup tables for a realtime application, so the performance of indexing elements from known values of A and B is crucial. The data stored is currently a single byte value.

I was thinking around these solutions:

byte[A][B] datalist1a;

or

byte[B][A] datalist2a;

or

byte[A,B] datalist1b;

or

byte[B,A] datalist2b;

or perhaps losing the multidimensionality, since I know the fixed sizes, and just multiplying the two values before looking it up:

byte[A*Bmax + B] datalist3;

or

byte[B*Amax + A] datalist4;

What I need to know is what datatype/array structure to use for the most efficient lookup in C# with this setup.

Edit 1: The first two solutions were supposed to be multidimensional arrays, not arrays of arrays.

Edit 2: All the data in the smallest dimension is read at each lookup, but the large dimension is only used for indexing once at a time.

So it's something like - grab all the A's from sample B.
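
For illustration, a minimal sketch of what the flattened datalist4 variant could look like. The constant names and the Lookup helper are assumptions for the example, not anything fixed above:

const int Amax = 20;        // assumed upper bound of the A dimension
const int Bmax = 100000;    // assumed upper bound of the B dimension

byte[] datalist4 = new byte[Bmax * Amax];

// All A values of one B sample are contiguous in memory, which matches
// the "grab all A's from sample B" access pattern from Edit 2.
byte Lookup(int a, int b)
{
    return datalist4[b * Amax + a];
}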

Comments

  • Probably depends how you're stepping through the array when looking up values.
  • Have you also considered a two-dimensional array: byte[A,B]? (The form byte[A][B] is an array of arrays.)
  • The first three rules of performance optimization: measure, measure, measure. Set up some kind of test environment that matches the production environment, and test them.
  • There is no single correct answer to this question in its current form. As @George mentions, it depends on how you access the array. Are you doing random access? Sequential? If sequential, in which direction (i.e. the A dimension or the B dimension) primarily? The best way to handle this is to write different versions of the code and profile them, and that is in fact the correct answer to any such "what is fastest" question anyway.
  • @Richard Using multidimensional arrays in C# has a significant impact on performance; they are implemented differently: codeproject.com/KB/dotnet/arrays.aspx

5 Answers


I'd bet on the jagged arrays, unless Amax or Bmax is a power of 2.

I'd say so because a jagged array needs only two indexed accesses, which is very fast. The other forms imply a multiplication, either implicit or explicit; unless that multiplication reduces to a simple shift, I think it could be a bit heavier than a couple of indexed accesses.

EDIT: Here is the small program used for the test:

using System;
using System.Diagnostics;

// Compile with the /unsafe switch; the last test uses pointer access.
class Program
{
    private static int A = 10;
    private static int B = 100;

    private static byte[] _linear;
    private static byte[,] _square;
    private static byte[][] _jagged;



    unsafe static void Main(string[] args)
    {
        //init arrays
        _linear = new byte[A * B];
        _square = new byte[A, B];
        _jagged = new byte[A][];
        for (int i = 0; i < A; i++)
            _jagged[i] = new byte[B];

        //set-up the params
        var sw = new Stopwatch();
        byte b;
        const int N = 100000;

        //one-dim array (buffer)
        sw.Restart();
        for (int i = 0; i < N; i++)
        {
            for (int r = 0; r < A; r++)
            {
                for (int c = 0; c < B; c++)
                {
                    b = _linear[r * B + c];
                }
            }
        }
        sw.Stop();
        Console.WriteLine("linear={0}", sw.ElapsedMilliseconds);

        //two-dim array
        sw.Restart();
        for (int i = 0; i < N; i++)
        {
            for (int r = 0; r < A; r++)
            {
                for (int c = 0; c < B; c++)
                {
                    b = _square[r, c];
                }
            }
        }
        sw.Stop();
        Console.WriteLine("square={0}", sw.ElapsedMilliseconds);

        //jagged array
        sw.Restart();
        for (int i = 0; i < N; i++)
        {
            for (int r = 0; r < A; r++)
            {
                for (int c = 0; c < B; c++)
                {
                    b = _jagged[r][c];
                }
            }
        }
        sw.Stop();
        Console.WriteLine("jagged={0}", sw.ElapsedMilliseconds);

        //one-dim array within unsafe access (and context)
        sw.Restart();
        for (int i = 0; i < N; i++)
        {
            for (int r = 0; r < A; r++)
            {
                fixed (byte* offset = &_linear[r * B])
                {
                    for (int c = 0; c < B; c++)
                    {
                        b = *(byte*)(offset + c);
                    }
                }
            }
        }
        sw.Stop();
        Console.WriteLine("unsafe={0}", sw.ElapsedMilliseconds);

        Console.Write("Press any key...");
        Console.ReadKey();
        Console.WriteLine();
    }
}

14 Comments

Amax or Bmax is just the length of the two dimensions, and they will not be a power of 2 very often (only if we are lucky from time to time). So okay, jagged it is.
@BerggreenDK remember to cache the current line in the first loop to reduce indexer operations, if iterating in order. E.g. for(...) { var line = arr[i]; for (...) { var elem = line[j]; } }
Although jagged arrays are good for maintainable data structures, an element access isn't as simple as a single indexer operation - remember the CLR has to do null checking, bounds checking, etc. This adds a significant amount of overhead to each indexer access. Algebraic arrays would be faster because of this (but less maintainable).
FYI, I've added the source of the program used for the test.
@MarioVernari: Oh no, it's not the code you posted which is wrong. I was referring to your comment (no. 5 from the top) "Anyway, it does not matter which array is the first: it's an access that costs in equal manner.", which is obviously wrong, because it does matter which array is first :)
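
To make the row-caching suggestion above concrete, here is a sketch against the fields of the test program (illustrative, not part of the original benchmark):

// Hoist the row lookup out of the inner loop: one outer indexer
// access (with its null and bounds checks) per row instead of per element.
for (int r = 0; r < A; r++)
{
    byte[] row = _jagged[r];
    for (int c = 0; c < B; c++)
    {
        byte b = row[c];
    }
}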
  • Multidimensional ([,]) arrays are nearly always the slowest, except under heavy random-access scenarios. In theory they shouldn't be, but it's one of the CLR's oddities.
  • Jagged arrays ([][]) are nearly always faster than multidimensional arrays, even under random-access scenarios. They carry a memory overhead.
  • Single-dimensional ([]) and algebraic arrays ([y * stride + x]) are the fastest for random access in safe code.
  • Unsafe code is normally the fastest in all cases (provided you don't pin the array repeatedly).

4 Comments

"Unsafe code is, obviously, the fastest in all cases". I don't think so. Try yourself, and you'll be surprised (C# speaking).
You skip bounds checking; which is the slowest part of arrays.
That's right, but the unsafe way requires the array to be pinned with the "fixed" statement, which is slow. Also, well, it depends on how you want to have access to the array: bitmap access (so fixed once), or random access (fixed every time).
@Mario - Fixed doesn't actually pin the memory, it uses a clever trick to avoid it. In spite of the documentation. No bounds checking either, it is fast and dangerous with perf like C arrays.
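
As a rough sketch of the "fixed once" idea discussed above, pinning the linear buffer a single time for the whole scan (unsafe context required; reuses the _linear buffer from the benchmark):

// One fixed statement for the entire scan, instead of re-entering
// "fixed" on every row as the test program does.
fixed (byte* basePtr = _linear)
{
    for (int r = 0; r < A; r++)
    {
        byte* row = basePtr + r * B;
        for (int c = 0; c < B; c++)
        {
            byte b = *(row + c);
        }
    }
}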

The only useful answer to "which X is faster" (for all X) is: you have to do performance tests that reflect your requirements.

And remember to consider, in general*:

  • Maintenance of the program. If this is not a quick one-off, a slightly slower but maintainable program is the better option in most cases.
  • Micro-benchmarks can be deceptive. For instance, a tight loop that just reads from a collection might be optimised away in ways that aren't possible when real work is being done (see the sketch below).

Additionally, consider that you need to look at the complete program to decide where to optimise. Speeding up a loop by 1% might be useful for that loop, but if it is only 1% of the complete runtime then it is not making much difference.


* But all rules have exceptions.
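
One cheap guard against that dead-code pitfall is to consume what you read, so the JIT cannot discard the loop. A sketch, reusing the jagged array from the first answer:

// An unused read like "b = arr[r][c]" is a candidate for elimination;
// an accumulated and printed one is not.
long checksum = 0;
for (int r = 0; r < A; r++)
    for (int c = 0; c < B; c++)
        checksum += _jagged[r][c];
Console.WriteLine(checksum);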

1 Comment

This is good general advice, but it doesn't answer the question, which is quite specific about optimizing the performance of access to 2-dimensional chunks of data in C# - and which has quite a specific answer.

On most modern computers, arithmetic operations are far, far faster than memory lookups. If you fetch a memory address that isn't in cache, or the out-of-order execution pulls from the wrong place, you are looking at 10-100 clocks, while a pipelined multiply takes about 1 clock. The other issue is cache locality. byte[B*Amax + A] datalist4; seems like the best bet if you are accessing with the A's varying sequentially. When datalist4[b*Amax + a] is accessed, the computer will usually start pulling in datalist4[b*Amax + a + 64/sizeof(dataListType)], ... +128 ..., etc., or, if it detects a reverse iteration, datalist4[b*Amax + a - 64/sizeof(dataListType)].
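
In loop form, that cache-friendly access pattern looks like the sketch below; Bcount is a hypothetical name for the number of B samples:

// A varies fastest, so consecutive iterations touch consecutive bytes
// and the hardware prefetcher can stream whole cache lines in.
for (int b = 0; b < Bcount; b++)
{
    int rowStart = b * Amax;
    for (int a = 0; a < Amax; a++)
    {
        byte value = datalist4[rowStart + a];
    }
}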

Hope that helps!

Comments


Maybe the best way for you would be to use a HashMap - a Dictionary in C#?
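
For what it's worth, that would look something like the sketch below, keyed on the (A, B) pair; each lookup then costs a hash plus an equality check rather than a plain index:

using System.Collections.Generic;

// Purely illustrative: a dictionary keyed on the (a, b) value pair.
var table = new Dictionary<(int a, int b), byte>();
table[(3, 42)] = 0x7F;
byte value = table[(3, 42)];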

1 Comment

Too complex a data structure, considering our need for simple byte arrays and an index.
