15

I'm currently optimizing an application, one of the operations that is done very often is reading and writing binary. I need 2 types of functions:

Set(byte[] target, int index, int value);

int Get(byte[] source, int index);

These functions are needed for signed and unsigned short, int and long in big and little endian order.

Here are some examples i've made, but i need a evaluation about the advantages and disadvantages:

first method is using Marshal to write the value into the memory of the byte[], the second is using plain pointers to accomplish this and the third uses BitConverter and BlockCopy to do this

unsafe void Set(byte[] target, int index, int value)
{
    fixed (byte* p = &target[0])
    {
        Marshal.WriteInt32(new IntPtr(p), index, value);
    }
}

unsafe void Set(byte[] target, int index, int value)
{
    int* p = &value;
    for (int i = 0; i < 4; i++)
    {
        target[offset + i] = *((byte*)p + i);
    }
}

void Set(byte[] target, int index, int value)
{
    byte[] data = BitConverter.GetBytes(value);
    Buffer.BlockCopy(data, 0, target, index, data.Length);
}

And here are the Read/Get methods:

the first is using Marshal to read the value from the byte[], the second is using plain pointers and the third is using BitConverter again:

unsafe int Get(byte[] source, int index)
{
    fixed (byte* p = &source[0])
    {
        return Marshal.ReadInt32(new IntPtr(p), index);
    }
}

unsafe int Get(byte[] source, int index)
{
    fixed (byte* p = &source[0])
    {
        return *(int*)(p + index);
    }
}

unsafe int Get(byte[] source, int index)
{
    return BitConverter.ToInt32(source, index);
}

boundary checking needs to be done but isn't part of my question yet...

I would be pleased if someone can tell what would be the best and fastest way in this case or give me some other solutions to work on. A generic solution would be preferable


I Just did some performance testing, here are the results:

Set Marshal: 45 ms, Set Pointer: 48 ms, Set BitConverter: 71 ms Get Marshal: 45 ms, Get Pointer: 26 ms, Get BitConverter: 30 ms

it seems that using pointers is the fast way, but i think Marshal and BitConverter do some internal checking... can someone verify this?

3
  • 1
    You have the code, why don't you run it and test with a Stopwatch? Commented Jan 10, 2010 at 10:37
  • :/ :\ you're right, i will do this for quick and edit my question, but thats not the only point of my post. I'm searching for alternatives and maybe generic ways of doing this too Commented Jan 10, 2010 at 10:40
  • 2
    Raised eyebrow at this question: converting to binary should only be necessary for I/O. The I/O operation itself is always several orders of magnitude slower than massaging the bits. The best optimization can't buy you more than a few percent improvement. Commented Jan 10, 2010 at 15:37

4 Answers 4

16

Important: if you only need the one endian, see the pointer magic by wj32 / dtb


Personally, I would be writing directly to a Stream (perhaps with some buffering), and re-using a shared buffer that I can generally assume is clean. Then you can make some shortcuts and assume index 0/1/2/3.

Certainly don't use BitConverter, as that can't be used for both little/big-endian, which you require. I would also be inclined to just use bit-shifting rather than unsafe etc. It is actally the fastest, based on the following (so I'm glad that this is how I already do it my code here, look for EncodeInt32Fixed):

Set1: 371ms
Set2: 171ms
Set3: 993ms
Set4: 91ms <==== bit-shifting ;-p

code:

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
static class Program
{
    static void Main()
    {
        const int LOOP = 10000000, INDEX = 100, VALUE = 512;
        byte[] buffer = new byte[1024];
        Stopwatch watch;

        watch = Stopwatch.StartNew();
        for (int i = 0; i < LOOP; i++)
        {
            Set1(buffer, INDEX, VALUE);
        }
        watch.Stop();
        Console.WriteLine("Set1: " + watch.ElapsedMilliseconds + "ms");

        watch = Stopwatch.StartNew();
        for (int i = 0; i < LOOP; i++)
        {
            Set2(buffer, INDEX, VALUE);
        }
        watch.Stop();
        Console.WriteLine("Set2: " + watch.ElapsedMilliseconds + "ms");

        watch = Stopwatch.StartNew();
        for (int i = 0; i < LOOP; i++)
        {
            Set3(buffer, INDEX, VALUE);
        }
        watch.Stop();
        Console.WriteLine("Set3: " + watch.ElapsedMilliseconds + "ms");

        watch = Stopwatch.StartNew();
        for (int i = 0; i < LOOP; i++)
        {
            Set4(buffer, INDEX, VALUE);
        }
        watch.Stop();
        Console.WriteLine("Set4: " + watch.ElapsedMilliseconds + "ms");

        Console.WriteLine("done");
        Console.ReadLine();
    }
    unsafe static void Set1(byte[] target, int index, int value)
    {
        fixed (byte* p = &target[0])
        {
            Marshal.WriteInt32(new IntPtr(p), index, value);
        }
    }

    unsafe static void Set2(byte[] target, int index, int value)
    {
        int* p = &value;
        for (int i = 0; i < 4; i++)
        {
            target[index + i] = *((byte*)p + i);
        }
    }

    static void Set3(byte[] target, int index, int value)
    {
        byte[] data = BitConverter.GetBytes(value);
        Buffer.BlockCopy(data, 0, target, index, data.Length);
    }
    static void Set4(byte[] target, int index, int value)
    {
        target[index++] = (byte)value;
        target[index++] = (byte)(value >> 8);
        target[index++] = (byte)(value >> 16);
        target[index] = (byte)(value >> 24);
    }
}
Sign up to request clarification or add additional context in comments.

4 Comments

i don't think Stream would be a good solution, the problem is there may be seeks needed and the data is not always read and writen sequential. Another problem would be endianess..
i need to verify this first and may accept this answer as solution, what about getting/reading?
Same, but in reverse; return ((int)buffer[index++]) | (((int)buffer[index++]) << 8) | (((int)buffer[index++]) << 16) | (((int)buffer[index]) << 24); (or switch the shifts top-to-bottom to get other-endian). Note that we cast to int early, as int arithmetic is faster than byte arithmetic.
i think the shifting solution is the best at this time, the big advantage would be the ease of endian swaping.
14

Using Marc Gravell's Set1 to Set4 and the Set5 below, I get the following numbers on my machine:

Set1: 197ms
Set2: 102ms
Set3: 604ms
Set4: 68ms
Set5: 55ms <==== pointer magic ;-p

Code:

unsafe static void Set5(byte[] target, int index, int value)
{
    fixed (byte* p = &target[index])
    {
        *((int*)p) = value;                
    }
}

Of course, it gets much faster when the byte array isn't pinned on each iteration but only once:

Set6: 10ms (little endian)
Set7: 85ms (big endian)

Code:

if (!BitConverter.IsLittleEndian)
{
    throw new NotSupportedException();
}

watch = Stopwatch.StartNew();
fixed (byte* p = buffer)
{
    for (int i = 0; i < LOOP; i++)
    {
        *((int*)(p + INDEX)) = VALUE;
    }
}
watch.Stop();
Console.WriteLine("Set6: " + watch.ElapsedMilliseconds + "ms");

watch = Stopwatch.StartNew();
fixed (byte* p = buffer)
{
    for (int i = 0; i < LOOP; i++)
    {
        *((int*)(p + INDEX)) = System.Net.IPAddress.HostToNetworkOrder(VALUE);
    }
}
watch.Stop();
Console.WriteLine("Set7: " + watch.ElapsedMilliseconds + "ms");

7 Comments

the problem here would be the overhead generated by endian swaping
Cool; I've added an update, but endianness is still a problem here. I'm out of votes already today, but a virtual +1
@haze4real: the overhead produced by endian swapping is actually not that huge. Example updated.
:/ the overhead is huge its about 8 times slower in your test. You shouldn't pin the byte array outside of the loop in your test, cause its the function that needs to be tested... i can't pin it between these function calls
Right, you can only pin the array outside the loop if you want to modify the same array multiple times in a row. If this isn't the case in your application, you can't do this. But if you can, the performance gain is obvious.
|
3

Pointers are the way to go. Pinning objects with the fixed keyword is extremely cheap, and you avoid the overhead of calling functions like WriteInt32 and BlockCopy. For a "generic solution" you can simply use void* and use your own memcpy (since you're dealing with small amounts of data). However pointers do not work with true generics.

8 Comments

Then you clearly didn't write your code properly. Do you really think using bit-shifting (around 8 instructions for each Int32 at least) is faster than using a simple mov instruction? And I'm talking about pinning the buffer outside of the loop.
can you give me an example of this in c#? you're talking of the second Set method, aren't you? i just need the plain value types: Int16, UInt16, Int32, UInt32, Int64, UInt64
fixed (byte* b = array) { for (...) (int)(b + offset) = value; } }. Exactly what you have in your Get method. Look at the Disassembly window if you don't believe that this is in fact the fastest method.
Ah, right; I had the wrong end of the stick. But the requirement is for both-endian. So you'd have to have a fallback for the other, and some mechanism of abstracting the two. I'd keep it simple and write the shifts. It also raises the question "Do I bother checking/supporting big-endian hardware" etc.
Sadly C# doesn't offer a way to utilise the bswap instruction for big/little-endian compatibility, so in that case your solution would be the fastest. As for the for loop, it was meant to indicate that the pinning would be done outside of any loops.
|
1

You should do some profiling on your code to reveal whether this is the bottleneck. Also looking at your code it appears that you are using .Net function calls to write one byte to an unmanaged array, involving a pin on the memory and a call to unsafe code...

You might be much better off declaring a .Net System.IO.MemoryStream and seeking and writing around to it, wherever possible using a stream writer to push your changes in, which should use less function calls and won't require unsafe code. You'll find the pointer stuff much more useful in C# if you are doing things like DSP, where you need to perform a single operation to every value in an array etc.

EDIT: Let me also mention that depending on what you are doing you might find that the CPU caching will come into effect, if you can keep working on a single small area of memory that fits into the cache then you will end up with the best performance.

1 Comment

The problem is it can be a bottleneck cause the application is communicating with a load of different network devices and is running on low-cost machines, some of these devices use heavy protocols others don't. Do you know a good way to profile the interfaces? problem will be the variable delay of the network..

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.