
I have this code

int N = 32;
int B = 27;
// path is the 17 GB data file being transformed in place
using (FileStream fs = File.Open(path, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
    using (BinaryReader br = new BinaryReader(fs))
    {
        using (BinaryWriter bw = new BinaryWriter(fs))
        {
            for (int k = B; k < N; ++k)
            {
                Console.WriteLine(k);
                long pt = 0;        // index of the first element of the current pair
                long j = 1L << k;   // distance between the two elements of a pair
                for (long i = 0; i < (1L << (N - 1)); ++i)
                {
                    long b1;
                    long b2;

                    br.BaseStream.Seek(8 * (pt), SeekOrigin.Begin);
                    b1 = br.ReadInt64();
                    br.BaseStream.Seek(8 * (j - 1), SeekOrigin.Current);
                    b2 = br.ReadInt64();

                    long t1 = b1 + b2;
                    long t2 = b1 - b2;

                    bw.BaseStream.Seek(8 * (pt), SeekOrigin.Begin);
                    bw.Write(t1);
                    bw.BaseStream.Seek(8 * (j - 1), SeekOrigin.Current);
                    bw.Write(t2);

                    pt += 1;
                    // once pt has finished a run of j consecutive pairs,
                    // jump over the j elements just used as second halves
                    if ((pt & (j - 1L)) == 0)
                    {
                        pt += j;
                    }
                    if ((i % 100000) == 0) Console.WriteLine(i);
                }
            }
        }
    }
}

What's happening is that the program reads two longs from different positions in a very large (17 GB) file, adds/subtracts them, and then writes the new values back to the same positions.

From what I can gather, the most efficient way to read data is to read a large chunk into a buffer and then work with that. However, that approach doesn't work here, because based on the values of pt and j, it could be reading from the beginning and end of the file, and of course I can't store all 17 GB in memory.

The line

if ((i % 100000) == 0) Console.WriteLine(i);

is for debugging, and there are about 2 seconds between prints on my computer. I need this to be much faster. The paper I'm following said their implementation took less than 30 minutes for this loop. Is there a faster way to read and write large amounts of numerical data?

  • possible duplicate: stackoverflow.com/questions/2036718/… Commented Oct 29, 2018 at 2:21
  • What happens when you stop writing to the console? Commented Oct 29, 2018 at 2:21
  • The big part of this operation will be the disk operations. Either the data is in memory, or it has to be read into it from the disk. You can try to leave it to the disk caches, but they are overtaxed with two 17 GiB files as well. Disks are glacially slow. The only thing possibly slower is the network. Commented Oct 29, 2018 at 2:23
  • I can most certainly say there is a faster way to do what you are doing. However, without forensically going through your code line by line, and/or understanding or guessing at the actual problem you are trying to solve and the magic numbers you have within, it's impossible to give you any advice other than to say that yes, it's probably possible to speed this up. Commented Oct 29, 2018 at 2:24
  • @RichardHubley: 1) While I agree that actually writing to the console can be expensive, it is not 2 seconds expensive. 2) He already said it is only there for debugging. 3) This is a disk operation; not a lot of other things will matter with those. Commented Oct 29, 2018 at 2:24

2 Answers


This is not really an answer per se; however, it should give you some ideas on how you can speed this up.

At first glance, this breaks down into probabilities, parallelism, and chunk size.

If there is a high probability that the next read/write can be found in a larger chunk, then a larger chunk size will be a performance gain; in turn, it won't have to keep going back to the disk.

If you are using SSDs, you can probably load oodles of MBs at a time in a more performant manner than the default 4 KB chunk it is probably using.
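
To make that concrete, here is a minimal sketch of opening the same file with a larger internal buffer via the standard FileStream constructor overload; the 1 MB figure is an arbitrary illustration, not a tuned value.

// Same layout as the question's code, but with a 1 MB internal buffer so that
// runs of nearby reads are served from memory rather than hitting the disk.
using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate,
                                      FileAccess.ReadWrite, FileShare.None,
                                      bufferSize: 1 << 20)) // 1 MB instead of the 4 KB default
using (BinaryReader br = new BinaryReader(fs))
using (BinaryWriter bw = new BinaryWriter(fs))
{
    // ... the same seek/read/write loop as in the question ...
}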

Also, this can seemingly be broken down into parallel workloads, though it really is unclear what modifications you would need at the outset.

However, if you really want this fast:

  • Go and buy yourself 32 GB of RAM
  • Create an inherited Stream class, or even better just a custom class
  • Load the entire data set into memory, broken into arrays of chunks of about a gigabyte each
  • Make use of direct pointer access
  • Use parallel workloads

If you could do this (and this is speculative), you could probably speed this up by many factors, and for the measly cost of a couple of hundred dollars in memory and a day's worth of coding.
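
Purely as a speculative illustration of the list above (the 1 GB chunk size is an arbitrary choice to stay under the 2 GB array limit, and path, k and N are assumed to come from the question's code), loading the file into chunked arrays and running one pass in memory might look roughly like this:

// Speculative sketch: pull the whole file into ~1 GB long[] chunks (a single
// .NET array cannot hold 17 GB), then run the question's inner loop in RAM.
const long ChunkElems = 1L << 27;                  // 128M longs = 1 GB per chunk
long totalElems = new FileInfo(path).Length / 8;
int chunkCount = (int)((totalElems + ChunkElems - 1) / ChunkElems);
long[][] data = new long[chunkCount][];

using (FileStream fsIn = File.OpenRead(path))
using (BinaryReader reader = new BinaryReader(fsIn))
{
    for (int c = 0; c < chunkCount; ++c)
    {
        long elems = Math.Min(ChunkElems, totalElems - (long)c * ChunkElems);
        data[c] = new long[elems];
        for (long n = 0; n < elems; ++n) data[c][n] = reader.ReadInt64();
    }
}

// Accessors that hide the chunk boundaries behind a single global index.
long Get(long idx) => data[(int)(idx / ChunkElems)][idx % ChunkElems];
void Set(long idx, long v) => data[(int)(idx / ChunkElems)][idx % ChunkElems] = v;

// One pass (one value of k) of the question's loop, now entirely in memory.
// Independent (pt, pt + j) pairs could also be split across Parallel.For.
long j = 1L << k;
long pt = 0;
for (long i = 0; i < (1L << (N - 1)); ++i)
{
    long b1 = Get(pt);
    long b2 = Get(pt + j);
    Set(pt, b1 + b2);
    Set(pt + j, b1 - b2);

    pt += 1;
    if ((pt & (j - 1L)) == 0) pt += j;
}
// After all k passes are done, stream the chunks back out to disk once.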

Awesome comment by @NPras

Instead of managing the RAM caching/chunking yourself, you may also want to look at the concept of memory-mapped files and let the OS manage it for you

And from Managing Memory-Mapped Files

[diagram from the Managing Memory-Mapped Files documentation]
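
For what it's worth, a minimal, untested sketch of that memory-mapped route (reusing path, B and N from the question, and the types in System.IO.MemoryMappedFiles) could look like this:

// Sketch only: map the whole file and let the OS handle paging, instead of
// issuing an explicit Seek/Read/Write per element.
using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(
           path, FileMode.Open, null, 0, MemoryMappedFileAccess.ReadWrite))
using (MemoryMappedViewAccessor view = mmf.CreateViewAccessor(
           0, 0, MemoryMappedFileAccess.ReadWrite))
{
    for (int k = B; k < N; ++k)
    {
        long pt = 0;
        long j = 1L << k;
        for (long i = 0; i < (1L << (N - 1)); ++i)
        {
            long b1 = view.ReadInt64(8 * pt);        // same byte offsets as before
            long b2 = view.ReadInt64(8 * (pt + j));

            view.Write(8 * pt, b1 + b2);
            view.Write(8 * (pt + j), b1 - b2);

            pt += 1;
            if ((pt & (j - 1L)) == 0) pt += j;
        }
    }
}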


6 Comments

Thanks! I'm not a full-time programmer or anything, so I was curious about better principles for using C#. However, I think I might as well just buy that RAM, and then I can load the whole input vector into it and not have to read from disk at all. Why is it faster to then break up the arrays in RAM?
@HiddenBabel be warned, it's not as simple as loading up 17 gigs; you will need to break this up, as the maximum array size in .NET is, I believe, 2 GB. Honestly, though, I would love to sink my teeth into this and see how fast I could get it.
Oooh, so that's why it throws an error trying to make larger arrays... I'll start researching some of this stuff. I haven't used pointers in C# either. But I'll check back if you edit with more info haha.
Instead of managing the RAM caching/chunking yourself, you may also want to look at the concept of memory-mapped files and let the OS manage it for you
@NPras Awesome comment. Do you mind if I add it to the answer?

If I understand correctly, the results are written back to the locations you just read from.

So if you reverse the order of the writes, the first write will be to the same location you last read from.

That will reduce the seek time.

Further, that means the next read will also be contiguous with the other write, again reducing seek time.

Now the main loop over i is obviously long, but I think you could do the following (a rough sketch follows the list):

  • Break that up into medium-sized chunks (64 MB or so might be all you need)
  • Do a whole block of reads
  • Do the second block of reads
  • Do an in-memory calculation for both blocks
  • Write them out
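
Here is a speculative sketch of that blocked approach. It reuses fs, br, bw, k and N from the question's code, writes the last-read block first per the point above about reversing the writes, and the 64 MB block size is just the suggestion from the list, not a tuned value.

// Speculative sketch: process the loop in blocks so each disk seek is followed
// by a long contiguous run of reads or writes instead of a single 8-byte access.
const int BlockElems = 1 << 23;          // 8M longs = 64 MB per block (assumption)
long j = 1L << k;
long pt = 0;
long[] low = new long[BlockElems];
long[] high = new long[BlockElems];

for (long i = 0; i < (1L << (N - 1)); )
{
    // Number of consecutive pairs we can take before pt crosses a j boundary.
    long runLeft = j - (pt & (j - 1L));
    int count = (int)Math.Min(BlockElems, Math.Min(runLeft, (1L << (N - 1)) - i));

    // One seek plus a contiguous block of reads for each half of the pairs.
    fs.Seek(8 * pt, SeekOrigin.Begin);
    for (int n = 0; n < count; ++n) low[n] = br.ReadInt64();
    fs.Seek(8 * (pt + j), SeekOrigin.Begin);
    for (int n = 0; n < count; ++n) high[n] = br.ReadInt64();

    // In-memory sums and differences for the whole block.
    for (int n = 0; n < count; ++n)
    {
        long b1 = low[n];
        long b2 = high[n];
        low[n] = b1 + b2;
        high[n] = b1 - b2;
    }

    // Write each half back contiguously, last-read region first so the first
    // write is adjacent to the last read.
    fs.Seek(8 * (pt + j), SeekOrigin.Begin);
    for (int n = 0; n < count; ++n) bw.Write(high[n]);
    fs.Seek(8 * pt, SeekOrigin.Begin);
    for (int n = 0; n < count; ++n) bw.Write(low[n]);

    pt += count;
    if ((pt & (j - 1L)) == 0) pt += j;   // skip the elements used as second halves
    i += count;
}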

