
I have this code

int N = 32;
int B = 27;
// path is the 17 GB data file being transformed in place
using (FileStream fs = File.Open(path, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
    using (BinaryReader br = new BinaryReader(fs))
    {
        using (BinaryWriter bw = new BinaryWriter(fs))
        {
            for (int k = B; k < N; ++k)
            {
                Console.WriteLine(k);
                long pt = 0;        // index of the first element of the current pair
                long j = 1L << k;   // distance between the two elements of a pair
                for (long i = 0; i < (1L << (N - 1)); ++i)
                {
                    long b1;
                    long b2;

                    br.BaseStream.Seek(8 * (pt), SeekOrigin.Begin);
                    b1 = br.ReadInt64();
                    br.BaseStream.Seek(8 * (j - 1), SeekOrigin.Current);
                    b2 = br.ReadInt64();

                    long t1 = b1 + b2;
                    long t2 = b1 - b2;

                    bw.BaseStream.Seek(8 * (pt), SeekOrigin.Begin);
                    bw.Write(t1);
                    bw.BaseStream.Seek(8 * (j - 1), SeekOrigin.Current);
                    bw.Write(t2);

                    pt += 1;
                    // once pt has finished a run of j consecutive pairs,
                    // jump over the j elements just used as second halves
                    if ((pt & (j - 1L)) == 0)
                    {
                        pt += j;
                    }
                    if ((i % 100000) == 0) Console.WriteLine(i);
                }
            }
        }
    }
}

What's happening is that the program reads two longs from different positions in a very large (17 GB) file, adds/subtracts them, and then writes the new values back to the same positions.

From what I can gather, the most efficient way to read data is to read a large chunk into a buffer and then work with that. However, that approach doesn't work here, because based on the values of pt and j, it could be reading from the beginning and end of the file, and of course I can't store all 17 GB in memory.

The line

if ((i % 100000) == 0) Console.WriteLine(i);

is for debugging, and there are about 2 seconds between prints on my computer. I need this to be much faster. The paper I'm following said their implementation took less than 30 minutes for this loop. Is there a faster way to read and write large amounts of numerical data?

  • possible duplicate: stackoverflow.com/questions/2036718/… Commented Oct 29, 2018 at 2:21
  • What happens when you stop writing to the console? Commented Oct 29, 2018 at 2:21
  • The big part of this operation will be the disk operations. Either the data is in memory, or it has to be read into it from the disk. You can try to leave it to the disk caches, but they are overtaxed with two 17 GiB files as well. Disks are glacially slow. The only thing possibly slower is the network. Commented Oct 29, 2018 at 2:23
  • I can most certainly say there is a faster way to do what you are doing. However, without forensically going through your code line by line, and/or understanding or guessing at the actual problem you are trying to solve and the magic numbers you have within, it's impossible to give you any advice other than to say that yes, it's probably possible to speed this up. Commented Oct 29, 2018 at 2:24
  • @RichardHubley: 1) While I agree that actually writing to the console can be expensive, it is not 2 seconds expensive. 2) He already said it is only there for debugging. 3) This is a disk operation; not a lot of other things will matter with those. Commented Oct 29, 2018 at 2:24

2 Answers


This is not really an answer per se; however, it should give you some ideas on how you can speed this up.

At first glance, this breaks down into probabilities, parallelism, and chunk size.

If there is a high probability that the next read/write can be found in a larger chunk, then a larger chunk size will be a performance gain; in turn, it won't have to keep going back to the disk.

If you are using SSDs, you can probably load oodles of MBs at a time in a more performant manner than the default 4 KB chunk it is probably using.
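
To make that concrete, here is a minimal sketch of opening the same file with a larger internal buffer via the standard FileStream constructor overload; the 1 MB figure is an arbitrary illustration, not a tuned value.

// Same layout as the question's code, but with a 1 MB internal buffer so that
// runs of nearby reads are served from memory rather than hitting the disk.
using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate,
                                      FileAccess.ReadWrite, FileShare.None,
                                      bufferSize: 1 << 20)) // 1 MB instead of the 4 KB default
using (BinaryReader br = new BinaryReader(fs))
using (BinaryWriter bw = new BinaryWriter(fs))
{
    // ... the same seek/read/write loop as in the question ...
}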

Also, this can seemingly be broken down into parallel workloads, though it really is unclear what modifications you would need at the outset.

However, if you really want this fast:

  • Go and buy yourself 32 GB of RAM
  • Create an inherited Stream class, or even better just a custom class
  • Load the entire data set into memory, broken into arrays of chunks of about a gigabyte each
  • Make use of direct pointer access
  • Use parallel workloads

If you could do this (and this is speculative), you could probably speed this up by many factors, and for the measly cost of a couple of hundred dollars in memory and a day's worth of coding.
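
Purely as a speculative illustration of the list above (the 1 GB chunk size is an arbitrary choice to stay under the 2 GB array limit, and path, k and N are assumed to come from the question's code), loading the file into chunked arrays and running one pass in memory might look roughly like this:

// Speculative sketch: pull the whole file into ~1 GB long[] chunks (a single
// .NET array cannot hold 17 GB), then run the question's inner loop in RAM.
const long ChunkElems = 1L << 27;                  // 128M longs = 1 GB per chunk
long totalElems = new FileInfo(path).Length / 8;
int chunkCount = (int)((totalElems + ChunkElems - 1) / ChunkElems);
long[][] data = new long[chunkCount][];

using (FileStream fsIn = File.OpenRead(path))
using (BinaryReader reader = new BinaryReader(fsIn))
{
    for (int c = 0; c < chunkCount; ++c)
    {
        long elems = Math.Min(ChunkElems, totalElems - (long)c * ChunkElems);
        data[c] = new long[elems];
        for (long n = 0; n < elems; ++n) data[c][n] = reader.ReadInt64();
    }
}

// Accessors that hide the chunk boundaries behind a single global index.
long Get(long idx) => data[(int)(idx / ChunkElems)][idx % ChunkElems];
void Set(long idx, long v) => data[(int)(idx / ChunkElems)][idx % ChunkElems] = v;

// One pass (one value of k) of the question's loop, now entirely in memory.
// Independent (pt, pt + j) pairs could also be split across Parallel.For.
long j = 1L << k;
long pt = 0;
for (long i = 0; i < (1L << (N - 1)); ++i)
{
    long b1 = Get(pt);
    long b2 = Get(pt + j);
    Set(pt, b1 + b2);
    Set(pt + j, b1 - b2);

    pt += 1;
    if ((pt & (j - 1L)) == 0) pt += j;
}
// After all k passes are done, stream the chunks back out to disk once.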

Awesome comment by @NPras

Instead of managing the RAM caching/chunking yourself, you may also want to look at the concept of memory-mapped files and let the OS manage it for you

And from Managing Memory-Mapped Files

[diagram from the Managing Memory-Mapped Files documentation]
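
For what it's worth, a minimal, untested sketch of that memory-mapped route (reusing path, B and N from the question, and the types in System.IO.MemoryMappedFiles) could look like this:

// Sketch only: map the whole file and let the OS handle paging, instead of
// issuing an explicit Seek/Read/Write per element.
using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(
           path, FileMode.Open, null, 0, MemoryMappedFileAccess.ReadWrite))
using (MemoryMappedViewAccessor view = mmf.CreateViewAccessor(
           0, 0, MemoryMappedFileAccess.ReadWrite))
{
    for (int k = B; k < N; ++k)
    {
        long pt = 0;
        long j = 1L << k;
        for (long i = 0; i < (1L << (N - 1)); ++i)
        {
            long b1 = view.ReadInt64(8 * pt);        // same byte offsets as before
            long b2 = view.ReadInt64(8 * (pt + j));

            view.Write(8 * pt, b1 + b2);
            view.Write(8 * (pt + j), b1 - b2);

            pt += 1;
            if ((pt & (j - 1L)) == 0) pt += j;
        }
    }
}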


6 Comments

Thanks! I'm not a full-time programmer or anything, so I was curious about better principles for using C#. However, I think I might as well just buy that RAM, and then I can load the whole input vector into it and not have to read from disk at all. Why is it faster to then break up the arrays in RAM?
@HiddenBabel be warned, it's not as simple as loading up 17 gigs; you will need to break this up, as the maximum array size in .NET is, I believe, 2 GB. Honestly, though, I would love to sink my teeth into this and see how fast I could get it.
Oooh, so that's why it throws an error trying to make larger arrays... I'll start researching some of this stuff. I haven't used pointers in C# either. But I'll check back if you edit with more info haha.
Instead of managing the RAM caching/chunking yourself, you may also want to look at the concept of memory-mapped files and let the OS manage it for you
@NPras Awesome comment. Do you mind if I add it to the answer?

If I understand correctly, the results are written back to the locations you just read from.

So if you reverse the order of the writes, the first write will be to the same location you last read from.

That will reduce the seek time.

Further, that means the next read will also be contiguous with the other write, again reducing seek time.

Now the main loop over i is obviously long, but I think you could do the following (a rough sketch follows the list):

  • Break that up into medium-sized chunks (64 MB or so might be all you need)
  • Do a whole block of reads
  • Do the second block of reads
  • Do an in-memory calculation for both blocks
  • Write them out
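
Here is a speculative sketch of that blocked approach. It reuses fs, br, bw, k and N from the question's code, writes the last-read block first per the point above about reversing the writes, and the 64 MB block size is just the suggestion from the list, not a tuned value.

// Speculative sketch: process the loop in blocks so each disk seek is followed
// by a long contiguous run of reads or writes instead of a single 8-byte access.
const int BlockElems = 1 << 23;          // 8M longs = 64 MB per block (assumption)
long j = 1L << k;
long pt = 0;
long[] low = new long[BlockElems];
long[] high = new long[BlockElems];

for (long i = 0; i < (1L << (N - 1)); )
{
    // Number of consecutive pairs we can take before pt crosses a j boundary.
    long runLeft = j - (pt & (j - 1L));
    int count = (int)Math.Min(BlockElems, Math.Min(runLeft, (1L << (N - 1)) - i));

    // One seek plus a contiguous block of reads for each half of the pairs.
    fs.Seek(8 * pt, SeekOrigin.Begin);
    for (int n = 0; n < count; ++n) low[n] = br.ReadInt64();
    fs.Seek(8 * (pt + j), SeekOrigin.Begin);
    for (int n = 0; n < count; ++n) high[n] = br.ReadInt64();

    // In-memory sums and differences for the whole block.
    for (int n = 0; n < count; ++n)
    {
        long b1 = low[n];
        long b2 = high[n];
        low[n] = b1 + b2;
        high[n] = b1 - b2;
    }

    // Write each half back contiguously, last-read region first so the first
    // write is adjacent to the last read.
    fs.Seek(8 * (pt + j), SeekOrigin.Begin);
    for (int n = 0; n < count; ++n) bw.Write(high[n]);
    fs.Seek(8 * pt, SeekOrigin.Begin);
    for (int n = 0; n < count; ++n) bw.Write(low[n]);

    pt += count;
    if ((pt & (j - 1L)) == 0) pt += j;   // skip the elements used as second halves
    i += count;
}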

