I want to write 10^5 lines to a file, each containing 10^5 randomly generated numbers, and I would like to know the fastest way to do this. My idea was to launch 10^5 threads concurrently, each writing one line, so that the whole file is filled in the time it takes to write a single line.

public static void GenerateNumbers(string path)
{
    using (StreamWriter sw = new StreamWriter(path))
    {
        for (int i = 0; i < 100000; i++)
        {
            for (int j = 0; j < 100000; j++)
            {
                Random rnd = new Random();
                int number = rnd.Next(1, 101);
                sw.Write(number + " ");
            }
            sw.Write('\n');
        }
    }
}

This is how I am currently doing it. Is there a faster way?

  • Your number generator is CPU bound, and is going to vastly out-pace file access, which is IO bound. Having 5 threads talking to one file is not going to help; plus, it doesn't work like that. Commented Apr 17, 2022 at 9:07
  • Ok, so what is the fastest way to achieve what I am trying to do? Commented Apr 17, 2022 at 9:31
  • You can make your code snippet a little quicker by instantiating rnd just once, instead of for every number. Commented Apr 17, 2022 at 10:11

1 Answer

Now that there's a code snippet, some optimization can be applied.

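using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public static class Program
{
    static void Main(string[] args)
    {
        var sw = new Stopwatch();
        const int pow = 5;
        sw.Start();
        GenerateNumbers("test.txt", pow);
        sw.Stop();
        Console.WriteLine($"Wrote 10^{pow} lines of 10^{pow} numbers in {sw.Elapsed}");
    }

    public static void GenerateNumbers(string path, int pow)
    {
        // One Random instance for the whole run, not one per number.
        var rnd = new Random();
        using var sw = new StreamWriter(path, false);
        // Number of lines, and numbers per line (10^pow each).
        var max = Math.Pow(10, pow);
        // Reused buffer that holds one line at a time.
        var sb = new StringBuilder();
        for (long i = 0; i < max; i++)
        {
            for (long j = 0; j < max; j++)
            {
                sb.Append(rnd.Next(1, 101));
                sb.Append(' ');
            }
            // Write the finished line in a single call, then reuse the buffer.
            sw.WriteLine(sb.ToString());
            sb.Clear();
            // Report progress as a percentage every 100 lines.
            if (i % 100 == 0)
                Console.WriteLine((i / max).ToString("P"));
        }
    }
}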

The above code does IO writes at a fairly decent pace (remember the limit is the IO speed, not CPU / number generation). Also note that I'm running the code from inside a VM, so I'm likely not getting the best IO results.
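Whether the disk or the CPU is the limit depends on the machine, so it may be worth measuring. The sketch below is one way to do that, not part of the original answer: `ThroughputCheck` and `TimeGeneration` are hypothetical names, and the loop is the same as in `GenerateNumbers` except that it writes to any `Stream`. Passing `Stream.Null` discards the output, which leaves only the cost of generating and formatting the numbers.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public static class ThroughputCheck
{
    // Hypothetical helper: runs the generation loop against an arbitrary stream
    // and returns how long it took. The stream is closed when the method returns.
    public static TimeSpan TimeGeneration(Stream target, int lines, int perLine)
    {
        var rnd = new Random();
        var sb = new StringBuilder();
        var clock = Stopwatch.StartNew();
        using (var sw = new StreamWriter(target))
        {
            for (int i = 0; i < lines; i++)
            {
                for (int j = 0; j < perLine; j++)
                {
                    sb.Append(rnd.Next(1, 101)).Append(' ');
                }
                sw.WriteLine(sb.ToString());
                sb.Clear();
            }
        } // disposing the writer flushes it and closes the target stream
        clock.Stop();
        return clock.Elapsed;
    }
}
```

Calling it once with `Stream.Null` and once with `File.Create("test.txt")`, using the same `lines` and `perLine`, gives two timings to compare. If they are close, generation rather than the disk is the bottleneck on that machine.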

[Image: Resource Monitor]

  • As mentioned by Neil Moss in the comments, you don't need to instantiate the Random class for every number; one instance is enough.
  • I build each line in memory with a StringBuilder and then write it to disk in one call.
  • Since this does take a bit of time, I've added a progress tracker (this adds a minuscule amount of overhead).
  • A file with 10^4 lines of 10^4 numbers is already 285 MB and was generated in 4.6767592 seconds.
  • The 10^5 case shown above yields a 25.5 GB file and takes 5:54.2580683 (just under six minutes) to generate.
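As a rough sanity check on those sizes, here is a small sketch; `SizeEstimate` and its methods are names made up for illustration. It assumes values are uniform in 1 to 100 and that each one is followed by a single space, and it ignores line terminators.

```csharp
using System;

public static class SizeEstimate
{
    // Average number of characters per value in 1..100, including the trailing space.
    public static double AverageBytesPerNumber()
    {
        long total = 0;
        for (int n = 1; n <= 100; n++)
        {
            total += n.ToString().Length + 1; // digits plus one space
        }
        return total / 100.0;
    }

    // Estimated file size in bytes for 10^pow lines of 10^pow numbers.
    public static double EstimateBytes(int pow)
    {
        double count = Math.Pow(10, pow);
        return count * count * AverageBytesPerNumber();
    }
}
```

The average works out to 2.92 bytes per number, so `EstimateBytes(4)` is about 292 million bytes and `EstimateBytes(5)` about 29.2 billion. That is the same order as the sizes reported above; the exact figure depends on whether MB and GB are counted in powers of 1000 or 1024.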

I haven't tried this, but I wonder whether you could save time by writing the data to a ZIP file, assuming you care more about getting the data onto the disk than about the format itself. A compressed text file of numbers should be a fair bit smaller, and so should be faster to write.
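A minimal sketch of that idea, untested for speed: it uses gzip via `System.IO.Compression.GZipStream` rather than a ZIP archive, because gzip can wrap a single stream directly. `CompressedNumberWriter` and `GenerateNumbersGz` are hypothetical names, and `CompressionLevel.Fastest` is an assumption chosen to keep the CPU cost low.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class CompressedNumberWriter
{
    // Hypothetical helper: writes `lines` lines of `perLine` random numbers (1..100)
    // to a gzip-compressed text file at `path`.
    public static void GenerateNumbersGz(string path, int lines, int perLine)
    {
        var rnd = new Random();
        using var file = File.Create(path);
        using var gzip = new GZipStream(file, CompressionLevel.Fastest);
        using var sw = new StreamWriter(gzip, Encoding.ASCII);
        var sb = new StringBuilder();
        for (int i = 0; i < lines; i++)
        {
            for (int j = 0; j < perLine; j++)
            {
                sb.Append(rnd.Next(1, 101)).Append(' ');
            }
            // Each line passes through the compressor before it reaches the disk.
            sw.WriteLine(sb.ToString());
            sb.Clear();
        }
        // The using declarations dispose in reverse order: writer, gzip stream, file.
    }
}
```

Random data only compresses so far. A value in 1 to 100 carries a little under 7 bits of information, while its text form takes about 3 bytes, so the file can shrink to roughly a third or a quarter at best. Compression also costs CPU time, so whether this ends up faster needs to be measured.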

4 Comments

there are 10^5 lines each line with 10^5 random numbers
I have updated the question with a code snippet of how I am currently writing numbers
What is a reasonable number of threads to run simultaneously? Let's call it N. Could I launch N threads, wait for them to finish, and repeat this 10^5/N times?
@IOEnthusiast I've misread the question and assumed there's less data to write. I'll modify my answer.
