I want to write a fast multithreaded program in C# that reads a file.

The file must be split into parts, with each part processed in a different thread. For example:

Line1
Line2
Line3
Line4

must be split into 4 lines like this:

Line1 => thread 1
Line2 => thread 2
Line3 => thread 3
Line4 => thread 4

I used StreamReader.ReadLine(), but it can't read a specific line.

Comment: It's necessary to speed up the program, so I want to read the file in separate threads.

  • Why do you think multithreading will help with this? Commented Oct 19, 2011 at 20:08
  • What makes you think it will be faster with several threads? I/O bound tasks are usually not good candidates for parallelization... Commented Oct 19, 2011 at 20:08
  • You won't see a performance increase. Your disk can only serve bits at a particular speed, and your computer is not faster than your disk. Basically, you seem to think that the bottleneck is your software, when it's most likely the Disk I/O. Commented Oct 19, 2011 at 20:10

6 Answers

5

Unless you're using fixed-length lines, this isn't possible.

Why? Because in order to determine where the "lines" split, you need to find the newline characters... which means you need to read the file first.

Now, if you simply want to perform some extra "processing" after you read in each line, that is possible and relatively straightforward using a ThreadPool.
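
To illustrate the ThreadPool idea, here is a minimal sketch, not a definitive implementation. It reads the file sequentially on one thread and queues only the per-line work onto the pool. The class name, `ProcessFile`, and `ProcessLine` are hypothetical; `ProcessLine` just returns the line length as a stand-in for real processing.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public static class ThreadPoolLineProcessing
{
    // Hypothetical per-line work; stands in for whatever "processing" means in your program.
    public static int ProcessLine(string line) => line.Length;

    // Reads the file sequentially on the calling thread and queues each line's
    // processing onto the ThreadPool. Results come back in no particular order.
    public static ConcurrentBag<int> ProcessFile(string path)
    {
        var results = new ConcurrentBag<int>();
        // Starts at 1 so the count cannot reach zero while lines are still being queued.
        using (var pending = new CountdownEvent(1))
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string current = line; // per-iteration copy captured by the closure
                pending.AddCount();
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { results.Add(ProcessLine(current)); }
                    finally { pending.Signal(); }
                });
            }
            pending.Signal(); // release the initial count
            pending.Wait();   // block until every queued line has been processed
        }
        return results;
    }
}
```

Queuing one work item per line only pays off when the per-line work is expensive. For cheap work, the scheduling overhead can easily outweigh any gain.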

2 Comments

Not the number of lines... the length of each line. Is that fixed?
If the length of each line is fixed, you can use a FileStream and set the position to the required offset. However, I really don't think this will help you, as the bottleneck is almost definitely disk speed... so more threads != faster reading from disk.

5

You should read the file in a single thread - but then spawn the processing of each line to a different thread, e.g. by adding it to a producer/consumer queue.

Even if you could seek to a specific line in a text file (which in general you can't) you really don't want the disk thrashing around - that'll only slow things down. The fastest way to get the data off the disk is to read it sequentially. By all means defer everything about handling the line beyond "decoding the binary data to text" to other threads, but you really don't want the IO to be in multiple threads.
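
A minimal sketch of that producer/consumer arrangement, assuming .NET 4.5 or later for `Task.Run`: one task reads lines in order and several workers take them from a bounded `BlockingCollection<string>`. The class name, `ProcessFile`, `ProcessLine`, `workerCount`, and the capacity of 1000 are all illustrative assumptions rather than anything prescribed above.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ProducerConsumerLines
{
    // Hypothetical per-line work; replace with the real processing.
    public static int ProcessLine(string line) => line.Length;

    // One task reads the file sequentially; workerCount tasks consume lines from a bounded queue.
    // Returns the sum of the per-line results.
    public static long ProcessFile(string path, int workerCount)
    {
        long total = 0;
        // Bounded so the reader cannot run arbitrarily far ahead of the workers.
        using (var queue = new BlockingCollection<string>(boundedCapacity: 1000))
        {
            var producer = Task.Run(() =>
            {
                try
                {
                    foreach (string line in File.ReadLines(path))
                        queue.Add(line);
                }
                finally
                {
                    queue.CompleteAdding(); // lets the consumers finish even if reading fails
                }
            });

            var consumers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                consumers[i] = Task.Run(() =>
                {
                    foreach (string line in queue.GetConsumingEnumerable())
                        Interlocked.Add(ref total, ProcessLine(line));
                });
            }

            Task.WaitAll(consumers);
            producer.Wait(); // surfaces any exception thrown while reading
        }
        return total;
    }
}
```

All disk access stays on the single producer task, so the file is still read sequentially. Only the work after decoding each line is spread across threads.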

3

AFAIK .NET doesn't support parallel stream reading. If you want to process every line, you can use File.ReadAllLines, which returns an array of strings. Then you can use PLINQ.

var result = File.ReadAllLines("path")
   .AsParallel()
   .Select(s => DoSthWithString(s))
   .ToList();

2

You're not going to be able to speed up the actual reading because you're going to have tremendous locking issues keeping everything straight.

Since a text file is unstructured, i.e. each line can be of a different length, you have no choice but to read the lines one after the other.

Now, what you can do is process those lines on different threads, but the actual reading, keep that to one thread.

But, before you do that, are you sure you even have to do this? Is this a bottleneck? If not, fix the bottleneck first and see how far you get.

1

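Your StreamReader wraps a Stream, available through its BaseStream property. On that stream you can call Seek to jump to a particular byte offset. If the reader has already read from the stream, call DiscardBufferedData afterwards so that its internal buffer matches the new position.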

Like others have said, this probably isn't a good idea, but it can be done.
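
As a rough sketch of seeking by byte offset, the example below reads one record from a file whose records all have the same byte length, which is the only case where a line number maps directly to an offset. The class name, `ReadRecord`, and its parameters are hypothetical, and the code assumes a single-byte encoding (ASCII) with no byte-order mark.

```csharp
using System;
using System.IO;
using System.Text;

public static class FixedLengthRecords
{
    // Reads record number `index` (zero-based) from a file in which every record,
    // including its line terminator, occupies exactly `recordLength` bytes.
    // Assumes a single-byte encoding such as ASCII and no byte-order mark.
    public static string ReadRecord(string path, int index, int recordLength)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            stream.Seek((long)index * recordLength, SeekOrigin.Begin);
            var buffer = new byte[recordLength];
            int read = 0;
            // Stream.Read may return fewer bytes than requested, so loop until full or end of file.
            while (read < recordLength)
            {
                int n = stream.Read(buffer, read, recordLength - read);
                if (n == 0) break;
                read += n;
            }
            return Encoding.ASCII.GetString(buffer, 0, read).TrimEnd('\r', '\n');
        }
    }
}
```

Each call opens its own FileStream with FileShare.Read, so several threads could each read their own records. That does not make the disk any faster, for the reasons given in the other answers.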

1

I would split the file beforehand. Say the file is 1000 lines: split it into 10 files of 100 lines each, then have a thread process each file.
