I am writing some code to learn the new C# async design patterns, so I thought I would write a small Windows Forms program that counts the lines and words of text files and displays the reading progress.

To avoid disk swapping, I read each file into a MemoryStream and then build a StreamReader over it to read the text line by line and count.

The issue is that I can't update the progress bar correctly. I read a file, but there are always bytes missing, so the progress bar never fills completely.

I need a hand or an idea to achieve this. Thanks.

public async Task Processfile(string fname)
{
    MemoryStream m;
    fname.File2MemoryStream(out m);   // custom extension that reads the file into a MemoryStream

    int flen = (int)m.Length;         // file length in bytes
    string line = string.Empty;       // holds each line read from the StreamReader

    int linelen = 0;                  // byte count of the current line
    int readed = 0;                   // total bytes read so far

    progressBar1.Minimum = 0;         // progress bar on the WinForms UI
    progressBar1.Maximum = flen;

    using (StreamReader sr = new StreamReader(m))   // build a StreamReader over the MemoryStream
    {
        while (!sr.EndOfStream)       // also tried: (line = await sr.ReadLineAsync()) != null
        {
            line = await sr.ReadLineAsync();

            await Task.Run(() =>
            {
                linelen = Encoding.UTF8.GetBytes(line).Length;   // bytes in this line
                readed += linelen;                               // update total bytes read

                // custom function that implements IProgress to feed the progress bar
                Report(new Tuple<int, int>(flen, readed));
            });
        }
    }

    m.Close();    // releases the MemoryStream
    m = null;
}
  • Using a MemoryStream probably doesn't do any good; both FileStream and StreamReader are intelligently buffered. Commented Jul 24, 2013 at 17:20
  • @SLaks Yes, those classes are buffered, but I read the whole file into a MemoryStream for performance. I think reading files in chunks is slower because it needs more I/O. Commented Jul 24, 2013 at 17:27
  • @ppk - unless you're doing any Seeking on the stream, reading the whole file into a MemoryStream is idiotic, if all you want to do is process the file sequentially. The Operating System is already reading the file into memory intelligently. Or to put it another way, don't try to optimize without measuring. At the moment, you just seem to have guessed that putting it into a MemoryStream will (somehow) yield better performance. Commented Jul 24, 2013 at 17:42
  • Thanks to all for help. Commented Jul 24, 2013 at 18:03
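
The suggestion in the comments above, reading the file sequentially without first copying it into a MemoryStream, could look roughly like the sketch below. It is only an illustration under assumptions: the class name `DirectFileCount`, the method `CountAsync`, and the whitespace-based definition of a word are hypothetical and not taken from the question.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class DirectFileCount
{
    // Hypothetical helper: counts lines and whitespace-separated words
    // by reading straight from the file, with no intermediate MemoryStream.
    public static async Task<(int Lines, int Words)> CountAsync(string path)
    {
        int lines = 0, words = 0;

        // useAsync: true opens the file for asynchronous I/O.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, useAsync: true))
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = await reader.ReadLineAsync()) != null)
            {
                lines++;
                // A null separator array makes Split use whitespace as the delimiter.
                words += line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
            }
        }
        return (lines, words);
    }

    public static async Task Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "a b\nc d e\n");
            var result = await CountAsync(path);
            Console.WriteLine($"lines: {result.Lines}, words: {result.Words}");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Whether this is actually faster than the MemoryStream approach depends on the files and the machine, so it is worth measuring both, as the comments recommend.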

1 Answer

The total length assigned to flen includes the line terminators (carriage return and/or line feed) at the end of each line. The ReadLineAsync() function returns a string that does not include the terminator. My guess is that the number of missing bytes in your progress bar is directly proportional to the number of line terminators in the file being read.
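
One possible way around this, sketched here as an assumption rather than as the asker's final solution, is to stop summing line lengths and report the position of the underlying stream instead, since that position counts every byte consumed, terminators included. `RecordingProgress` and `PositionProgressDemo` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical IProgress<long> implementation that records values synchronously.
public sealed class RecordingProgress : IProgress<long>
{
    public List<long> Values { get; } = new List<long>();
    public void Report(long value) => Values.Add(value);
}

public static class PositionProgressDemo
{
    // Counts lines and reports how many bytes of the stream have been consumed.
    public static async Task<int> CountLinesAsync(Stream source, IProgress<long> progress)
    {
        int lines = 0;
        using (var reader = new StreamReader(source))
        {
            while (await reader.ReadLineAsync() != null)
            {
                lines++;
                // BaseStream.Position includes line terminators. StreamReader
                // reads ahead into its own buffer, so the value advances in
                // buffer-sized steps rather than line by line.
                progress.Report(reader.BaseStream.Position);
            }
        }
        return lines;
    }

    public static async Task Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("one\r\ntwo\r\nthree\r\n");
        var progress = new RecordingProgress();
        int lines = await CountLinesAsync(new MemoryStream(data), progress);
        long last = progress.Values[progress.Values.Count - 1];
        Console.WriteLine($"lines: {lines}, last position: {last} of {data.Length}");
    }
}
```

In a Windows Forms program, a `Progress<long>` created on the UI thread could be passed in place of `RecordingProgress`, with its callback setting `progressBar1.Value`. Because of the read-ahead buffering, the bar would move in coarse steps on small files, but it should reach the maximum once the last line has been read.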

2 Comments

The Encoding.GetBytes documentation doesn't say anything about excluding carriage returns. However, I am going to test it.
@ppk - it's not Encoding.GetBytes that's removing the carriage returns - it's you calling a function (ReadLineAsync) that will never return a carriage return character.
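
The point made in this comment can be checked with a small console sketch. The class name `LineByteDemo` and the sample text are assumptions chosen for illustration; the loop sums line bytes the same way the question's code does.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineByteDemo
{
    // Sums the UTF-8 byte counts of the strings returned by ReadLine,
    // mirroring the accounting done in the question's loop.
    public static int CountBytesViaReadLine(byte[] data)
    {
        int total = 0;
        using (var reader = new StreamReader(new MemoryStream(data)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                total += Encoding.UTF8.GetByteCount(line);
            }
        }
        return total;
    }

    public static void Main()
    {
        // Three lines, each terminated by CR LF: 17 bytes in total.
        byte[] data = Encoding.UTF8.GetBytes("one\r\ntwo\r\nthree\r\n");
        int counted = CountBytesViaReadLine(data);

        // The difference is the 3 x 2 terminator bytes that ReadLine discards.
        Console.WriteLine($"stream length: {data.Length}, counted: {counted}, missing: {data.Length - counted}");
    }
}
```

With this sample input the stream holds 17 bytes while the summed line lengths come to 11, so 6 bytes (two per line) never reach the progress total.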
