I am writing a program that parses large, unpredictable files, and that part works fine. I have been using the code below, looping over ReadLine until the end of the file to keep the memory footprint low. My problem is an OutOfMemoryException when a single line is simply too long.
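
    // Dispose the reader when done so the file handle is released.
    using (System.IO.StreamReader casereader = new System.IO.StreamReader(dumplocation))
    {
        string line;
        // ReadLine buffers the entire line in memory; this is where the
        // OutOfMemoryException is thrown on an extremely long line.
        while ((line = casereader.ReadLine()) != null)
        {
            foreach (Match m in linkParser.Matches(line))
            {
                Console.Write(displaytext);
                Console.WriteLine(m.Value);
                XMLWrite.Start(m.Value, displaytext, dumplocation, line);
            }
        }
    }
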
XMLWrite just writes any strings that match my regex to an XML document. The regex is a simple email search. The issue occurs when ReadLine is called and the application hits an extremely long line in the file I am reading (I can see this because the memory usage in Task Manager climbs and climbs as it populates the string 'line'). Eventually it runs out of memory and crashes.

What I want to do is read predefined blocks (e.g. 8,000 characters) and run these one at a time through the same process. That way I will always know the maximum length of the string 'line' (8,000 chars) and should not get an OutOfMemoryException. Does my logic seem sound? I am looking for the best way to implement ReadBlock, as I am currently unable to get it working.
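To make the idea concrete, here is a minimal sketch of the block-based approach, offered as an illustration rather than a working solution. It relies on `TextReader.ReadBlock(char[], int, int)`, which exists in .NET; everything else (`BlockScanner`, `FindMatches`, `blockSize`, `overlap`) is a hypothetical name made up for this example. One thing fixed-size blocks introduce is that an email address can straddle a block boundary, so the sketch carries a small tail of each block over into the next one. It assumes that no single match is longer than `overlap` characters.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class BlockScanner
{
    // Yields every regex match in the reader's text while holding at most
    // blockSize + overlap characters in the buffer at a time.
    // Assumption: no single match is longer than `overlap` characters.
    public static IEnumerable<string> FindMatches(
        TextReader reader, Regex pattern, int blockSize = 8000, int overlap = 256)
    {
        char[] buffer = new char[blockSize + overlap];
        int carried = 0; // characters kept from the end of the previous block

        while (true)
        {
            // ReadBlock fills up to blockSize chars after the carried-over tail
            // and only returns fewer at the end of the input.
            int read = reader.ReadBlock(buffer, carried, blockSize);
            int length = carried + read;
            bool lastBlock = read < blockSize;
            if (length == 0) yield break;

            string chunk = new string(buffer, 0, length);

            // A match starting at or after `cutoff` might be cut short by the
            // block boundary, so it is deferred to the next iteration.
            int cutoff = lastBlock ? length : Math.Max(0, length - overlap);
            int keepFrom = cutoff;

            foreach (Match m in pattern.Matches(chunk))
            {
                if (m.Index >= cutoff) break;
                yield return m.Value;
                // Never re-scan text that was already reported as a match.
                keepFrom = Math.Max(keepFrom, m.Index + m.Length);
            }

            if (lastBlock) yield break;

            // Move the unprocessed tail to the front of the buffer.
            carried = length - keepFrom;
            Array.Copy(buffer, keepFrom, buffer, 0, carried);
        }
    }
}
```

With something like this, the ReadLine loop would become a `foreach` over `BlockScanner.FindMatches(casereader, linkParser)`. Note that there is no longer a 'line' to pass to XMLWrite.Start, so that argument would have to be dropped or replaced with the surrounding block.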
Any help much appreciated!