Background:
I am downloading a large (>500 MB) text file containing many SQL statements that I need to run against a database. To do this, I process the file line by line until I have a full query and then execute it. When the application runs, the logic inside the while loop uses more memory than anticipated.
I have removed the code that runs the query against the database; after debugging, it doesn't seem to be what is causing the issue.
Code:
Below is some example code to demonstrate - obviously this is not the full program, but this is where I've narrowed the problem down to. Note that sr is a StreamReader which has been initialised to read from my MemoryStream.
    StringBuilder query = new StringBuilder();
    while (!sr.EndOfStream)
    {
        query.AppendLine(await sr.ReadLineAsync());
        string currentQueryString = query.ToString();
        if (currentQueryString.EndsWith($"{Environment.NewLine}GO{Environment.NewLine}"))
        {
            // Run query against database
            // Clean up StringBuilder so it can be used again
            query = new StringBuilder();
            currentQueryString = "";
        }
    }
For this example, let's say that every line in the file could be between 1 and 300 characters long. Also, 99% of the queries are INSERT statements containing 1,000 records (each record on a new line).
When I run the application:
In Windows Task Manager I can see the memory allocated to the app increase on what looks like almost every iteration of the while loop. I placed a breakpoint on currentQueryString = ""; and every time it is hit (that is, after another 1,000 lines of the file have been read into memory) the Diagnostic Tools inside Visual Studio show the application's memory use rising by roughly 100 MB to 200 MB. However, the snapshots I take at each breakpoint hit show the Heap Size barely changing, by maybe a few hundred KB either way.
What I think is causing the issue:
My best guess at the moment is that the line string currentQueryString = query.ToString(); is somehow initialising a variable, possibly in unmanaged memory, which is not being released. One reason I think this: I tested the following code, which removes the call to ToString() on the StringBuilder, and memory usage is drastically lower, increasing by only about 1-2 MB for every 1,000 lines processed:
    while (timer.Elapsed.TotalMinutes < 14 && !sr.EndOfStream && !killSwitch)
    {
        query.AppendLine(await sr.ReadLineAsync());
        currentExeQueryCounter += 1;
        if (currentExeQueryCounter > 1000)
        {
            query = new StringBuilder();
            currentExeQueryCounter = 0;
        }
    }
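As a rough sanity check on the difference between the two loops, here is a back-of-the-envelope sketch of how much string data the first loop could allocate per query, given that StringBuilder.ToString() returns a new string containing a copy of the whole buffer on every call. The inputs are assumptions rather than measurements: 1,000 lines per query, a uniform 150 characters per line (the midpoint of the 1 to 300 range), a 2-character newline, and 2 bytes per UTF-16 char. Per-object overhead is ignored, and ToStringCostEstimate and EstimateBytes are hypothetical names used only for illustration.

```csharp
using System;

public static class ToStringCostEstimate
{
    // Sums the size of every string produced when ToString() is called
    // after each AppendLine, assuming all lines have the same length.
    public static long EstimateBytes(int lines, int charsPerLine, int newlineChars)
    {
        long totalBytes = 0;
        long bufferChars = 0;
        for (int i = 0; i < lines; i++)
        {
            // AppendLine grows the buffer by one line plus the newline.
            bufferChars += charsPerLine + newlineChars;
            // ToString() allocates a new string holding a copy of the whole buffer.
            totalBytes += bufferChars * sizeof(char);
        }
        return totalBytes;
    }

    public static void Main()
    {
        // Assumed inputs: 1,000 lines, 150 chars each, "\r\n" newline.
        Console.WriteLine(EstimateBytes(1000, 150, 2)); // prints 152152000
    }
}
```

Under those assumptions the total is about 152 MB of short-lived strings per 1,000-line query, which is in the same range as the 100 MB to 200 MB growth I see. Most of those strings would also be over the 85,000-byte threshold for the large object heap, which is only collected during generation 2 collections, so this may be part of why the memory is not reclaimed promptly.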
For debugging purposes only, I added GC.Collect() underneath currentQueryString = ""; in the first code snippet, which completely resolved the issue (observed in both Visual Studio Diagnostic Tools and Task Manager). I am trying to understand why this is and how best to address it, as I aim to run this as a serverless application that will be allocated a limited amount of memory.
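One way to avoid relying on GC.Collect() might be to stop calling ToString() on every line and instead test the line that was just read. The sketch below is a minimal version of that idea, not the code from my application: BatchReader, ProcessAsync and runBatchAsync are hypothetical names, and it assumes that a batch ends at a line containing exactly GO. Unlike the loop above, it does not append the GO line to the query text (GO is a batch separator understood by tools such as sqlcmd and SSMS, not a T-SQL statement), and it would also treat a GO on the very first line of a batch as a separator.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class BatchReader
{
    // Reads the input line by line and hands each completed batch to
    // runBatchAsync. ToString() is called once per batch, not once per line.
    public static async Task<int> ProcessAsync(TextReader reader, Func<string, Task> runBatchAsync)
    {
        var query = new StringBuilder();
        int batches = 0;
        string line;
        // ReadLineAsync returns null at the end of the input.
        while ((line = await reader.ReadLineAsync()) != null)
        {
            if (line == "GO")
            {
                if (query.Length > 0)
                {
                    await runBatchAsync(query.ToString());
                    batches++;
                }
                // Reuse the same builder for the next batch.
                query.Clear();
            }
            else
            {
                query.AppendLine(line);
            }
        }
        // Like the original loop, anything after the last GO is not executed.
        return batches;
    }

    public static async Task Main()
    {
        // Made-up sample input standing in for the downloaded file.
        var reader = new StringReader("INSERT 1\nINSERT 2\nGO\nINSERT 3\nGO\n");
        int count = await ProcessAsync(reader, batch =>
        {
            Console.WriteLine($"Batch of {batch.Length} chars");
            return Task.CompletedTask;
        });
        Console.WriteLine($"Batches run: {count}");
    }
}
```

In the real program the callback would run the query against the database, and the TextReader would be the StreamReader sr from the snippets above.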
GC.Collect() "solves" the issue. There was no problem.
new StringBuilder: why not query.Clear()?
query.Clear(): I was actually using this previously but saw another post suggesting to try setting it to null instead. When I did this I was honestly just clutching at straws. It didn't really make a difference for me in the end and I just forgot to change it back. I have gone back to using query.Clear() now though.