2

Background:

I am downloading a large (>500mb) text file which contains lots of SQL statements which I need to run across a database. To achieve this, I am processing the file line by line until I find a full query and then execute it. When running this application, the logic inside the while loop uses more memory than anticipated.

I have removed the code which is running the query against the database - after debugging it doesn't seem the be what is causing the issue.

Code:

Below is some example code to demonstrate - obviously this is not the full program, but this is where I've narrowed the problem down to. Note that sr is a StreamReader which has been initialised to read from my MemoryStream.

StringBuilder query = new StringBuilder();

while (!sr.EndOfStream)
{
    query.AppendLine(await sr.ReadLineAsync());

    string currentQueryString = query.ToString();

    if (currentQueryString.EndsWith($"{Environment.NewLine}GO{Environment.NewLine}"))
    {
        // Run query against database

        // Clean up StringBuilder so it can be used again
        query = new StringBuilder();
        currentQueryString = "";
    }
}

For this example, let's say that every new line in the file could be between 1 and 300 characters long. Also 99% of the queries are INSERT statements containing 1,000 records (each record on a new line).

When I run the application:

I can see in my Windows Task Manager that as the application runs, the memory allocated to the app increases what looks like almost every iteration of the while loop. I placed a break point on currentQueryString = ""; and every time it gets hit (knowing that I've just read another 1,000 lines of the file into memory) I can see that memory used by the application increases (this time from using the Diagnostic Tools inside Visual Studio) anywhere from 100mb to 200mb roughly, however from taking snapshots each time the breakpoint is hit I can see that the Heap Size is barely changing, maybe a few hundred kb either way.

What I think is causing the issue:

My best guess at the moment is that the string currentQueryString = query.ToString(); line is somehow initialising a variable possibly in unmanaged memory which is not being released. One reason for this is I tested with the following code which removes calling toString() on the StringBuilder and the memory usage is drastically lower as it only increases by about 1-2mb or so for every 1,000 lines processed:

while (timer.Elapsed.TotalMinutes < 14 && !sr.EndOfStream && !killSwitch)
{
    query.AppendLine(await sr.ReadLineAsync());

    currentExeQueryCounter += 1;

    if (currentExeQueryCounter > 1000)
    {
        query = new StringBuilder();
        currentExeQueryCounter = 0;
    }
}

For debugging purposes only I added in GC.Collect() underneath currentQueryString = ""; in the first code snippet which completely resolved the issue (observed in both Visual Studio Diagnostic Tools and Task Manager) and I am trying to understand why this is and how I can best address this issue as I aim to run this as a serverless application which will be allocated a limited amount of memory.

4
  • 2
    If GC.Collect() "solves" the issue. There was not problem.. Commented Nov 9, 2021 at 7:37
  • Let's start with the basics. Do you have an actual problem, or do you just have an observation that it uses more memory than you thought it would? The latter is not an actual problem unless it causes crashes, slowdowns, etc. Commented Nov 9, 2021 at 8:38
  • Instead of new StringBuilder why not query.Clear() Commented Nov 9, 2021 at 10:02
  • Regarding the comments on using query.Clear() I was actually using this previously but saw another post suggesting to try out setting it to null instead. When I did this I was honestly just clutching at straws - it didn't really make a difference for me in the end and I just forgot to change it back. I have resorted back to using query.Clear() now though Commented Nov 11, 2021 at 1:56

3 Answers 3

4

To add to the very sensible answer from JonasH: The call to query.ToString() is probably costing you quite a bit. It is also an unnecessarily convoluted way to check for a line that says "GO". If you instead just compare the most recently read line to "GO", you can cut down on the ToString() calls. For example, something like this:

string line = await sr.ReadLineAsync();
query.AppendLine(line);

if (line == "GO")
{
    string currentQueryString = query.ToString();

    // Run query against database

    query.Clear(); // Clean up StringBuilder so it can be used again
}
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks Petter, It seems query.ToString() was indeed costing me a lot and your answer was the most impactful and helpful in solving my issue!
4

Just increasing memory usage does not indicate a memory leak, the Garbage collector will run according to its own rules, for example when there is not sufficient memory. If inserting a GC.Collect resolves it there was probably never a leak to begin with.

Whenever there is a potential memory problem I would recommend using a memory profiler. This should allow you to trigger GCs at will, collect snapshots of all the allocated objects, and compare them to see if some kind of object count is steadily increasing.

That said, I would suggest changing query = new StringBuilder(); to query.Clear(). No need to re-allocate a bunch of memory when you already have a buffer available.

You could perhaps further reduce allocation rate by using Span<char>/Memory<char> as much as possible rather than strings. This should let you refer to a specific sequence of characters within a larger buffer without doing any copying or allocation at all. This was a major reason for Span<>, since copying strings is a bit inefficient when doing lots of xml/json deserialization & html processing.

Comments

1

If you draw on a piece of paper how memory allocations occure for some execution cycles:

1. query sb initial allocation
2. ReadLineAsync retval allocation
3. ReadLineAsync retval marked free
4. currentQueryString allocation
5. $"..." allocation (maybe compiler optimize this out from cycle)
6. new query sb initial allocation
7. "old" query sb marked free
8. new currentQueryString allocation
9. "old" currentQueryString marked free
99. (here starts the Nth cycle)
100. new ReadLineAsync retval allocation
101. at some point query sb preallocation become too small, so here comes a bigger query sb allocation and the "old" area marked free
102. new ReadLineAsync retval marked free
103. new currentQueryString allocation
104. $"..." allocation (maybe compiler optimize this out from cycle)
105. new query sb initial allocation
106. "old" query sb marked free
107. new currentQueryString allocation
108. "old" currentQueryString marked free
(etc)

Memory allocation logic will each time look for next available free block large enough to hold the requested size. If it finds no such big area it will ask from OS a new block. If it cannot get then it will call GC, which will really free all blocks marked free and if there is still no big enough block it will rearrange the occupied blocks to merge free pieces. GC won't merge blocks on Large Object Heap where allocations bigger than 88kb (as I remember) occure. These GC steps are hitting your performance.

So if the size of allocations occuring during steps 1-108 have a growing pattern (sb allocation is a potential suspect) then you will see a continously growing memory usage (because eating memory much faster than making GC steps above). If this growth results in moving to Large Object Heap then you risk getting OutOfMemoryException at some point. If you take a dump of this situation you may see gigabytes of free inprocess memory but still receiving OOM! (this happened to me)

This is just a technical explanation, other's solutions are correct. Always try to reuse already allocated memory, especially if you do something in cycles , especially if those cycles have undefined repeat count:

  • use ReadLineAsync retval for "GO" check instead of a new currentQueryString allocation
  • don't count on compiler optimizes $"..." string out of your cycle.
  • preallocate StringBuilder large enough for longest expected content
  • reuse StringBuilder (clear it) instead of instantiating new one

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.