3

So a professor in university just told me that using concatenation on strings in C# (i.e. when you use the plus sign operator) creates memory fragmentation, and that I should use string.Format instead.

Now, I've searched a lot on Stack Overflow and found many threads about performance, where string concatenation wins hands down. (Some of them include this, this and this)

I can't find anyone who talks about memory fragmentation, though. I opened .NET's string.Format using ILSpy and apparently it uses the same string builder as the string.Concat method does (which, if I understand correctly, is what the + sign is overloaded to). In fact, it uses the code in string.Concat!

I found this article from 2007 but I doubt it's accurate today (or ever!). Apparently the compiler is smart enough to avoid that today, because I can't seem to reproduce the issue. Adding strings with string.Format and with plus signs both end up using the same code internally. As said before, string.Format uses the same code string.Concat uses.

So now I'm starting to doubt his claim. Is it true?

9
  • 3
    Can't say I've ever heard of this. I think it would at least be reasonable to ask for some evidence. Even if this was true a long time ago, it may not be now. Commented May 10, 2016 at 18:55
  • 2
    I doubt there's any merit to that. Fragmentation comes from allocating and freeing, something that both concatenation and formatting do. I would be curious to see his evidence. Commented May 10, 2016 at 18:56
  • 1
    Maybe he's talking about the fact that strings in C# are immutable? Commented May 10, 2016 at 18:58
  • 1
    Even if it were true, it sounds like premature optimization to me. I think the syntactical niceties of the overloaded + operator will be of greater benefit in the long run. I would only worry about such optimizations after it has been determined that some optimization regarding fragmentation is actually needed in your use case. Commented May 10, 2016 at 19:19
  • 1
    Now you have 2 comments: Jon Skeet, author of C# in depth and Brian Rasmussen, a Program Manager at Microsoft. Commented May 10, 2016 at 19:19

2 Answers

22

So a professor in university just told me that using concatenation on strings in C# (i.e. when you use the plus sign operator) creates memory fragmentation, and that I should use string.Format instead.

No, what you should do instead is do user research, set user-focussed real-world performance metrics, and measure the performance of your program against those metrics. When, and only when you find a performance problem, you should use the appropriate profiling tools to determine the cause of the performance issue. If the cause is "memory fragmentation" then address that by identifying the causes of the "fragmentation" and trying experiments to determine what techniques mitigate the effect.

Performance is not achieved by "tips and tricks" like "avoid string concatenation". Performance is achieved by applying engineering discipline to realistic problems.

To address your more specific problem: I have never heard the advice to eschew concatenation in favor of formatting for performance reasons. The advice usually given is to eschew iterated concatenation in favor of builders. Iterated concatenation is quadratic in time and space and creates collection pressure. Builders allocate unnecessary memory but are linear in typical scenarios. Neither creates fragmentation of the managed heap; iterated concatenation tends to produce contiguous blocks of garbage.
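To make the "quadratic" claim concrete, here is a minimal sketch that counts how many characters get written when a string is built one character at a time with +. The class and method names are made up for this illustration. Step i produces a brand new string of length i, so the total is 1 + 2 + ... + n = n(n + 1)/2, and doubling n roughly quadruples the work.

```csharp
using System;

static class ConcatCost
{
    // Counts the characters written while building a string of length n
    // by appending one character at a time with the + operator.
    public static long CharsWrittenByConcat(int n)
    {
        long written = 0;
        string s = "";
        for (int i = 0; i < n; i++)
        {
            s = s + "a";          // allocates a new string; the old one becomes garbage
            written += s.Length;  // every character of the new string had to be written
        }
        return written;
    }

    static void Main()
    {
        foreach (int n in new[] { 1000, 2000, 4000 })
        {
            long closedForm = (long)n * (n + 1) / 2;
            Console.WriteLine($"n={n}: {CharsWrittenByConcat(n)} chars written, n(n+1)/2 = {closedForm}");
        }
    }
}
```

A builder, by contrast, writes each appended character once in the common case, plus whatever it copies when its buffer grows.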

The number of times I've had a performance problem that came down to unnecessary fragmentation of a managed heap is exactly one; in an early version of Roslyn we had a pattern where we would allocate a small long lived object, then a small short lived object, then a small long lived object... several hundred thousand times in a row, and the resulting maximally fragmented heap caused user-impacting performance problems on collections; we determined this by careful measurement of the performance in the relevant scenarios, not by ad hoc analysis of the code from our comfortable chairs.

The usual advice is not to avoid fragmentation, but rather to avoid pressure. We found during the design of Roslyn that pressure was far more impactful on GC performance than fragmentation, once our aforementioned allocation pattern problem was fixed.

My advice to you is to either press your professor for an explanation, or to find a professor who has a more disciplined approach to performance metrics.

Now, all that said, you should use formatting instead of concatenation, but not for performance reasons. Rather, for code readability, localizability, and similar stylistic concerns. A format string can be made into a resource, it can be localized, and so on.
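As a rough sketch of the localizability point: with a format string, the sentence lives in one place and a translator can reorder the placeholders freely. The dictionary below is a hypothetical stand-in for a real resource file, and the names and sample translations are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class Greeter
{
    // Hypothetical stand-in for a resource file: language code -> format string.
    // Note that the second entry puts the placeholders in a different order.
    static readonly Dictionary<string, string> Formats = new Dictionary<string, string>
    {
        ["en"] = "Hello {0}, you have {1} new messages.",
        ["de"] = "Sie haben {1} neue Nachrichten, {0}.",
    };

    public static string Greet(string language, string name, int count)
    {
        // The calling code stays the same no matter how the sentence is worded.
        return string.Format(CultureInfo.InvariantCulture, Formats[language], name, count);
    }

    static void Main()
    {
        Console.WriteLine(Greet("en", "Ada", 3));
        Console.WriteLine(Greet("de", "Ada", 3));
    }
}
```

With "Hello " + name + ", you have " + count + " new messages." the word order is baked into the code, so each language would need its own concatenation.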

Finally, I caution you that if you are putting strings together in order to build something like a SQL query or a block of HTML to be served to a user, then you want to use none of these techniques. These applications of string building have serious security impacts when you get them wrong. Use libraries and tools specifically designed for construction of those objects, rather than rolling your own with strings.
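A small sketch of the difference, using System.Xml.Linq as the "library designed for the job". This is only an illustration with made-up names: XML is not HTML, and a real web application would more likely rely on its templating engine (and on parameterized commands for SQL). The point is that the library, not your string code, is responsible for escaping.

```csharp
using System;
using System.Xml.Linq;

static class SafeMarkup
{
    // Hand-rolled: the user's text is pasted into the markup verbatim.
    public static string NaiveParagraph(string userText)
    {
        return "<p>" + userText + "</p>";
    }

    // Library-built: XElement treats the text as content and escapes markup characters.
    public static string Paragraph(string userText)
    {
        return new XElement("p", userText).ToString();
    }

    static void Main()
    {
        string hostile = "<script>alert(1)</script>";
        Console.WriteLine(NaiveParagraph(hostile)); // the script tag survives intact
        Console.WriteLine(Paragraph(hostile));      // the script tag is escaped as text
    }
}
```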


0

The problem with string concatenation is that strings are immutable. string1 + string2 does not concatenate string2 onto string1, it creates a whole new string. Using a StringBuilder (or string.Format) does not have this problem. Internally, the StringBuilder holds a char[], which it over-allocates. Appending something to a StringBuilder does not create any new objects unless it runs out of room in the char[] (in which case it over-allocates a new one).
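A minimal sketch of that difference, with invented class and method names: after a + concatenation the variable refers to a different object and the old string is untouched, whereas Append returns the very same builder. The capacity printed at the end depends on the runtime's growth policy, so no particular value is claimed here.

```csharp
using System;
using System.Text;

static class ImmutabilityDemo
{
    // True if + produced a new object and left the original string unchanged.
    public static bool ConcatMakesNewObject()
    {
        string original = "abc";
        string alias = original;          // second reference to the same string
        original = original + "def";      // allocates a new string at run time
        return !ReferenceEquals(original, alias) && alias == "abc";
    }

    // True if Append mutated the existing builder instead of creating a new one.
    public static bool AppendReusesBuilder()
    {
        var sb = new StringBuilder("abc");
        StringBuilder returned = sb.Append("def");
        return ReferenceEquals(sb, returned) && sb.ToString() == "abcdef";
    }

    static void Main()
    {
        Console.WriteLine($"+ creates a new string: {ConcatMakesNewObject()}");
        Console.WriteLine($"Append reuses the builder: {AppendReusesBuilder()}");

        var sb = new StringBuilder();
        sb.Append("a");
        // Capacity is the size of the over-allocated buffer, Length is what is in use.
        Console.WriteLine($"Length={sb.Length}, Capacity={sb.Capacity}");
    }
}
```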

I ran a quick benchmark. I think it proves the point :)

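        // Requires: using System.Diagnostics; using System.Text;
        StringBuilder sb = new StringBuilder();
        string st;
        Stopwatch sw;

        // Case 1: append to a StringBuilder, then build the final string once.
        sw = Stopwatch.StartNew();

        for (int i = 0; i < 100000; i++)
        {
            sb.Append("a");
        }

        st = sb.ToString();

        sw.Stop();
        Debug.WriteLine($"Elapsed: {sw.Elapsed}");

        // Case 2: repeated + concatenation; every iteration allocates a new string.
        st = "";

        sw = Stopwatch.StartNew();

        for (int i = 0; i < 100000; i++)
        {
            st = st + "a";
        }

        sw.Stop();
        Debug.WriteLine($"Elapsed: {sw.Elapsed}");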

The console output:

Elapsed: 00:00:00.0011883 (StringBuilder.Append())

Elapsed: 00:00:01.7791839 (+ operator)

10 Comments

But it does not create memory fragmentation.
I'm assuming the term he used is simply not perfectly accurate. However, it can definitely drive the GC nuts. Using string.Format() or a StringBuilder is absolutely the correct advice.
Hi! Thanks for your answer! Now, I don't understand your reasoning. string.Format will also create a third string: the result string. string.Format does use a string builder, yes, but string.Concat doesn't do anything different. It allocates a single string no matter how many strings you add and fills it with the data, so I don't see the memory fragmentation problem there =(.
string1 + string2 results in a new string object created, of size string1.Length + string2.Length; using a StringBuilder results in at least three new objects - the StringBuilder, the char[], and the final string when you extract it. How is this better? Performing a lot of concatenations would cause some fragmentation, but depending on the string size, the char[] will still have to be re-allocated repeatedly; StringBuilder is faster, String.Format is better for special-purpose formatting, but citing fragmentation alone as a reason doesn't seem to make sense.
Your benchmark has numerous flaws. Issue one: suppose the first loop causes collection pressure, but not enough to cause a collection. Suppose the second loop causes a collection. The collection cost associated with the first loop has now been charged to the second loop. Issue two: suppose the second loop causes collection pressure, but the program ends before enough pressure has built up to cause a collection. The performance impact of the pressure of the second loop is charged to no one. I could go on in this vein for some time.
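One way to reduce (not eliminate) the two accounting problems described above is to settle the heap before each measurement and to force a collection inside the timed region, so that each loop pays for its own garbage. The sketch below is only an illustration with invented names; it still ignores JIT warm-up, repeated runs, and statistics, which a proper benchmarking tool would handle.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class IsolatedBenchmark
{
    // Times 'work' and reports how many gen 0 collections happened during it.
    public static (TimeSpan Elapsed, int Gen0Collections) Measure(Action work)
    {
        // Settle the heap first so garbage left by earlier code is not charged to this run.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int gen0Before = GC.CollectionCount(0);
        Stopwatch sw = Stopwatch.StartNew();
        work();
        GC.Collect(); // collect inside the timed region so this run pays for its own garbage
        sw.Stop();
        return (sw.Elapsed, GC.CollectionCount(0) - gen0Before);
    }

    public static string WithBuilder(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.Append("a");
        return sb.ToString();
    }

    public static string WithConcat(int n)
    {
        string s = "";
        for (int i = 0; i < n; i++) s = s + "a";
        return s;
    }

    static void Main()
    {
        const int n = 100000;
        var builder = Measure(() => WithBuilder(n));
        var concat = Measure(() => WithConcat(n));
        Console.WriteLine($"StringBuilder: {builder.Elapsed}, gen0 collections: {builder.Gen0Collections}");
        Console.WriteLine($"+ operator:    {concat.Elapsed}, gen0 collections: {concat.Gen0Collections}");
    }
}
```

Reporting the collection count next to the elapsed time makes the collection pressure visible instead of leaving it hidden inside a single number.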