17

The JVM optimzes String concatenation with + and replaces it with a StringBuilder. This should be the same in Scala. But what happens if strings are concatenated with ++=?

var x = "x"
x ++= "y"
x ++= "z"

As far as I know this methods treats strings like char seqences, so even if the JVM would create a StringBuilder it would lead to many method calls, right? Would it be better to use a StringBuilder instead?

To what type is the String converted implicitly?

1
  • 2
    Can you unaccept my answer and pick either som-snytt or Rex Kerr? Commented Sep 2, 2014 at 21:24

2 Answers 2

16

There is a huge HUGE difference in time taken.

If you add strings repeatedly using += you do not optimize away the O(n^2) cost of creating incrementally longer strings. So for adding one or two you won't see a difference, but it doesn't scale; by the time you get to adding 100 (short) strings, using a StringBuilder is over 20x faster. (Precise data: 1.3 us vs. 27.1 us to add the string representations of the numbers 0 to 100; timings should be reproducible to about += 5% and of course are for my machine.)

Using ++= on a var String is far far worse yet, because you are then instructing Scala to treat a string as a character-by-character collection which then requires all sorts of wrappers to make the String look like a collection (including boxed character-by-character addition using the generic version of ++!). Now you're 16x slower again on 100 additions! (Precise data: 428.8 us for ++= on a var string instead of +='s 26.7 us.)

If you write a single statement with a bunch of +es then the Scala compiler will use a StringBuilder and end up with an efficient result (Data: 1.8 us on non-constant strings pulled out of an array).

So, if you add strings with anything other than + in line, and you care about efficiency, use a StringBuilder. Definitely don't use ++= to add another String to a var String; there just isn't any reason to do it, and there's a big runtime penalty.

(Note--very often you don't care at all how efficient your string additions are! Don't clutter your code with extra StringBuilders unless you have reason to suspect that this particular code path is getting called a lot.)

Sign up to request clarification or add additional context in comments.

6 Comments

I'm too lazy to time it, but I don't believe ++= is char-by-char; @om-nom-nom is right about "plain old append", since TraversableLike's ++ delegates to the builder. Maybe it didn't always; like before 2.10.
My new favorite Odersky commit message (on TraversableLike): Massive redesign so that: scala> "hi" == "hi".reverse.reverse. That sounds almost paulp-like. (I should have said TL always used the builder's ++=, and StringBuilder overrode it for 2.10.)
@som-snytt - StringBuilder overrides ++= for String only, but here it's typed as a Builder[Char, String] so it uses the generic path--and anyway it is passed an argument which is only known to be a GenTraversableOnce[Char], so it goes through the whole unoptimized path.
@deamon - I did benchmark it. See timings in my post.
OK, thanks. Now I'll have to get out my special glasses and look at it again.
|
13

Actually, the inconvenient truth is StringOps usually remains an allocation:

scala> :pa
// Entering paste mode (ctrl-D to finish)

class Concat {
    var x = "x"
    x ++= "y"
    x ++= "z"
}

// Exiting paste mode, now interpreting.

defined class Concat

scala> :javap -prv Concat
Binary file Concat contains $line3.$read$$iw$$iw$Concat
  Size 1211 bytes
  MD5 checksum 1900522728cbb0ed0b1d3f8b962667ad
  Compiled from "<console>"
public class $line3.$read$$iw$$iw$Concat
  SourceFile: "<console>"
[snip]


  public $line3.$read$$iw$$iw$Concat();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=6, locals=1, args_size=1
         0: aload_0       
         1: invokespecial #19                 // Method java/lang/Object."<init>":()V
         4: aload_0       
         5: ldc           #20                 // String x
         7: putfield      #10                 // Field x:Ljava/lang/String;
        10: aload_0       
        11: new           #22                 // class scala/collection/immutable/StringOps
        14: dup           
        15: getstatic     #28                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
        18: aload_0       
        19: invokevirtual #30                 // Method x:()Ljava/lang/String;
        22: invokevirtual #34                 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
        25: invokespecial #36                 // Method scala/collection/immutable/StringOps."<init>":(Ljava/lang/String;)V
        28: new           #22                 // class scala/collection/immutable/StringOps
        31: dup           
        32: getstatic     #28                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
        35: ldc           #38                 // String y
        37: invokevirtual #34                 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
        40: invokespecial #36                 // Method scala/collection/immutable/StringOps."<init>":(Ljava/lang/String;)V
        43: getstatic     #28                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
        46: invokevirtual #42                 // Method scala/Predef$.StringCanBuildFrom:()Lscala/collection/generic/CanBuildFrom;
        49: invokevirtual #46                 // Method scala/collection/immutable/StringOps.$plus$plus:(Lscala/collection/GenTraversableOnce;Lscala/collection/generic/CanBuildFrom;)Ljava/lang/Object;
        52: checkcast     #48                 // class java/lang/String
        55: invokevirtual #50                 // Method x_$eq:(Ljava/lang/String;)V

See more demonstration at this answer.

Edit: To say more, you're building the String on each reassignment, so, no you're not using a single StringBuilder.

However, the optimization is done by javac and not the JIT compiler, so to compare fruits of the same kind:

public class Strcat {
    public String strcat(String s) {
        String t = " hi ";
        String u = " by ";
        return s + t + u;    // OK
    }
    public String strcat2(String s) {
        String t = s + " hi ";
        String u = t + " by ";
        return u;            // bad
    }
}

whereas

$ scala
Welcome to Scala version 2.11.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_11).
Type in expressions to have them evaluated.
Type :help for more information.

scala> :se -Xprint:typer

scala> class K { def f(s: String, t: String, u: String) = s ++ t ++ u }
[[syntax trees at end of                     typer]] // <console>
def f(s: String, t: String, u: String): String = scala.this.Predef.augmentString(scala.this.Predef.augmentString(s).++[Char, String](scala.this.Predef.augmentString(t))(scala.this.Predef.StringCanBuildFrom)).++[Char, String](scala.this.Predef.augmentString(u))(scala.this.Predef.StringCanBuildFrom)

is bad. Or, worse, to unroll Rex's explanation:

  "abc" ++ "def"

  augmentString("abc").++[Char, String](augmentString("def"))(StringCanBuildFrom)

  collection.mutable.StringBuilder.newBuilder ++= new WrappedString(augmentString("def"))

  val b = collection.mutable.StringBuilder.newBuilder
  new WrappedString(augmentString("def")) foreach b.+=

As Rex explained, StringBuilder overrides ++=(String) but not Growable.++=(Traversable[Char]).

In case you've ever wondered what unaugmentString is for:

    28: invokevirtual #40                 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
    31: invokevirtual #43                 // Method scala/Predef$.unaugmentString:(Ljava/lang/String;)Ljava/lang/String;
    34: invokespecial #46                 // Method scala/collection/immutable/WrappedString."<init>":(Ljava/lang/String;)V

And just to show that you do finally call unadorned +=(Char) but after boxing and unboxing:

  public final scala.collection.mutable.StringBuilder apply(char);
    flags: ACC_PUBLIC, ACC_FINAL
    Code:
      stack=2, locals=2, args_size=2
         0: aload_0       
         1: getfield      #19                 // Field b$1:Lscala/collection/mutable/StringBuilder;
         4: iload_1       
         5: invokevirtual #24                 // Method scala/collection/mutable/StringBuilder.$plus$eq:(C)Lscala/collection/mutable/StringBuilder;
         8: areturn       
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
               0       9     0  this   L$line10/$read$$iw$$iw$$anonfun$1;
               0       9     1     x   C
      LineNumberTable:
        line 9: 0

  public final java.lang.Object apply(java.lang.Object);
    flags: ACC_PUBLIC, ACC_FINAL, ACC_BRIDGE, ACC_SYNTHETIC
    Code:
      stack=2, locals=2, args_size=2
         0: aload_0       
         1: aload_1       
         2: invokestatic  #35                 // Method scala/runtime/BoxesRunTime.unboxToChar:(Ljava/lang/Object;)C
         5: invokevirtual #37                 // Method apply:(C)Lscala/collection/mutable/StringBuilder;
         8: areturn       
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
               0       9     0  this   L$line10/$read$$iw$$iw$$anonfun$1;
               0       9     1    v1   Ljava/lang/Object;
      LineNumberTable:
        line 9: 0

A good laugh does get some oxygen into the bloodstream.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.