A question about the performance implications of String.substring(). Prior to Java 1.7.0_06, String.substring() returned a new String object that shared the same underlying char array as its parent, but with a different offset and length. To avoid keeping a very large string in memory when only a small substring was needed, programmers used to write code like this:

s = new String(queryReturningHugeHugeString().substring(0,3));

From 1.7.0_06 onwards, it is no longer necessary to create a new String, because in Oracle's implementation of String, substrings no longer share the parent's underlying char array.

My question is: can we rely on Oracle (and other vendors) not going back to char[] sharing in some future release, and simply write s = s.substring(...), or should we explicitly create a new String just in case some future release of the JRE starts using a sharing implementation again?

3 Comments
  • Not an answer exactly, but a very good answer over here stackoverflow.com/a/20275133/2796832 may help. Perhaps, if you really need to, you could use the getValueLength from that answer and then use that to flag your code. Commented Nov 24, 2015 at 12:31
  • @JonahGraham, bad advice. This may break as early as Java 9: the char[] array is expected to be replaced with byte[] there. Accessing private JDK fields via reflection is not a good idea in general. Commented Nov 24, 2015 at 12:39
  • @TagirValeev TBH I had been contemplating putting that comment as an answer, but it made me feel uncomfortable. I am sure the OP did detailed analysis to ensure that the extra complication in their code was not a premature optimization. However, I am not sure every future reader would take the same care and attention. Anyway, your answer is a good one +1 Commented Nov 24, 2015 at 13:45

1 Answer

The actual representation of a String is an internal implementation detail, so you can never be sure. However, according to public talks by Oracle engineers (most notably @shipilev), it is very unlikely to be changed back. The change was made not only to prevent possible memory leaks, but also to simplify the String internals. With simpler strings, it is easier to implement optimization techniques such as String deduplication or Compact Strings.
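
For readers who decide to keep the defensive copy anyway, here is a minimal sketch of the idiom from the question. The class and method names (SubstringCopy, detachedPrefix) are hypothetical and exist only for illustration. Note that whether new String(String) actually trims the backing storage is itself an implementation detail: JDK 6 did so, while current JDKs simply reuse the array of the (already independent) substring, so treat this as a hedge rather than a guarantee.

```java
import java.util.Arrays;

public class SubstringCopy {

    // Hypothetical helper: takes at most `length` leading chars of `source`
    // and wraps them in an explicit copy, as in the idiom from the question.
    // Assumption: `length` is non-negative; substring() throws otherwise.
    static String detachedPrefix(String source, int length) {
        int end = Math.min(length, source.length());
        return new String(source.substring(0, end));
    }

    public static void main(String[] args) {
        // Stand-in for queryReturningHugeHugeString(): one million 'x' chars.
        char[] filler = new char[1_000_000];
        Arrays.fill(filler, 'x');
        String huge = new String(filler);

        // Keep only a three-character prefix; `huge` can then be collected
        // once no other reference to it remains.
        String prefix = detachedPrefix(huge, 3);
        System.out.println(prefix);
    }
}
```

On a JDK without array sharing, the extra constructor call costs one small object allocation, so the idiom is cheap to keep if the uncertainty is a concern.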

1 Comment

Actually, the existence of String deduplication also shows that we never need to worry about substring representation anyway. If a JVM's garbage collector is capable of patching strings so that equal instances share the same array, it is not too far-fetched to assume that it would be capable of patching substrings as well, should a JVM vendor ever decide to go back to the shared substring (offset + length) representation.
