String.substring() making a copy of the underlying char[] value [closed]

Question

Closed. This question is opinion-based. It is not currently accepting answers.


Closed 10 years ago.


A question relating to performance considerations for String.substring. Prior to Java 1.7.0_06, the String.substring() method returned a new String object that shared the same underlying char array as its parent, but with a different offset and length. To avoid keeping a very large string in memory when only a small substring needed to be kept, programmers used to write code like this:

s = new String(queryReturningHugeHugeString().substring(0,3));
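Here is a minimal self-contained version of that idiom. The body of queryReturningHugeHugeString() is a hypothetical stub that only builds a large string; the line that matters is the new String(...substring(0, 3)) wrapper from the question.

```java
public class SubstringCopy {

    // Hypothetical stand-in for a call that returns a very large string.
    static String queryReturningHugeHugeString() {
        StringBuilder sb = new StringBuilder("abc");
        for (int i = 0; i < 1_000_000; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    // Defensive copy. On JDK 6, where substring shared the parent's array,
    // new String(String) copied only the substring's range, so the huge
    // original could be garbage collected. Since 7u06 the substring already
    // owns a right-sized array, so the wrapper only adds one extra String object.
    static String retainedPrefix() {
        return new String(queryReturningHugeHugeString().substring(0, 3));
    }

    public static void main(String[] args) {
        String s = retainedPrefix();
        System.out.println(s); // prints "abc"
    }
}
```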

From 1.7.0_06 onwards, it has not been necessary to create a new String because in Oracle's implementation of String, substrings no longer share their underlying char array.

My question is: can we rely on Oracle (and other vendors) not going back to char[] sharing in some future release, and simply write s = s.substring(...), or should we explicitly create a new String just in case some future release of the JRE starts using a sharing implementation again?

Not an answer exactly, but a very good answer over here, stackoverflow.com/a/20275133/2796832, may help. Perhaps, if you really need to, you could use the getValueLength from that answer and then use it to flag your code.
– Jonah Graham, Commented Nov 24, 2015 at 12:31
@JonahGraham, bad advice. This may break as early as Java 9: the char[] array is expected to be replaced with byte[] there. Accessing private JDK fields via reflection is not a good idea in general.
– Tagir Valeev, Commented Nov 24, 2015 at 12:39
@TagirValeev TBH I had been contemplating posting that comment as an answer, but it made me feel uncomfortable. I am sure the OP did a detailed analysis to ensure that the extra complication in their code was not a premature optimization. However, I am not sure every future reader would take the same care and attention. Anyway, your answer is a good one, +1.
– Jonah Graham, Commented Nov 24, 2015 at 13:45
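Tagir Valeev's point, that the internal representation is a private detail which did change from char[] to byte[] in JDK 9 (Compact Strings), can be observed without reading any private data. The sketch below assumes OpenJDK's private field name `value`, which is an implementation detail and may be absent on other VMs. It only inspects the field's declared type and deliberately never calls setAccessible, which on JDK 16 and later throws InaccessibleObjectException unless java.base is explicitly opened.

```java
import java.lang.reflect.Field;

public class StringRepresentationProbe {

    // Returns the declared type of String's private "value" field, e.g.
    // "char[]" on JDK 8 or "byte[]" on JDK 9+. No field contents are read,
    // so no setAccessible call is needed. Falls back to "unknown" if the
    // field does not exist on this VM.
    static String valueFieldType() {
        try {
            Field value = String.class.getDeclaredField("value");
            return value.getType().getSimpleName();
        } catch (NoSuchFieldException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("String.value is declared as " + valueFieldType());
    }
}
```

Because the answer can differ between JDK releases and vendors, code should not branch on it; the probe only shows that the representation is not something to rely on.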

Community · Accepted Answer · 2017-05-23 12:15:10Z


The actual representation of the String is an internal implementation detail, so you can never be sure. However, according to public talks by Oracle engineers (most notably @shipilev), it's very unlikely that it will be changed back. This was done not only to fight possible memory leaks, but also to simplify the String internals. With simpler strings, it's easier to implement many optimization techniques, like String deduplication or Compact Strings.

edited May 23, 2017 at 12:15

CommunityBot


answered Nov 24, 2015 at 12:38

Tagir Valeev



1 Comment

Holger Over a year ago

Actually, the existence of String deduplication also shows that we don't need to ever worry about substring representation anyway. If a JVM's garbage collector is capable of patching strings to let equal instances share the same array, it is not too far-fetched to assume that it would be capable of patching substrings as well, if a JVM vendor ever decides to go back to the shared substring (offset+length) representation.
