Strings in Java support structural sharing for some methods, such as substring: because the data is immutable, it doesn't need to be copied. An unexpected side effect is that a small substring can keep a large char array alive that would otherwise have been garbage-collected.

This feature is implemented with two fields, offset and count, which are set accordingly when a substring is taken from a String in Java.
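A minimal sketch of how such sharing might look, assuming a hypothetical SharedStr class (this is not the real java.lang.String, and retainedChars is a made-up helper that only exists to show how large the array kept alive is):

```scala
// Hypothetical stand-in for a string with structural sharing.
final class SharedStr private (
    private val value: Array[Char], // backing array, possibly shared with other instances
    private val offset: Int,        // index of this string's first character in value
    val count: Int                  // number of characters that belong to this string
) {
  def charAt(i: Int): Char = {
    if (i < 0 || i >= count) throw new IndexOutOfBoundsException(i.toString)
    value(offset + i)
  }

  // O(1): the new instance points at the same array, only offset and count differ.
  def substring(begin: Int, end: Int): SharedStr = {
    if (begin < 0 || end > count || begin > end)
      throw new IndexOutOfBoundsException(s"$begin, $end")
    new SharedStr(value, offset + begin, end - begin)
  }

  // Size of the array this instance keeps reachable, however short the string is.
  def retainedChars: Int = value.length

  override def toString: String = new String(value, offset, count)
}

object SharedStr {
  def apply(s: String): SharedStr = new SharedStr(s.toCharArray, 0, s.length)
}
```

Taking a four-character substring of a 1004-character SharedStr yields an instance whose count is 4 but whose retainedChars is still 1004, which is the memory retention effect described above.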

Considering that .NET doesn't do this and claims that "O(n) is O(1) if n does not grow large", would a slightly different design of Strings that accommodates both requirements make sense?

For example, would it make sense to have a sealed, memory-efficient, general-purpose version of String without these superfluous fields, plus a subclass "SubString" that is returned only by the substring methods and carries the additional fields needed to avoid copying?

Rough sketch:

sealed class String {
  val codeunits: Array[Char] = ...
  def length = codeunits.length

  def substring: SubString = ...

  ...
}

final class SubString extends String {
  val offset: Int = ...
  val count: Int = ...
  override def length = count /* and so on */

  ...
}
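One way the sketch above could be made runnable, as a rough illustration only. The names Str and SubStr are hypothetical stand-ins (to avoid clashing with the built-in String), the explicit count parameter is an assumption, and the public constructors are a simplification; a real design would hide them and copy the array defensively:

```scala
// Base class: stores only the array, so the common case carries no extra fields.
sealed class Str(protected val codeunits: Array[Char]) {
  // No field here; the base class always starts at index 0.
  protected def offset: Int = 0

  def length: Int = codeunits.length

  def charAt(i: Int): Char = {
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    codeunits(offset + i)
  }

  // Returns a view that shares the array instead of copying it.
  def substring(begin: Int, end: Int): Str = {
    if (begin < 0 || end > length || begin > end)
      throw new IndexOutOfBoundsException(s"$begin, $end")
    new SubStr(codeunits, offset + begin, end - begin)
  }

  override def toString: String = new String(codeunits, offset, length)
}

// Subclass: only instances produced by substring pay for offset and count.
final class SubStr(units: Array[Char], override protected val offset: Int, count: Int)
    extends Str(units) {
  override def length: Int = count
}
```

With this layout, `new Str("hello world".toCharArray).substring(6, 11)` is a SubStr that shares the original array, while strings that were never substringed stay at a single field.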
  • The field is called count, not length, at least in the Oracle JVM. Commented Sep 16, 2011 at 14:35
  • I think you want to limit tags to java/scala since C# as you mention does not have the thing you propose to work around/replace Commented Sep 16, 2011 at 14:36
  • @sehe: I'm especially interested what experiences .NET devs have made with their choice, that's why I left the tag in. Commented Sep 16, 2011 at 14:38
  • In theory it should be possible for a sufficiently-smart JVM to automatically shrink the array behind your back, but it might not be practical or worthwhile. Commented Sep 16, 2011 at 14:38
  • Are there any good benchmarks showing how each model stacks up under similar load in terms of memory and processor usage? Seems like that would be your starting place to see if there's enough difference to warrant a mixed model. It might also show where each model ought to be used. (Keep in mind Java currently allows both models, you just have to choose which one to use, and I'm not so sure it's wasting as many bytes as you think. Length is already required by C# strings and there's going to be a starting memory address for any string.) Commented Sep 16, 2011 at 14:51

1 Answer

What you suggest could make the common case more efficient in terms of memory and CPU.

You may be interested to know that the JVM can change this representation without any change to application code. For example, with -XX:+UseCompressedStrings the Sun/Oracle JVM automagically uses a byte[] instead of a char[] when the characters fit into bytes without loss.

In any case, it's the sort of thing you would want the JVM to do for you transparently, as that option does.
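To illustrate the idea of choosing a representation behind a stable interface, here is a small sketch. It is not how HotSpot implements compressed strings; Storage, ByteStorage, and CharStorage are hypothetical names, and the "fits in one byte" threshold of 0xFF is an assumption made for the example:

```scala
// Callers only see this interface; the representation behind it can vary.
sealed trait Storage {
  def length: Int
  def charAt(i: Int): Char
}

// One byte per character, used when no information would be lost.
final class ByteStorage(bytes: Array[Byte]) extends Storage {
  def length: Int = bytes.length
  // Mask to undo sign extension before widening back to a Char.
  def charAt(i: Int): Char = (bytes(i) & 0xFF).toChar
}

// Two bytes per character, the general fallback.
final class CharStorage(chars: Array[Char]) extends Storage {
  def length: Int = chars.length
  def charAt(i: Int): Char = chars(i)
}

object Storage {
  // Picks the compact form only if every character fits into a single byte.
  def of(s: String): Storage = {
    val chars = s.toCharArray
    if (chars.forall(_ <= 0xFF)) new ByteStorage(chars.map(_.toByte))
    else new CharStorage(chars)
  }
}
```

Code that calls only length and charAt cannot tell which class it received, which is the sense in which such a change stays transparent to the application.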

2 Comments

Thanks, but isn't that something completely different? Having a byte[] instead of a char[] doesn't change the behavior in question ...
It shows that a change for performance reasons is possible, has happened relatively recently, and is even desirable, provided it's transparent to the Java application.
