Using strings as String objects is pretty convenient for many string processing tasks.

I need to extract some substrings for processing, and Scala's String class provides that functionality. But it is rather expensive: a new String object is created every time the substring function is used. Using tuples (string: String, start: Int, stop: Int) solves the performance problem, but makes the code much more complicated.

Is there a library for creating string proxies that store the original string and the range bounds, and that are compatible with the other string functions?

  • Do you have benchmarks that show strings being slower than tuples, or are you just guessing? Commented Sep 16, 2011 at 11:18
  • Just guessing. Tuples mean a different way of processing: char-by-char iteration. That would be faster than copying a string to process it, but convenient functions such as startsWith would have to be implemented by hand. Commented Sep 16, 2011 at 11:23
  • You say a new String object is created every time the substring function is used. What makes you say that? Because in general (Sun's Hotspot VM, for example) this isn't the case. Commented Sep 16, 2011 at 11:31
  • Strictly speaking, a new string object is created every time the substring method is called. However, the substring and the original string share the same character array. Commented Sep 16, 2011 at 11:35

2 Answers

Java 7u6 and later now implement #substring as a copy, not a view, making this answer obsolete.


If you're running your Scala program on the Sun/Oracle JVM, you shouldn't need to perform this optimization, because java.lang.String already does it for you.

A string is stored as a reference to a char array, together with an offset and a length. Substrings share the same underlying array, but with a different offset and/or length.
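For JVMs where substring copies (Java 7u6 and later, as noted at the top of this answer), the offset-plus-length layout described here can be rebuilt by hand. The following is only a minimal sketch: StringView is a hypothetical name, not a class from any library, and it covers just charAt, subSequence, and startsWith.

```scala
// Hypothetical StringView: a read-only window onto an existing String.
// Not a library class; the name and its methods are assumptions for illustration.
final class StringView private (source: String, offset: Int, len: Int)
    extends CharSequence {

  def length(): Int = len

  def charAt(index: Int): Char = {
    if (index < 0 || index >= len) throw new IndexOutOfBoundsException(index.toString)
    source.charAt(offset + index)
  }

  // Narrowing the window allocates only a small wrapper; no characters are copied.
  def subSequence(start: Int, end: Int): StringView = {
    if (start < 0 || end > len || start > end)
      throw new IndexOutOfBoundsException(s"[$start, $end)")
    new StringView(source, offset + start, end - start)
  }

  // Delegates to String.startsWith(prefix, toffset) after checking the window size.
  def startsWith(prefix: String): Boolean =
    prefix.length <= len && source.startsWith(prefix, offset)

  // Characters are copied only when a real String is required.
  override def toString: String = source.substring(offset, offset + len)
}

object StringView {
  def apply(source: String): StringView = new StringView(source, 0, source.length)
}
```

Because the view implements CharSequence, it can be passed to APIs that accept one, but any other String method would still have to be added by hand or reached through toString, which copies.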

2 Comments

We actually had this behaviour bite us not long ago. One of our apps was using more memory than we expected. It turned out we had a collection of short Strings that had been obtained as substrings of an extremely large String, and each of them was holding on to the huge char array from the original String. new String(str) fixed that easily, but it was an interesting thing to trip up on.
It's worth reading the explanation of why .NET doesn't perform the same substring optimization.
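The new String(str) fix mentioned in the first comment can be sketched as follows. The object and method names are hypothetical, chosen only for illustration. On JVMs older than Java 7u6 the copy let the large shared array be garbage collected; on later JVMs substring already copies, so the extra step is redundant there.

```scala
object DetachSubstring {
  // Hypothetical helper: takes the first n characters of a large string and
  // wraps them in new String(...) so the result is a separate String object.
  // On pre-7u6 JVMs this dropped the reference to the large backing array.
  def firstChars(big: String, n: Int): String =
    new String(big.substring(0, n))
}
```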
Look at the implementation of String (in particular substring(int beginIndex, int endIndex)): it's already represented as you wish.
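On JVMs where substring copies instead, one alternative that stays inside the standard library is java.nio.CharBuffer.wrap(CharSequence, int, int), which returns a read-only view over a range of the original string. The wrapper object and method name below are hypothetical; this is a sketch of the idea, not a drop-in String replacement, since the result is a CharSequence and lacks methods such as startsWith.

```scala
import java.nio.CharBuffer

object CharBufferView {
  // Hypothetical helper: a read-only CharSequence over s[start, end).
  // The buffer refers to the original string; no characters are copied here.
  def view(s: String, start: Int, end: Int): CharSequence =
    CharBuffer.wrap(s, start, end)
}
```

Indices passed to charAt and subSequence on the returned view are relative to the start of the range, not to the original string.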
