Based on the discussion in "Java String.split memory leak?" about taking a substring of a String, I have been analyzing two sample usages of substring.

It is said that objects don't get garbage collected if the caller stores a substring of a field in the object. When I run the code below I get an OutOfMemoryError, and while monitoring it in VisualVM I see the allocated size of char[] keep increasing.

public class TestGC {
    private String largeString = new String(new byte[100000]);    
    String getString() {
        return this.largeString.substring(0,2);     
        //return new String(this.largeString.substring(0,2));
    }

    public static void main(String[] args) {
        java.util.ArrayList<String> list = new java.util.ArrayList<String>();
        for (int i = 0; i < 100000; i++) {
            TestGC gc = new TestGC();            
            list.add(gc.getString());            
        }
    }
}

With the following code I did not get an error. Monitoring memory usage in VisualVM, I saw the allocated char[] size increase, drop at some point, then increase and drop again (the GC doing its job), and this pattern keeps repeating.

public class TestGC {
    private String largeString = new String(new byte[100000]);

    String getString() {
        //return this.largeString.substring(0,2);       
        return new String(this.largeString.substring(0,2));
    }

    public static void main(String[] args) {
        java.util.ArrayList<String> list = new java.util.ArrayList<String>();
        for (int i = 0; i < 100000; i++) {
            TestGC gc = new TestGC();            
            list.add(gc.getString());            
        }
    }
}

I really want to understand what the GC collects and removes from the heap in the second example. Why can't the GC collect the same objects in the first example?

In the first example, largeString.substring(0,2) returns a reference, and in the second example, new String(this.largeString.substring(0,2)) creates a new object. Shouldn't both cases be unproblematic for the GC?

  • substring has changed since Java 7; what Java version are you using? Commented Nov 19, 2013 at 20:48
  • Version does matter. Please read java-performance.info/changes-to-string-java-1-7-0_06 Commented Nov 19, 2013 at 20:49
  • As I said, I have been following the discussion about String substring, so I have read the topics you advised me to read. However, I want to know how the GC behaves in the two cases. Commented Nov 19, 2013 at 21:00
  • @JustinKSU: your link claims that nothing has changed regarding the hashing in Java 7 as of November 19, 2013. But none of the classes listed in that blog that are supposed to have the hashing problem contain the mentioned code, neither in 1.6.0_45 nor in 1.7.0_45 (and the mentioned 1.7.0_25 is quite old anyway). Commented Nov 19, 2013 at 21:00
  • @a_horse_with_no_name it also has a section about substring() and garbage collection changes. See the "Sharing an underlining char[]" section. Commented Nov 19, 2013 at 21:20

4 Answers

3

In the first example, every time around the loop when you create a new TestGC object you are also creating a new String initialised from the 100000 byte array. When you call String.substring you get back a String that shares the same char[] as the big long string, with the offset set to 0 and count set to 2. So all the data is still in memory, but when you use the String you will only see the 2 characters specified in the substring call.

In the second example you are again creating the new String every time around the loop, but by calling new String(String.substring) you are discarding the rest of the String and only keeping the 2 characters in memory, so the rest can be garbage collected.

As the links in the comments say, this behaviour has changed in 1.7.0_06 so that the String returned by String.substring will no longer share the same char[].
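The sharing described above can be sketched with a small model. This is a hypothetical, simplified class (SketchString, sharingSubstring and trimmedCopy are made-up names, not the real JDK source); it only assumes that the pre-1.7.0_06 String kept a char[] together with an offset and a count, as this answer describes.

```java
import java.util.Arrays;

class SubstringSharingSketch {

    /** Hypothetical model of the old String layout; not the real java.lang.String. */
    static final class SketchString {
        final char[] value; // backing array, possibly shared with other instances
        final int offset;   // index of the first visible character
        final int count;    // number of visible characters

        SketchString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        /** Models the old substring: a new object that reuses the same backing array. */
        SketchString sharingSubstring(int begin, int end) {
            return new SketchString(value, offset + begin, end - begin);
        }

        /** Models new String(String): copies only the visible characters. */
        SketchString trimmedCopy() {
            char[] copy = Arrays.copyOfRange(value, offset, offset + count);
            return new SketchString(copy, 0, count);
        }
    }

    public static void main(String[] args) {
        SketchString large = new SketchString(new char[100000], 0, 100000);
        SketchString shared = large.sharingSubstring(0, 2);
        SketchString copied = shared.trimmedCopy();

        System.out.println(shared.value == large.value); // true: same array, so it stays reachable
        System.out.println(shared.value.length);         // 100000
        System.out.println(copied.value.length);         // 2
    }
}
```

In this model, keeping shared in a list keeps the whole 100000-element array reachable, while keeping copied only pins a 2-element array.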

2

I wouldn't expect the behaviour that you've described in Java 7, because substrings are now handled completely differently. However ...

In Java 6

In the first example, the substring that you're storing in your list uses the same character array as the original String inside the TestGC object, so that character array stays reachable and can't be garbage collected.

In the second example, a new String is allocated with its own character array when you do the copy, so the original String can be garbage collected once the TestGC is no longer reachable. So you don't leak a 100000-character array on every iteration through the loop.
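A rough back-of-the-envelope estimate shows why the first example runs out of memory on Java 6 while the second does not. The sketch below assumes 2 bytes per char, assumes each of the 100000 bytes decodes to one char, and ignores object headers; the class and method names are illustrative only.

```java
class RetainedMemoryEstimate {
    static final int CHARS_PER_LARGE_STRING = 100_000; // assumed: one char per input byte
    static final int ITERATIONS = 100_000;             // loop count from the question
    static final int BYTES_PER_CHAR = 2;               // a Java char is 16 bits

    /** Bytes of char[] data kept reachable by the list once the loop has finished. */
    static long retainedBytes(int charsKeptPerEntry) {
        return (long) ITERATIONS * charsKeptPerEntry * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        long shared = retainedBytes(CHARS_PER_LARGE_STRING); // every entry pins the full array
        long copied = retainedBytes(2);                      // every entry pins 2 chars only

        System.out.println(shared); // 20000000000 (about 20 GB)
        System.out.println(copied); // 400000 (about 400 KB)
    }
}
```

Under these assumptions the shared-array version would need around 20 GB to finish the loop, far beyond a typical default heap, whereas the copying version keeps well under a megabyte of character data.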

0

The new String() explicit constructor call creates a new String instance with a copy of the relevant part of the char[] (as opposed to the first example where the underlying huge char[] is shared). So, in your second example, the huge String gets allocated in each loop, but discarded after the TestGC instance is discarded at the end of the loop.
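As a small illustration of the explicit copy, the following sketch (class and method names are hypothetical) shows that new String(String) yields a distinct object with equal content. It builds the large string from a char[] rather than a byte[] so the length does not depend on the platform charset; whether the copy also frees the large array depends on the Java version, as discussed above.

```java
class DefensiveCopySketch {

    /** Returns a two-character String that is a separate object from the substring. */
    static String smallCopy(String large) {
        return new String(large.substring(0, 2));
    }

    public static void main(String[] args) {
        String large = new String(new char[100000]); // 100000 NUL characters
        String sub = large.substring(0, 2);
        String copy = new String(sub);

        System.out.println(copy.equals(sub)); // true: same characters
        System.out.println(copy != sub);      // true: different String objects
        System.out.println(smallCopy(large).length()); // 2
    }
}
```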

1 Comment

The huge String won't be discarded after the constructor completes but after the loop when the reference to the TestGC object goes out of scope.
0

My understanding, drawn from all the answers and comments, especially those from David Wallace and DaveJohnston:

Here is a representation of the references among objects in the first example: (image not included)

Here is a representation of the references among objects in the second example: (image not included)
