1
for (int i = 0; i < len; i++) {
    Set<Integer> fileTerm = new HashSet<Integer>();

    ....

}

This set will be huge on each iteration.

The other way is to create the set once, outside the loop, and clear it at the end of every iteration:

Set<Integer> fileTerm = new HashSet<Integer>();

for (int i = 0; i < len; i++) {

    ....
    fileTerm.clear();
}
2
  • I wouldn't expect a significant difference. Commented Apr 16, 2014 at 18:02
  • I think the bottleneck will be in the resizing of the Set as you are adding new items and exceeding the default size. If you know the size you need you should initialize the capacity (new HashSet<>(2000)), this will save time and memory (copying and rehashing of the Set). Still the best way to decide is to test the performance both ways with your data. I would say the 2nd approach will be faster, but the 1st is easier to read. Commented Apr 16, 2014 at 18:30
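A note on the presizing suggestion in the comment above: the `HashSet(int)` constructor argument is a capacity, not an element count, and the default load factor is 0.75, so a set created with `new HashSet<>(2000)` can still resize before it holds 2000 elements. The following is a minimal sketch of sizing for an expected element count; the class name `CapacityExample` and the helper `capacityFor` are hypothetical names introduced for illustration, not part of the JDK.

```java
import java.util.HashSet;
import java.util.Set;

public class CapacityExample {

    // Hypothetical helper: smallest initial capacity that should hold
    // `expected` elements without a resize at the default load factor (0.75).
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    public static void main(String[] args) {
        int expected = 2000; // assumed element count, taken from the comment above
        Set<Integer> fileTerm = new HashSet<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            fileTerm.add(i);
        }
        System.out.println("capacity requested: " + capacityFor(expected));
        System.out.println("elements stored: " + fileTerm.size());
    }
}
```

Whether the extra memory is worth it depends on how close the real element counts are to the estimate, so it is still best to measure with your own data.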

3 Answers

1

The key difference between creating a new set and reusing the old one by clearing is that clearing does not reduce the hashtable capacity back to the initial setting. In your case this is probably a good thing, but the savings are minimal. You probably do enough work in that loop that this will be unnoticeable.

On the other hand, creating a new set each time makes your code more robust and easier to reason about. If you ever introduce a continue statement and forget to call clear() before it, you get a broken program.
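To illustrate the continue pitfall, here is a small sketch. The scenario (counting distinct terms per file and skipping files with fewer than two terms) and the names `ReuseBugExample`, `countsWithReuse`, and `countsWithFreshSet` are assumptions made up for this example; they are not from the question.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReuseBugExample {

    // Reuses one set across iterations. The early `continue` skips clear(),
    // so terms from a skipped file leak into the next file's count.
    static List<Integer> countsWithReuse(int[][] files) {
        List<Integer> counts = new ArrayList<>();
        Set<Integer> fileTerm = new HashSet<>();
        for (int[] file : files) {
            for (int term : file) {
                fileTerm.add(term);
            }
            if (file.length < 2) {
                continue; // BUG: clear() below is never reached for this file
            }
            counts.add(fileTerm.size());
            fileTerm.clear();
        }
        return counts;
    }

    // Creates a fresh set per iteration, so the same `continue` is harmless.
    static List<Integer> countsWithFreshSet(int[][] files) {
        List<Integer> counts = new ArrayList<>();
        for (int[] file : files) {
            Set<Integer> fileTerm = new HashSet<>();
            for (int term : file) {
                fileTerm.add(term);
            }
            if (file.length < 2) {
                continue;
            }
            counts.add(fileTerm.size());
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] files = { {1, 2}, {9}, {3, 4} };
        System.out.println(countsWithReuse(files));    // third file also counts the leaked 9
        System.out.println(countsWithFreshSet(files)); // each file counted on its own
    }
}
```

If you do want to reuse the set, calling clear() at the top of the loop body rather than at the bottom avoids this particular mistake.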


1 Comment

I see, so the second way is safer!
1

I made a simple test (no warm up, no real data, just a small demonstration):

int sum = 0;
long start = System.currentTimeMillis();
// Set<Integer> set = new HashSet<Integer>(); // (2)
for (int i = 0; i < 100_000; i++) {
    Set<Integer> set = new HashSet<Integer>(); // (1)
    // Set<Integer> set = new HashSet<Integer>(5_000); // (3)
    for (int j = 0; j < 5_000; j++) {
        set.add(j);
    }
    sum += set.contains(78285) ? 1 : 0;
    sum += set.contains(85) ? 1 : 0;
    // set.clear(); // (2)
}
System.out.println((System.currentTimeMillis() - start) + "ms");
System.out.println(sum);

Times in seconds (JDK 1.7.0_25, 32-bit), three runs each:

(1) 24 23.9 24 - your 1st option

(2) 18.8 18.6 18.7 - your 2nd option

(3) 18.4 18.4 18.3 - set the initial capacity to 5000
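Since the test above runs without a warm-up, here is a hedged sketch of the same comparison with a warm-up pass and `System.nanoTime()`. The class and method names (`SetReuseBenchmark`, `freshSet`, `reusedSet`) and the iteration counts are assumptions for illustration. It is still a rough hand-rolled measurement, not a rigorous benchmark, and the timings will vary by machine and JVM.

```java
import java.util.HashSet;
import java.util.Set;

public class SetReuseBenchmark {

    // Option (1): a new set on every outer iteration.
    static int freshSet(int outer, int inner) {
        int sum = 0;
        for (int i = 0; i < outer; i++) {
            Set<Integer> set = new HashSet<>();
            for (int j = 0; j < inner; j++) {
                set.add(j);
            }
            sum += set.contains(85) ? 1 : 0; // use the set so the work is not optimized away
        }
        return sum;
    }

    // Option (2): one set, cleared at the end of every outer iteration.
    static int reusedSet(int outer, int inner) {
        int sum = 0;
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < outer; i++) {
            for (int j = 0; j < inner; j++) {
                set.add(j);
            }
            sum += set.contains(85) ? 1 : 0;
            set.clear();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Warm-up: give the JIT a chance to compile both methods before timing.
        freshSet(2_000, 5_000);
        reusedSet(2_000, 5_000);

        long t0 = System.nanoTime();
        int a = freshSet(20_000, 5_000);
        long t1 = System.nanoTime();
        int b = reusedSet(20_000, 5_000);
        long t2 = System.nanoTime();

        System.out.println("fresh set:  " + (t1 - t0) / 1_000_000 + " ms (sum " + a + ")");
        System.out.println("reused set: " + (t2 - t1) / 1_000_000 + " ms (sum " + b + ")");
    }
}
```

For numbers you intend to rely on, a dedicated harness such as JMH is generally a better choice than timing loops by hand.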

2 Comments

What does the third line mean? Thanks!
I added it to my answer.
0

In my opinion, the second way is better than the first. The second way creates the Set object only once, whereas the first creates a new Set object on every iteration of the loop. That is why the second is better.
