I made a simple test (no warm-up, no real data, just a small demonstration):
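```java
import java.util.HashSet;
import java.util.Set;

public class SetBenchmark {
    public static void main(String[] args) {
        int sum = 0;
        long start = System.currentTimeMillis();
        // Set<Integer> set = new HashSet<Integer>(); // (2) one set, reused via clear()
        for (int i = 0; i < 100_000; i++) {
            Set<Integer> set = new HashSet<Integer>(); // (1) new set in every iteration
            // Set<Integer> set = new HashSet<Integer>(5_000); // (3) new set with initial capacity 5000
            for (int j = 0; j < 5_000; j++) {
                set.add(j);
            }
            // consume the set (the sum is printed below) so the loop is not dead code
            sum += set.contains(78285) ? 1 : 0;
            sum += set.contains(85) ? 1 : 0;
            // set.clear(); // (2)
        }
        System.out.println((System.currentTimeMillis() - start) + "ms");
        System.out.println(sum);
    }
}
```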
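Times in seconds, three runs each (JDK 1.7.0_25, 32-bit):

- (1) 24 / 23.9 / 24 - your 1st option (a new set in every iteration)
- (2) 18.8 / 18.6 / 18.7 - your 2nd option (one set, reused with `clear()`)
- (3) 18.4 / 18.4 / 18.3 - a new set with the initial capacity set to 5000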
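Since (1) and (3) differ only in the initial capacity, the extra time in (1) comes from resizing the `Set` as you add new items and exceed the default capacity: every resize allocates a larger table and rehashes the existing entries. Option (2) avoids this after the first iteration because `clear()` empties the set but keeps its already grown internal table. If you know the size you need, you should initialize the capacity (e.g. `new HashSet<>(2000)`); this saves time and memory by avoiding the copying and rehashing of the `Set`. Note that with the default load factor of 0.75 a `HashSet` resizes once it holds more than 75% of its capacity, so to avoid rehashing entirely the capacity should be at least the expected size divided by 0.75. Still, the best way to decide is to test the performance both ways with your data. I would say the 2nd approach will be faster, but the 1st is easier to read.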
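A minimal sketch of sizing a `HashSet` for a known number of elements, assuming the default load factor of 0.75. The class and helper names (`SetSizing`, `capacityFor`, `filledSet`) are hypothetical, not part of the JDK; the idea is simply to request a capacity of `expected / 0.75` so the set never has to grow while it is being filled.

```java
import java.util.HashSet;
import java.util.Set;

public class SetSizing {
    // HashSet (backed by a HashMap) grows its table once size exceeds capacity * loadFactor.
    // Assumption: the set uses the default load factor.
    static final double DEFAULT_LOAD_FACTOR = 0.75;

    // Initial capacity large enough that `expected` elements never trigger a resize.
    // The JDK rounds the requested capacity up to a power of two, which only adds headroom.
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / DEFAULT_LOAD_FACTOR);
    }

    // Like option (3) from the test, but sized so that n elements fit without rehashing.
    static Set<Integer> filledSet(int n) {
        Set<Integer> set = new HashSet<Integer>(capacityFor(n));
        for (int j = 0; j < n; j++) {
            set.add(j);
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(2000));       // 2667
        System.out.println(capacityFor(5000));       // 6667
        System.out.println(filledSet(2000).size());  // 2000
    }
}
```

On Java 19 and later, `HashSet.newHashSet(int numElements)` performs the equivalent calculation for you; on older JDKs, such as the 1.7 used above, a small helper like this is needed.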