3

Let's say I am building a TreeSet of an object for which the ordering depends on only one value.

I cannot do

TreeSet<Foo> tree = new TreeSet<>(Comparator.comparingInt(Foo::getX));

because if I add two different Foo objects with the same x then one would replace the other (i.e. if I do tree.add(foo1) and tree.add(foo2), tree.size() will be 1 instead of 2).

I could compare every field of Foo, but I would like two instances of Foo to be considered distinct, even if every field is the same.

One solution that "almost works" would be

TreeSet<Foo> tree = new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparing(Foo::hashCode));

But this would fail when there is a hash collision.

In summary, I am looking to something similar to

TreeSet<Foo> tree = new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparing(Foo::getInternalAddress));

But of course we don't have access to such method.

Workarounds

I know there are workarounds:

  • if I don't care about the objects themselves but just how many are in the tree, I can use a multiset TreeMap<Foo, Integer> (and comparing all the fields) that give me how many Foos for a specific x
  • if I do care about the objects (I am making reference equality checks) I can use a different multiset TreeMap<Foo, List<Foo>> (or TreeMap<Integer, List<Foo>> with the key being x). But if there are very few "duplicated" foos, this is a waste of space taken by all the singleton lists.

So while I know there are workarounds with a TreeMap, I still wonder if there is a way to do that with just a TreeSet.

16
  • 2
    Can you make changes to Foo? How about simply adding a UUID field and sort by that? Commented Oct 2, 2022 at 11:48
  • 3
    I also slightly suspect that this is an XY problem. Commented Oct 2, 2022 at 11:53
  • 2
    Sounds like you want a sorted list, not a set. See Why is there no SortedList in Java? Commented Oct 2, 2022 at 11:54
  • 6
    Have a look at Guava’s Ordering.arbitrary Commented Oct 2, 2022 at 16:49
  • 2
    It's on the radar but it's not particularly high priority. Commented Oct 3, 2022 at 16:45

2 Answers 2

6

Guava provides Ordering.arbitrary() which you can use to construct your desired Comparator:

Comparator.comparingInt(Foo::getX).thenComparing(Ordering.arbitrary())

Ordering.arbitrary() internally uses System.identityHashCode which almost solves your "order arbitrary, but consider identity" problem, except there's no guarantee that identityHashCode will never collide (trivially, since it's a 32bit value and a Java project can easily create more than 232 objects, given enough memory).

Ordering.arbitrary() uses some neat tricks to still sort objects with colliding identityHashCode values in a consistent (but arbitrary!) way, see its code.

Sign up to request clarification or add additional context in comments.

2 Comments

Isn't it more something like .thenComparing(Ordering.arbitrary());?
@Ricola: yes, absolutely, that was a typo, my bad.
0

With an external library

See @Joachim Sauer's answer and @Stuart Marks's comment : Guava's Ordering.arbitrary().

Without external libraries

Add an id field to Foo that is unique for each instance.

With UUID class

Suggested by @Sweeper in a comment.

class Foo {
  private UUID uuid = UUID.randomUUID();
...
}
...
new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparing(Foo::getUUID));

The chance of collisions are almost inexistent (https://stackoverflow.com/a/24876263/7424948), but if you are paranoid you can use the custom instance counter below.

Instance counter

class Foo {
  // if only one thread, otherwise use AtomicInteger for thread-safe.
  // assuming we'll get less than Integer.MAX_VALUE instances, otherwise use long.
  private static int counter = 0;
  private int id;

  public Foo(...){
    id = counter++;
  } 
...
}
...
new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparingInt(Foo::getId));

Advantage over UUID is that it takes less space (32 bits per instance instead of 128 bits) and that there is less computation to get the id.

5 Comments

The int field can potentially suffer the exact same problem that identityHashCode has when it overflows. long would be safer here, but even that isn't guaranteed safe (since more than 2^64 Foo objects could be created in the lifetime of a VM).
@JoachimSauer I agree, but this is why I wrote a comment over the field. It depends on the use case, the assumptions and the constraints. In the really worst case this can be a BigInteger.
"since more than 2^64 Foo objects could be created in the lifetime of a VM". Not really. If you want to do that in 10 years, you have to allocate 58,454,204,609 Objects/second.
Also, I think a long has pretty much the same constraints as Ordering.arbitrary() current implementation (2^32 different hashcodes and then 2^32 different AtomicIntegers for collisions)
@JohannesKuhn: fair point! And yes, @Ricola, a long would end up being roughly equivalent to what Ordering.arbitrary() does.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.