119

How should a model class's equals() and hashCode() be implemented in Hibernate?

What are the common pitfalls?

Is the default implementation good enough for most cases?

Is there any sense in using business keys?

It seems to me that it's pretty hard to get this right in every situation once lazy fetching, ID generation, proxies, etc. are considered.

1

10 Answers

90

Hibernate has a nice and long description of when / how to override equals() / hashCode() in the documentation.

The gist of it is you only need to worry about it if your entity will be part of a Set or if you're going to be detaching/attaching its instances. The latter is not that common. The former is usually best handled via:

  1. Basing equals() / hashCode() on a business key, i.e. a unique combination of attributes that is not going to change during the object's (or, at least, the session's) lifetime.
  2. If the above is impossible, base equals() / hashCode() on the primary key IF it's set and object identity / System.identityHashCode() otherwise. The important part here is that you need to reload your Set after the new entity has been added to it and persisted; otherwise, you may end up with strange behavior (ultimately resulting in errors and/or data corruption) because your entity may be allocated to a bucket not matching its current hashCode().
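Here is a minimal plain-Java sketch of option 2 and the pitfall that comes with it. The Item class is hypothetical, and calling setId() only simulates Hibernate assigning a generated id on persist. No Hibernate API is involved.

```java
import java.util.HashSet;
import java.util.Set;

public class IdOrIdentityDemo {

    // Hypothetical entity: id stays null until "persisted" (simulated by setId).
    public static class Item {
        private Long id;

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;            // object identity
            if (!(o instanceof Item)) return false;
            Item other = (Item) o;
            // Compare primary keys only when this entity already has one.
            return id != null && id.equals(other.getId());
        }

        @Override
        public int hashCode() {
            // Primary key if set, identity hash code otherwise.
            return id != null ? id.hashCode() : System.identityHashCode(this);
        }
    }

    public static void main(String[] args) {
        Set<Item> items = new HashSet<>();
        Item item = new Item();
        items.add(item);           // stored under its identity hash code

        item.setId(42L);           // simulated persist: hashCode() now changes

        // Almost always false: the lookup uses the new hash, the entry sits under the old one.
        System.out.println(items.contains(item));
        System.out.println(items.size());   // 1: the element is still stored

        // "Reloading" the set re-hashes every element with its current hashCode().
        Set<Item> reloaded = new HashSet<>(items);
        System.out.println(reloaded.contains(item)); // true
    }
}
```

Rebuilding the set is the plain-Java counterpart of the "reload your Set" advice.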

7 Comments

@ChssPly76, when you say "reload", do you mean calling refresh()? How does an entity that obeys the Set contract end up in the wrong bucket (assuming a good enough hashCode implementation)?
Refresh the collection or reload the entire (owner) entity, yes. As far as the wrong bucket goes: a) you add a new entity to the set; its id is not set yet, so you're using identityHashCode, which places your entity in bucket #1. b) your entity (within the set) is persisted; it now has an id, and thus you're using a hashCode() based on that id. That value differs from the one above and would have placed your entity in bucket #2. Now, assuming you hold a reference to this entity elsewhere, try calling Set.contains(entity) and you'll get back false. The same goes for get() / put() / etc.
Makes sense. I've never used identityHashCode myself, though I see it used in the Hibernate source, e.g. in their ResultTransformers.
When using Hibernate, you could also run into this problem, to which I still haven't found a solution.
@ChssPly76 Due to business rules that determine whether two objects are equal, I will need to base my equals/hashCode methods on properties that may change within an object's lifetime. Is that really a big deal? If so, how do I get around it?
48

I don't think that the accepted answer is accurate.

To answer the original question:

Is the default implementation good enough for most cases?

The answer is yes, in most cases it is.

You only need to override equals() and hashCode() if the entity will be used in a Set (which is very common) AND the entity will be detached from, and subsequently re-attached to, Hibernate sessions (which is an uncommon usage of Hibernate).

The accepted answer indicates that the methods need to be overridden if either condition is true.
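To show what the default Object.equals()/hashCode() does across sessions, here is a plain-Java sketch. The Person class is hypothetical and keeps the defaults. Reusing the same reference stands in for a second lookup in the same session, and the second `new Person(...)` only simulates a copy of the same row loaded by another session.

```java
import java.util.HashSet;
import java.util.Set;

public class DefaultEqualsDemo {

    // Hypothetical entity that keeps Object's identity-based equals()/hashCode().
    public static class Person {
        private final Long id;
        private final String name;

        public Person(Long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    public static void main(String[] args) {
        Person loaded = new Person(3L, "Sarah");
        Person loadedAgain = loaded;                    // same session: same instance
        Person fromOtherSession = new Person(3L, "Sarah"); // another session: new instance

        Set<Person> people = new HashSet<>();
        people.add(loaded);
        people.add(loadedAgain);
        System.out.println(people.size()); // 1: default equals is enough here

        people.add(fromOtherSession);
        System.out.println(people.size()); // 2: same row, two elements
    }
}
```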

3 Comments

This aligns with my observation; time to find out why.
"You only need to override equals() and hashCode() if the entity will be used in a Set" is already reason enough if some fields identify an object and you therefore don't want to rely on Object.equals() to identify objects.
If I have an entity in my session and I retrieve it again through Hibernate, it is guaranteed to be the same instance. But what about native queries? Spring Data makes it easy to write them and have them return instances of the @Entity class.
24

The best equals and hashCode implementation is when you use a unique business key or natural identifier, like this:

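// JPA annotations (on Hibernate 6+ these live in jakarta.persistence instead)
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
// Builders assumed to come from Apache Commons Lang 3
import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;

@Entity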
public class Company {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
 
    @Column(unique = true, updatable = false)
    private String name;
 
    @Override
    public int hashCode() {
        HashCodeBuilder hcb = new HashCodeBuilder();
        hcb.append(name);
        return hcb.toHashCode();
    }
 
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Company)) {
            return false;
        }
        Company that = (Company) obj;
        EqualsBuilder eb = new EqualsBuilder();
        eb.append(name, that.name);
        return eb.isEquals();
    }
}

The business key should be consistent across all entity state transitions (transient, attached, detached, removed), which is why you can't rely on a database-generated id for equality.

Another option is to switch to using UUID identifiers, assigned by the application logic. This way, you can use the UUID for the equals/hashCode because the id is assigned before the entity gets flushed.
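A minimal sketch of the UUID approach in plain Java. The Book class is hypothetical and the JPA mapping (@Id, the column type used for the UUID) is left out. The point is only that the id exists from construction, so equals/hashCode never change across state transitions.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class UuidIdentityDemo {

    // Hypothetical entity: the application assigns the id when the object is created.
    public static class Book {
        private UUID id = UUID.randomUUID();
        private String title;

        public UUID getId() { return id; }
        public void setTitle(String title) { this.title = title; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Book)) return false;
            return id.equals(((Book) o).getId());
        }

        @Override
        public int hashCode() {
            return id.hashCode(); // stable: the id never changes after construction
        }
    }

    public static void main(String[] args) {
        Set<Book> books = new HashSet<>();
        Book book = new Book();
        books.add(book);          // still transient, but it already has its id
        book.setTitle("Draft");   // non-key state may change freely
        System.out.println(books.contains(book)); // true
    }
}
```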

You can even use the entity identifier for equals and hashCode, but that requires you to always return the same hashCode value, so that the entity's hashCode is consistent across all entity state transitions, like this:

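import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

// Identifiable<Long> is the author's own interface (not shown here);
// it is assumed to declare Long getId().
@Entity(name = "Post")
@Table(name = "post")
public class Post implements Identifiable<Long> {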
 
    @Id
    @GeneratedValue
    private Long id;
 
    private String title;
 
    public Post() {}
 
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
 
        if (!(o instanceof Post))
            return false;
 
        Post other = (Post) o;
 
        return id != null &&
               id.equals(other.getId());
    }
 
    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
  
    //Getters and setters omitted for brevity
}
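A plain-Java sketch of the Post mapping above, with the JPA annotations and the Identifiable interface stripped out. It shows why the constant hashCode keeps the Set lookup working after the id is assigned. Here setId() only stands in for the flush.

```java
import java.util.HashSet;
import java.util.Set;

public class ConstantHashCodeDemo {

    public static class Post {
        private Long id; // null until "persisted"

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Post)) return false;
            Post other = (Post) o;
            return id != null && id.equals(other.getId());
        }

        @Override
        public int hashCode() {
            // Same value for every Post, so it cannot change when the id is assigned.
            return getClass().hashCode();
        }
    }

    public static void main(String[] args) {
        Set<Post> posts = new HashSet<>();
        Post post = new Post();
        posts.add(post);
        post.setId(1L);                           // simulated flush assigns the id
        System.out.println(posts.contains(post)); // true: the bucket did not move
    }
}
```

The trade-off, raised in the comments, is that every Post hashes to the same bucket, so lookups in large sets degrade toward a linear scan.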

8 Comments

+1 for the UUID approach. Put that into a BaseEntity and never think about that problem again. It takes a bit more space on the DB side, but that's a price worth paying for the comfort :)
Returning the same hash code for all objects makes hash-based data structures pointless.
What is your opinion on jpa-buddy.com/blog/… ?
@julaine The article is flawed because if you try to add a non-initialized proxy to a HashSet from outside a transactional context, you will get: Post$HibernateProxy$V1GXkc02.hashCode(Unknown Source) at java.base/java.util.HashMap.hash(HashMap.java:337) at java.base/java.util.HashMap.put(HashMap.java:609) at java.base/java.util.HashSet.add(HashSet.java:221)
@VladMihalcea Thanks for this interesting point, which I certainly wouldn't have thought of. But how would that happen? If I create an entity manually, it is not a proxy; if I read an object from the database, I am in a transactional context. I probably lack imagination here; my Hibernate experience is with very 'tame' projects (Spring Data, OSIV, Postgres, no hand-coded JDBC access, etc.).
14

When an entity is loaded lazily, it is not an instance of the base type itself but of a dynamically generated subtype (a proxy generated by javassist; newer Hibernate versions use ByteBuddy), so a check for the exact same class will fail. Don't use:

if (getClass() != that.getClass()) return false;

instead use:

if (!(otherObject instanceof Unit)) return false;

which is also good practice, as explained in "Implementing equals" on Java Practices.

For the same reason, accessing fields directly may not work: on an uninitialized proxy the field returns null instead of the underlying value. So don't compare the fields directly; use the getters, since calling them triggers loading of the underlying values.
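A rough illustration of both points without Hibernate: UnitProxy is a hand-written, hypothetical stand-in for a generated lazy-loading proxy. It subclasses Unit, leaves its own fields null, and loads the real object on the first getter call. Real Hibernate proxies are generated for you rather than written like this.

```java
import java.util.Objects;
import java.util.function.Supplier;

public class ProxyEqualsDemo {

    // Hypothetical entity; "code" plays the role of the compared property.
    public static class Unit {
        private String code;

        protected Unit() {}                 // proxies need a no-arg constructor
        public Unit(String code) { this.code = code; }

        public String getCode() { return code; }

        @Override
        public boolean equals(Object otherObject) {
            if (this == otherObject) return true;
            // instanceof accepts proxy subclasses; a getClass() comparison would not.
            if (!(otherObject instanceof Unit)) return false;
            Unit that = (Unit) otherObject;
            // Use getters: on a proxy the field itself is still null.
            return Objects.equals(getCode(), that.getCode());
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(getCode());
        }
    }

    // Stand-in for a lazy proxy: its own "code" field stays null, and the
    // getter delegates to a target loaded on first access.
    public static class UnitProxy extends Unit {
        private final Supplier<Unit> loader;
        private Unit target;

        public UnitProxy(Supplier<Unit> loader) { this.loader = loader; }

        @Override
        public String getCode() {
            if (target == null) target = loader.get();
            return target.getCode();
        }
    }

    public static void main(String[] args) {
        Unit real = new Unit("kg");
        Unit proxy = new UnitProxy(() -> new Unit("kg"));

        System.out.println(real.equals(proxy));                   // true
        System.out.println(proxy.equals(real));                   // true
        System.out.println(real.getClass() == proxy.getClass());  // false
    }
}
```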

1 Comment

This works if you are comparing objects of concrete classes, which was not my situation: I was comparing objects of superclasses, and in that case this code worked for me: obj1.getClass().isInstance(obj2)
6

Yeah, it's hard. In my project, equals and hashCode both rely on the id of the object. The problem with this solution is that neither of them works if the object has not been persisted yet, as the id is generated by the database. In my case that's tolerable, since in almost all cases objects are persisted right away. Other than that, it works great and is easy to implement.

3 Comments

I think what we did was use object identity in the case where the id has not been generated yet.
The problem here is that if you persist the object, your hashCode changes. That can have very detrimental results if the object is already part of a hash-based data structure. So, if you do wind up using object identity, you'd better keep using object identity until the object is completely freed (or remove the object from any hash-based structures, persist it, then add it back in). Personally, I think it would be best not to use the id and to base the hash on immutable properties of the object.
@KevinDay Well, you shouldn't mutate an object with respect to its hash while it is used as a HashMap key or a HashSet entry. That is a general rule, not one specific to Hibernate.
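The "remove, persist, add back" workaround from the comment above can be sketched in plain Java as follows. Order and persistSafely are hypothetical names, and setId() only simulates the id Hibernate would assign on persist.

```java
import java.util.HashSet;
import java.util.Set;

public class RemovePersistReAddDemo {

    // Hypothetical entity whose hashCode depends on a database-generated id.
    public static class Order {
        private Long id;

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Order)) return false;
            return id != null && id.equals(((Order) o).getId());
        }

        @Override
        public int hashCode() {
            return id != null ? id.hashCode() : System.identityHashCode(this);
        }
    }

    // Take the entity out of the set before its hash-relevant state changes,
    // then put it back so it is stored under its new hash code.
    public static void persistSafely(Set<Order> orders, Order order, long generatedId) {
        boolean wasPresent = orders.remove(order); // found via the old hashCode
        order.setId(generatedId);                  // stands in for persist + flush
        if (wasPresent) {
            orders.add(order);                     // re-hashed with the new hashCode
        }
    }

    public static void main(String[] args) {
        Set<Order> orders = new HashSet<>();
        Order order = new Order();
        orders.add(order);
        persistSafely(orders, order, 7L);
        System.out.println(orders.contains(order)); // true
        System.out.println(orders.size());          // 1
    }
}
```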
2

The Hibernate 5.2 documentation says you might not want to implement hashCode and equals at all, depending on your situation.

https://docs.jboss.org/hibernate/orm/5.2/userguide/html_single/Hibernate_User_Guide.html#mapping-model-pojo-equalshashcode

Generally, two objects loaded from the same session that represent the same database row will be equal, because the session returns the same instance (even without implementing hashCode and equals).

It gets complicated if you're using two or more sessions. In this case, the equality of two objects depends on your equals-method implementation.

Further, you'll get into trouble if your equals-method is comparing IDs that are only generated while persisting an object for the first time. They might not be there yet when equals is called.


1

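If you override equals, make sure you fulfill its contract:

  • REFLEXIVE: x.equals(x) is true
  • SYMMETRIC: x.equals(y) is true if and only if y.equals(x) is true
  • TRANSITIVE: if x.equals(y) and y.equals(z), then x.equals(z)
  • CONSISTENT: repeated calls return the same result as long as the compared state does not change
  • NON-NULL: x.equals(null) is false

And override hashCode too, because its contract depends on the equals implementation: objects that are equal must have equal hash codes.

Joshua Bloch (designer of the Java Collections Framework) strongly urges following these rules in Effective Java:

  • Item 9 (Item 11 in the 3rd edition): Always override hashCode when you override equals

There are serious unintended effects when you don't follow these contracts. For example, List#contains(Object o) might return the wrong result when the general contract is not fulfilled.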
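A small plain-Java demonstration of why hashCode must be overridden together with equals. The Isbn class is hypothetical. A List only calls equals(), while a HashSet compares hash codes first, so it misses an object that equals() considers equal.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MissingHashCodeDemo {

    // Overrides equals() but deliberately not hashCode(), which violates
    // "equal objects must have equal hash codes".
    public static class Isbn {
        private final String value;

        public Isbn(String value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Isbn && value.equals(((Isbn) o).value);
        }
        // No hashCode(): Object's identity-based hash code is inherited.
    }

    public static void main(String[] args) {
        Isbn stored = new Isbn("isbn-1");
        Isbn probe = new Isbn("isbn-1");

        List<Isbn> list = new ArrayList<>();
        list.add(stored);
        System.out.println(list.contains(probe)); // true: List only uses equals()

        Set<Isbn> set = new HashSet<>();
        set.add(stored);
        // Almost always false: the two identity hash codes differ.
        System.out.println(set.contains(probe));
    }
}
```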

2 Comments

This is just general advice on hashCode/equals, but the question is specifically about JPA entities.
The advice above is valid for entities that override their equals method. This is especially true if developers want to use these entities with Collections APIs that rely on the equality contract being obeyed to function correctly.
1

According to the latest (hopefully final) JPA Buddy investigation, slightly adapted:

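import static org.hibernate.proxy.HibernateProxyHelper.getClassWithoutInitializingProxy;

public abstract class BaseEntity {

  // Assumed: each subclass exposes its identifier through getId().
  public abstract Object getId();

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    // Compare the real entity classes without initializing lazy proxies
    if (o == null || getClassWithoutInitializingProxy(this) != getClassWithoutInitializingProxy(o)) return false;
    // Transient entities (id == null) are only equal to themselves
    return getId() != null && getId().equals(((BaseEntity) o).getId());
  }

  @Override
  public final int hashCode() {
    // Constant per entity class, so it survives the id being assigned on persist
    return getClassWithoutInitializingProxy(this).hashCode();
  }
}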

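HibernateProxyHelper exists in hibernate-core 5.6.x but is missing in 6.5.x. There you can use a hand-written equivalent (it uses pattern matching for instanceof, so it needs Java 16+):

import org.hibernate.proxy.HibernateProxy;

// Place inside a utility class
public static Class<?> getClassWithoutInitializingProxy(Object object) {
    // For a proxy, ask its lazy initializer for the entity class; this does not load the entity
    return (object instanceof HibernateProxy proxy)
            ? proxy.getHibernateLazyInitializer().getPersistentClass()
            : object.getClass();
}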

7 Comments

The JPA article is very nice. But can you explain your adaptations? I see you factored out getEffectiveClass as a utility method, but what about the mapped BaseEntity with AccessType Field? Those two things don't seem to be related to the problem.
@julaine You are right; I removed them so as not to confuse others.
I only now noticed this returns the same hash code for every object. While that is correct as per the hashCode/equals contract, it also completely defeats the point of hash-based data structures, as they will decay into lists with O(n) access. I absolutely do not understand why they do not use getId() in their hashCode.
This inefficient hashCode implementation is meant for the case where a hash-based collection contains a new object with id=null. Once the id is generated (on its first save), an id-based hashCode would change, so the HashSet would look for the entity in a different bucket and fail to find it. It wouldn't be an issue if the id were set during entity object creation (e.g. a UUID set by the app), but DB-generated ids are more common.
Storing an object into a datastructure imposes limits on how you may modify the object, as sorting and hashing-based datastructures will assume their keys are stable. This problem is not specific to hibernate at all. Silently deteriorating all hash-based-datastructures is nuts. If you do want to do that, it would be better to just throw an exception in hashCode() to inform the programmer they should not be using entities in HashSets and similar data-structures.
0

You should not override them. It is wrong from the perspective of what an entity is supposed to be.


Why is it wrong to compare entities?

A Hibernate entity (an instance of an entity class) is your program's representation of an entry in the database (a 'row', in the simplest case of a single SQL table). This row exists outside of your program logic and memory. It is similar to an object representing a 'file' or a 'thread': it is not contained in your program but is a proxy to something outside of it. At what point does it make sense to say that two file objects are the same? When they represent the same file. But how often do you even compare file objects? You don't, because the same file should not have two different representations in the same context.

In this way, the Hibernate concept of an "Entity" matches the Domain-Driven Design (DDD) concept of an entity, which is probably no accident. You can read up on it on SO: What is an Entity?; Difference Entity vs ValueObject

But isn't an equals/hashcode-contract helpful for disambiguating entities?

Hibernate aims to preserve the abstraction of the entity representing your row. Even if you read the same row with two queries in the same session, it will return the same object again. You can only get two different Java objects representing the same database row if you detach/reattach or use multiple sessions. This is not a standard use case of Hibernate, but it can happen, for example when multithreading.

So imagine your program finds itself with these two objects:

Person(id=3, name='Sarah', occupation='programmer')
Person(id=3, name='Sarah', occupation='manager')

Which occupation should you continue with for Sarah? Is this an error? Does one version 'win'? You cannot decide just from these two rows, the context of what you are doing matters. And the context to decide this is not included in a HashMap or an entity-class, it is in the code where these two appear.

If you were to define some equality on Person and throw these two objects into the same HashSet or whatever, then one of two things would happen:

  1. 'first-save wins' - one version of Sarah gets discarded based on ordering which is probably an implementation-detail of your code

  2. You have two conflicting versions of Sarah in your datastructure.

Both are bad. Notice how it doesn't matter how the equality is defined, any general equality would lead to these results - the mistake is not the specific definition of the equality, the mistake is to leave disambiguation to a part of the code that does not have the right context to decide it.

Why are the given approaches wrong?

Three approaches are given in the answers on this site:

  1. Base equality on a business key
  2. Have hashCode always return the same value for all objects of the class
  3. Base equality on the ID

Number 1 might actually work, but bear in mind that business keys are immutable in most, but not all, use cases: the classic example is a user (entity) changing their email address (business key). Also keep in mind that this still doesn't help you resolve ambiguity, and an equality that you shouldn't be using is not useful.

Number 2 completely defeats the purpose of hash-based data structures. It is plain silly to do this. Your HashSets will decay into linked lists. You are not hashing anything, so you can't have a hash set. I think a lot of people (myself included) did not notice this flaw in the provided implementations because the code looks so careful and advanced (taking care of proxies and whatnot).

Number 3 works best, as the ID is the property that actually matches your object to its database counterpart. It does not work with unpersisted entities, so it is limited, and it too does not solve the problems mentioned above: you always need custom disambiguation logic if you have two Java objects representing the same database row. The approach works better if null IDs (of not-yet-persisted entities) are not a problem. It is possible to generate IDs in the application rather than the database and simply never face the situation where an ID field changes. Read the excellent article Don't let hibernate steal your identity to see how this solves the problem.

What should I do instead?

If you feel the need to hash your entities, by using them as keys in a map or as elements in a set, consider these alternatives:

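  1. Use identity-based collections, which determine equality with reference equality (==) rather than .equals(): IdentityHashMap, or an identity-based set built with Collections.newSetFromMap(new IdentityHashMap<>()) (the JDK has no IdentityHashSet class).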

  2. Extract business-keys or IDs and use them as keys. Any Set<Person> can be a Map<PersonKey, Person>, any Map<Person, Something> can be a Map<PersonKey, Something>.

  3. Just use a List<Person>: unlike sorting or hashing containers, it does not limit the mutability of its elements.

  4. If you frequently store the same extra-information on your entities in a Map<Person, Extra>, make Extra a transient field on Person
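A plain-Java sketch of alternatives 1 to 3. It uses a hypothetical Person class without equals/hashCode overrides and two Java objects for the same row (id 3), as in the Sarah example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EntityCollectionsDemo {

    // Hypothetical entity with no equals()/hashCode() overrides.
    public static class Person {
        public final Long id;
        public String occupation;

        public Person(Long id, String occupation) {
            this.id = id;
            this.occupation = occupation;
        }
    }

    public static void main(String[] args) {
        Person a = new Person(3L, "programmer");
        Person b = new Person(3L, "manager"); // same row, second Java object

        // 1. Identity-based set built from an IdentityHashMap.
        Set<Person> byIdentity = Collections.newSetFromMap(new IdentityHashMap<>());
        byIdentity.add(a);
        byIdentity.add(b);
        System.out.println(byIdentity.size()); // 2: both versions kept, nothing hidden

        // 2. Key by the id: the conflict surfaces at the call site,
        //    where the code has the context to decide which version wins.
        Map<Long, Person> byId = new HashMap<>();
        byId.put(a.id, a);
        Person existing = byId.putIfAbsent(b.id, b);
        System.out.println(existing == a);     // true: conflict detected explicitly

        // 3. A plain list places no restrictions on mutating its elements.
        List<Person> list = new ArrayList<>(List.of(a, b));
        a.occupation = "architect";
        System.out.println(list.contains(a));  // true
    }
}
```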

Is this problem exclusive to Hibernate?

There is a very general problem at play here:

You cannot modify an object with respect to its hash value while it is in a hash-based data structure (a similar rule holds for sorting-based data structures). Java, like many other languages, has no compiler rules to help you here. You have to ensure this yourself, otherwise your objects get 'lost'. No Hibernate involved.
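A quick plain-Java illustration of the sorting-based variant of this rule (the Person class is hypothetical): mutating the field a TreeSet sorts by leaves the element in the set but unreachable by lookup.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

public class MutatedKeyDemo {

    // Hypothetical class whose mutable name is used as the sort key.
    public static class Person {
        public String name;

        public Person(String name) { this.name = name; }
        public String getName() { return name; }
    }

    public static void main(String[] args) {
        Set<Person> byName = new TreeSet<>(Comparator.comparing(Person::getName));
        Person alice = new Person("Alice");
        byName.add(alice);
        byName.add(new Person("Bob"));
        byName.add(new Person("Carol"));

        alice.name = "Zoe"; // mutate the sort key while the element is in the set

        // The tree search now follows "Zoe" and takes the wrong branch.
        System.out.println(byName.contains(alice));
        // The element is still stored, as a full scan shows.
        System.out.println(byName.stream().anyMatch(p -> p == alice)); // true
    }
}
```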

Where Hibernate comes into play is that it may assign an ID to your object when it saves it, so mutations may happen at places where a beginner may not expect them, but they still happen in a controlled manner.

Making broken code bearably predictable by silently converting all HashSets into linked lists is an incredibly duct-tape solution that should not be the recommendation to people asking how to properly write entity classes.


-1

There is a very nice article here: https://docs.jboss.org/hibernate/stable/core.old/reference/en/html/persistent-classes-equalshashcode.html

Quoting an important line from the article:

We recommend implementing equals() and hashCode() using Business key equality. Business key equality means that the equals() method compares only the properties that form the business key, a key that would identify our instance in the real world (a natural candidate key):

In simple terms

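import java.util.Objects;

public class Cat {

...

@Override
public boolean equals(Object other) {
    // Basic test / class cast
    if (this == other) return true;
    if (!(other instanceof Cat)) return false;
    Cat that = (Cat) other;
    // Objects.equals also works if catId is a wrapper type such as Long
    return Objects.equals(this.catId, that.catId);
}

@Override
public int hashCode() {
    // Must agree with equals(): derived from the same field
    return Objects.hashCode(this.catId);
}

}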

1 Comment

"Business key" in this context means not the artificial ID but another attribute or set of attributes that are unique.
