0

I'm trying to optimize some code, and when I do this I usually end up relying on hash-based structures.

What I want to do is divide objects into multiple sets based on some attributes, and do it very fast. Basically like an SQL GROUP BY clause, but for Java.

The thing is that I want to use HashMap<Object, ArrayList<Object>> to do this. I want to group in several different ways, but an Object can only have one hashCode().

Is there a way to have multiple hashCode() implementations so that I can group by multiple criteria? Are there other structures made to solve this kind of issue? Can I use Java 8 lambda expressions to pass a hashCode() function to the HashMap? Or am I being silly, and there is a very fast way that isn't this complicated?

Note: The hash codes I want use multiple attributes that are not constant. So, for example, creating a String that uniquely represents those attributes won't work, because I'd have to refresh the string every time.

2
  • I don't understand. Let's say that object A is assigned to group 12 based on its current attributes. You store it in the HashMap. Then the attributes change: that will change the group of the object. So what you stored in the HashMap is now useless. A Map key should be immutable. Commented Feb 22, 2016 at 7:04
  • 1
    You shouldn't care about the value of the hash code. You would have one MultiMap per group-by key set, with the MultiMap key being an object describing the group-by key and the value being the list of objects in that group. Commented Feb 22, 2016 at 7:07

3 Answers

2

Let's say you have a collection of objects and you want to produce different groupings analogous to SQL GROUP BY. Each group-by is defined by a set of common values. Create a group-by-key class for each distinct grouping type, each with an appropriate hashCode() and equals() method (as required by the Map contract).

For the following pseudocode I assume the existence of a MultiMap class that encapsulates the management of your map's List<Object> values. You could use one of Guava's Multimap implementations (note the spelling, with a lowercase "m"), such as ArrayListMultimap.

// One group key
public class GroupKey1 {
    ...
    public GroupKey1(MyObject o) {
        // populate key from object
    }
    public GroupKey1(...) {
        // populate from individual values so we can create lookup keys
    }
    public int hashCode() { ... }
    public boolean equals(Object other) { ... }
}

// A second, different group key
public class GroupKey2 {
    ...
    public GroupKey2(MyObject o) {
        // populate key from object
    }
    public GroupKey2(...) {
        // populate from individual values so we can create lookup keys
    }
    ...
}
...
MultiMap<GroupKey1,MyObject> group1 = new HashMultiMap<>();
MultiMap<GroupKey2,MyObject> group2 = new HashMultiMap<>();

for (MyObject m : objectCollection)
{
    group1.put(new GroupKey1(m), m);
    group2.put(new GroupKey2(m), m);
}
...
// Retrieve the list of objects having a certain group-by key
GroupKey2 lookupKey = new GroupKey2(...);
Collection<MyObject> group = group2.get(lookupKey);
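
As a rough, runnable sketch of the same idea using only the standard library: a plain HashMap with computeIfAbsent stands in for the MultiMap, and the key class copies the values it needs so it stays immutable. The Employee class, its fields, and the DeptCityKey name are assumptions made up for illustration; they are not from the pseudocode above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical domain object; the field names are assumptions for illustration.
final class Employee {
    final String name;
    final String department;
    final String city;

    Employee(String name, String department, String city) {
        this.name = name;
        this.department = department;
        this.city = city;
    }
}

// Immutable group-by key: it copies the values it needs at construction time.
final class DeptCityKey {
    private final String department;
    private final String city;

    // Populate the key from an object.
    DeptCityKey(Employee e) {
        this(e.department, e.city);
    }

    // Populate from individual values so lookup keys can be created.
    DeptCityKey(String department, String city) {
        this.department = department;
        this.city = city;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof DeptCityKey)) return false;
        DeptCityKey that = (DeptCityKey) other;
        return Objects.equals(department, that.department)
            && Objects.equals(city, that.city);
    }

    @Override
    public int hashCode() {
        return Objects.hash(department, city);
    }
}

final class GroupByKeyDemo {
    // One pass over the input; each object lands in the list for its key.
    static Map<DeptCityKey, List<Employee>> groupByDeptAndCity(List<Employee> employees) {
        Map<DeptCityKey, List<Employee>> groups = new HashMap<>();
        for (Employee e : employees) {
            groups.computeIfAbsent(new DeptCityKey(e), k -> new ArrayList<>()).add(e);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
            new Employee("Ann", "Sales", "Oslo"),
            new Employee("Bob", "Sales", "Oslo"),
            new Employee("Cid", "Sales", "Rome"));
        Map<DeptCityKey, List<Employee>> groups = groupByDeptAndCity(staff);
        // Look up a group with a key built from individual values.
        System.out.println(groups.get(new DeptCityKey("Sales", "Oslo")).size());
    }
}
```

A second grouping would simply be a second key class and a second map filled in the same loop.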

1 Comment

Of course, if the group by is a single field, you can likely just use the value type itself, e.g. String, Integer, Date, ...
1

What you're describing sounds like a rather convoluted pattern, and possibly a premature optimization. You might have better luck asking a question about how to efficiently replicate GROUP BY-style queries in Java.

That said, the easiest way to have multiple hash codes is to have multiple classes. Here's a trivial example:

public class Person {
  String firstName;
  String lastName;

  /** the "real" hashCode() */
  public int hashCode() {
    return firstName.hashCode() + 1234 * lastName.hashCode();
  }
}

public class PersonWrapper1 {
  Person person;

  public int hashCode() {
    return person.firstName.hashCode();
  }
}

public class PersonWrapper2 {
  Person person;

  public int hashCode() {
    return person.lastName.hashCode();
  }
}

By using wrapper classes you can redefine the notion of equality in a type-safe way. Just be careful about how exactly you let these types interact; you can only compare instances of Person, PersonWrapper1, or PersonWrapper2 with other instances of the same type; each class' .equals() method should return false if a different type is passed in.
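
To make that concrete, here is a minimal sketch of one such wrapper with both hashCode() and equals() defined, used as a HashMap key. The class names (ByLastName, WrapperGroupingDemo) and the constructor on Person are assumptions added for illustration, and the fields are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal Person, mirroring the two fields used in the answer.
final class Person {
    final String firstName;
    final String lastName;

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// Wrapper whose notion of equality is "same last name" and nothing else.
final class ByLastName {
    final Person person;

    ByLastName(Person person) {
        this.person = person;
    }

    @Override
    public int hashCode() {
        return person.lastName.hashCode();
    }

    @Override
    public boolean equals(Object other) {
        // A different wrapper type, a bare Person, or null is never equal.
        if (!(other instanceof ByLastName)) return false;
        return person.lastName.equals(((ByLastName) other).person.lastName);
    }
}

final class WrapperGroupingDemo {
    // Groups people under a wrapper key, so the map uses the wrapper's hash code.
    static Map<ByLastName, List<Person>> groupByLastName(List<Person> people) {
        Map<ByLastName, List<Person>> groups = new HashMap<>();
        for (Person p : people) {
            groups.computeIfAbsent(new ByLastName(p), k -> new ArrayList<>()).add(p);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
            new Person("Ada", "Smith"),
            new Person("Bo", "Smith"),
            new Person("Cy", "Jones"));
        Map<ByLastName, List<Person>> groups = groupByLastName(people);
        // Any Person with the right last name works as a lookup probe.
        System.out.println(groups.get(new ByLastName(new Person("?", "Smith"))).size());
    }
}
```

A wrapper keyed on firstName would follow the same shape, and the two wrapper types would never compare equal to each other.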


You might also look at the hashing utilities in Guava. They provide several different hashing functions, along with a BloomFilter implementation, which is a data structure that relies on being able to use multiple hashing functions.

This is done by abstracting the decomposition of an object into a Funnel. A Funnel for a given type simply pipes the values that type uses for equality into a sink, and callers (like BloomFilter) then actually compute the hash codes.


Your last paragraph is confusing; you cannot hope to store objects in a hash-based data structure and then change the values used to compute the hash code. If you do so, the object will no longer be discoverable in the data structure.
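
A small sketch of that failure mode, assuming a deliberately mutable key class (MutableKey is a made-up name). The Set contract leaves this behavior unspecified; what is shown is what OpenJDK's HashSet does in practice when the hash code changes after insertion.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key whose hash code depends on a mutable field.
final class MutableKey {
    int id; // deliberately mutable, to demonstrate the problem

    MutableKey(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MutableKey && ((MutableKey) other).id == id;
    }
}

final class MutatedKeyDemo {
    // Returns { foundBeforeMutation, foundAfterMutation, foundByOldValue }.
    static boolean[] run() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(1);
        set.add(key);
        boolean foundBefore = set.contains(key);

        key.id = 2; // changes the hash code while the key sits in the set

        // The lookup now uses the new hash, but the entry was stored under the old one.
        boolean foundAfter = set.contains(key);
        // A fresh key with the old value matches the stored hash but fails equals().
        boolean foundByOldValue = set.contains(new MutableKey(1));
        return new boolean[] { foundBefore, foundAfter, foundByOldValue };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

The element is still physically in the set (its size stays 1), it just cannot be reached by lookup any more.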

2 Comments

Thanks for the first two points. The last one doesn't apply to my code because after the values change, I'll just use another HashMap. I know it sounds weird but it is part of the strategy, HashMaps are meant to partition only once.
I also found the way to do the Group By with Java 8. Just had to google it, but I won't remove this question because I think you gave a very valid answer for non-Java 8 users.
0

Taking your thoughts into account:

What I want to do is divide objects into multiples sets based on some attributes in a very fast way. Basically like SQL GROUP BY statement but for Java.

// Requires: import static java.util.stream.Collectors.*;
Map<City, Set<String>> lastNamesByCity
     = people.stream().collect(groupingBy(Person::getCity,
                                          mapping(Person::getLastName, toSet())));
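
A self-contained sketch of the same approach, extended to a multi-attribute grouping by using a List of the attribute values as the key (lists compare and hash by content). The Resident class and its fields are assumptions for illustration, and a String stands in for the City type used above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical domain class; names and fields are illustrative only.
final class Resident {
    private final String lastName;
    private final String city;
    private final int age;

    Resident(String lastName, String city, int age) {
        this.lastName = lastName;
        this.city = city;
        this.age = age;
    }

    String getLastName() { return lastName; }
    String getCity() { return city; }
    int getAge() { return age; }
}

final class StreamGroupingDemo {
    // Single-attribute grouping, collecting only the last names per city.
    static Map<String, Set<String>> lastNamesByCity(List<Resident> people) {
        return people.stream().collect(Collectors.groupingBy(
            Resident::getCity,
            Collectors.mapping(Resident::getLastName, Collectors.toSet())));
    }

    // Multi-attribute grouping: the key is a list of the grouped values.
    static Map<List<Object>, List<Resident>> byCityAndAge(List<Resident> people) {
        return people.stream().collect(Collectors.groupingBy(
            r -> Arrays.<Object>asList(r.getCity(), r.getAge())));
    }

    public static void main(String[] args) {
        List<Resident> people = Arrays.asList(
            new Resident("Smith", "Oslo", 30),
            new Resident("Jones", "Oslo", 30),
            new Resident("Smith", "Rome", 41));
        System.out.println(lastNamesByCity(people).get("Oslo").size());
        System.out.println(byCityAndAge(people).get(Arrays.<Object>asList("Oslo", 30)).size());
    }
}
```

Each grouping is just a different key-extractor function over the same collection, so no extra hashCode() implementations are needed on the objects themselves.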
