0

I'm working on a C++ project where I've defined a custom class to use as the key in an unordered_map. To do this, I overrode the == operator and provided a custom hash function. The following is a simplified version of my code.

During execution, I noticed that inserting key-value pairs into the unordered_map triggers the copy constructor of my custom class. This leads me to worry about potential performance issues, especially when dealing with large objects that contain significant amounts of data.

Coming from a background in Python and Java, where hash tables store references to objects rather than copying them, I'm unsure about the best practices for using hash tables in C++. Is there a more efficient way to handle large objects as keys to avoid the overhead of copying? What are the recommended practices for optimizing performance in such scenarios?

PS: I don't believe this is an option-based question. When dealing with objects that serve as keys and are significantly large—such as one encapsulating a large-sized vector of integers—there are undoubtedly conventional methods to avoid copying the key with each insertion. One approach I can think of involves using pointers. However, as I am just beginning to learn C++, I am not certain if this is the best method.

#include <iostream>
#include <unordered_map>
#include <functional> 

class MyClass {
private:
    int key;
    std::string value;

public:
    MyClass(int k, const std::string& v) : key(k), value(v) {}

    MyClass(const MyClass& obj) {
        std::cout << "copy constructor called" << std::endl;
        key = obj.key;
        value = obj.value;
    }

    // override operator== 
    bool operator==(const MyClass& other) const {
        return (key == other.key && value == other.value);
    }

    // hash function
    friend std::size_t hash_value(const MyClass& obj) {
        std::size_t seed = 0;
        std::hash<int> hasher;
        std::hash<std::string> strHasher;
        seed ^= hasher(obj.key) + 0x9e3779b9 + (seed<<6) + (seed>>2);
        seed ^= strHasher(obj.value) + 0x9e3779b9 + (seed<<6) + (seed>>2);
        return seed;
    }
};

namespace std {
    template <>
    struct hash<MyClass> {
        std::size_t operator()(const MyClass& obj) const {
            return hash_value(obj);
        }
    };
}

int main() {
    std::unordered_map<MyClass, int> map;

    MyClass c1(1, "1");
    MyClass c2(2, "2");
    MyClass c3(1, "1");
    std::cout << "------step 1-------" << std::endl;
    map[c1] = 1;
    map[c2] = 2;
    std::cout << "------step 2-------" << std::endl;
    std::cout << map[c3] << std::endl;
    std::cout << "------step 3-------" << std::endl;
    map[MyClass(5, "5")] = 5;
    std::cout << "------step 4-------" << std::endl;
    std::cout << (map.find(MyClass(4, "4")) != map.end()) << std::endl;
    std::cout << "------step 5-------" << std::endl;
    auto p = std::make_pair(c1, 5);
    std::cout << "------step 6-------" << std::endl;
    map.insert(p);
    return 0;
}

print result:

------step 1-------
copy constructor called
copy constructor called
------step 2-------
1
------step 3-------
copy constructor called
------step 4-------
0
------step 5-------
copy constructor called
------step 6-------
9
  • Maybe, create the key on the heap and store a pointer to that object in the map. Commented Feb 19, 2024 at 8:37
  • Depends on your use case, you could make your class movable, you could store a pointer instead, you could just not worry about it Commented Feb 19, 2024 at 8:38
  • Opinion-based questions are off-topic, see help center. If you have a large object as a key, you are likely doing it wrong. I'm also pretty sure Python and Java have mechanisms to prevent mutation of the key, such as immutable strings that are effectively copied every time you "mutate" it. Commented Feb 19, 2024 at 8:38
  • @PasserBy In Python, a dictionary key must be hashable, meaning it overrides the hash function. It is entirely acceptable for the key to be mutable. In contrast, Java allows any class to be used as a key in a HashMap. However, if you want the keys to work as expected, you must override the equals and hashCode methods (otherwise it will use "memory address" to be hashcode). For instance, you can use List<Integer> as a key in a hashmap. Although the key itself is mutable, it is standard practice to adhere to the contract by not modifying the fields of a key that is stored in a hashmap. Commented Feb 19, 2024 at 8:46
  • @PasserBy I don't believe this is an option-based question. When dealing with objects that serve as keys and are significantly large—such as one encapsulating a large-sized vector of integers—there are undoubtedly conventional methods to avoid copying the key with each insertion. One approach I can think of involves using pointers. However, as I am just beginning to learn C++, I am not certain if this is the best method. Commented Feb 19, 2024 at 8:50

1 Answer 1

2

You can use std::shared_ptr of MyClass as the keys in your unordered_map. This would stop the copy. The MyClass implementation remains mostly same. You can change the hashing as follows.

namespace std {
template <>
struct hash<std::shared_ptr<MyClass>> {
    std::size_t operator()(const std::shared_ptr<MyClass>& obj) const {
        return hash_value(*obj);
    }
};

template <>
struct equal_to<std::shared_ptr<MyClass>> {
    bool operator()(const std::shared_ptr<MyClass>& lhs, const 
    std::shared_ptr<MyClass>& rhs) const {
        return *lhs == *rhs;
    }
};

Here, equal_to<std::shared_ptr<MyClass>> dereferences the shared_ptrs and uses MyClass::operator== to compare the MyClass objects.

int main() {
    std::unordered_map<std::shared_ptr<MyClass>, int> map;

    auto c1 = std::make_shared<MyClass>(1, "1");
    auto c2 = std::make_shared<MyClass>(2, "2");
    auto c3 = std::make_shared<MyClass>(1, "1");
    std::cout << "------step 1-------" << std::endl;
    map[c1] = 1;
    map[c2] = 2;
    std::cout << "------step 2-------" << std::endl;
    std::cout << map[c3] << std::endl;
    std::cout << "------step 3-------" << std::endl;
    map[std::make_shared<MyClass>(5, "5")] = 5;
    std::cout << "------step 4-------" << std::endl;
    std::cout << (map.find(std::make_shared<MyClass>(4, "4")) != 
    map.end()) << std::endl;
    std::cout << "------step 5-------" << std::endl;
    auto p = std::make_pair(c1, 5);
    std::cout << "------step 6-------" << std::endl;
    map.insert(p);
    return 0;
}

Output:

------step 1-------
------step 2-------
1
------step 3-------
------step 4-------
0
------step 5-------
------step 6-------

As I see it, one of the main issues of using the std::shared_ptr as keys to an std::unordered_map is that if there are multiple pointers pointing to the same object, the map will still consider them as different keys.

Sign up to request clarification or add additional context in comments.

2 Comments

Your last paragraph applies just as much to dumb pointers as to std::shared_ptr. But I don't know why you mention it since the implementation you propose doesn't have that problem.
I mentioned it not for this particular case but in general for using pointers as keys to an unordered map. I realized this while coming up with the answer and thought it is an important consideration.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.