1

Say I have a C++ class containing an id:

class MyData {

    std::string _id;
    std::vector<int>  _stuff;

public:
    const std::string& id() { return _id; }

    MyData(std::string id) : _id(std::move(id)) {
    }
};

And I want a map of those objects keyed by id. Is there a way to do it without making a second copy of the string?

Just creating a string view from myData.id() and using that as a key would work (assuming the map move-constructs the objects and the object move-constructs the id.

MyData myData("abc123");
std::unordered_map<std::string_view, MyData> dataCache;
std::string_view id(myData.id());
dataCache.insert({ id, std::move(myData) });

But the small string optimisation kills me. the id string_view would be pointing at the stack. If I could guarantee that none of the ids would be "small" then that would work. And if I could guarantee that most ids would be "small" strings then I could just use

std::unordered_map<std::string, MyData> dataCache;

but if most but not all ids are "big", then things get inefficient or unsafe.

Best I can do is put myData on the heap

auto myData = std::make_unique< MyData>("abc123");
std::string_view id = (myData->id());
std::unordered_map<std::string_view, std::unique_ptr<MyData>> dataCache;
dataCache.insert({ myData->id(),std::move(myData) });

But that involves an extra layer of indirection.

I could do something horrid involving a [unordered_]set and overriding the comparison operators to just look at the id column, but oh my lord that would be ugly.

10
  • 16
    Use a std::unordered_set instead and have the hash and quality operators only use the _id member? Commented Jul 8 at 15:16
  • 5
    Obligatory question for posts about optimisation - are you sure you are not optimising prematurely? For thousands of items in the map you are unlikely to see any difference (unless the map is modified in a tight loop), for hundreds of thousands this would begin to be important. Commented Jul 8 at 15:28
  • 7
    For what it's worth, std::unordered_set<MyData, ...> (with custom hash and comparison) strikes me as the most natural solution. Either that, or splitting MyData in two and using std::unordered_map<std::string, Payload> where Payload is whatever's in MyData sans ID. Commented Jul 8 at 15:30
  • 2
    Kate Gregory once mentioned something along the lines of: if you need a map, use a vector. If you need a set, use a vector. If you need a list, use a vector. If you need an array, use a vector. (Plus some additional caveats for advanced C++ programmers for when to use the non-vector containers.) Commented Jul 8 at 16:13
  • 4
    keys are constant and mapped values are not. What do you want to happen with the map when the mapped value is modified ? (so that effectively the key is modified too) Commented Jul 8 at 17:22

2 Answers 2

1

I could do something horrid involving a [unordered_]set and overriding the comparison operators to just look at the id column, but oh my lord that would be ugly.

That isn't ugly, that's the semantic you want to express. A container of MyData who's keys are the id member, not just copies thereof. Otherwise you run the risk of someone modifying an element, and the lookup no-longer matching.

Sign up to request clarification or add additional context in comments.

Comments

0

Unfortunately the c++ standard library doesn't give us immutable strings with a shared backing store. That, in my opinion, would actually be a lot more useful than std::string.

You have a few options. The simplest by far would be to just change your map to use std::strings as the key

auto myData = std::make_unique< MyData>("abc123");
std::unordered_map<std::string, std::unique_ptr<MyData>> dataCache;
dataCache.insert({ myData->id(),std::move(myData) });

In this case the small string optimization works in your favor as your std::strings become small with no heap pointer to manage. If your strings are generally small I would absolutely go this route.

Honestly, unless profiling shows an issue with string copies, I would stop there. It is simple. Well supported and easy to read.

If, instead your IDs are generally long, the suggestion to back them with std::vector<char> isn't horrible. This side-steps the small string optimization problem. Unfortunately you now will probably need need to copy the constructor string into the vector rather than move. (although calling the constructor probably made a copy anyway, this is just moving where the copy happens)

class MyData {

    std::vector<char> const _id; // note the const to make the map key immutable
    std::vector<int>  _stuff;

public:
    const std::string_view id() { return {_id.data(), _id.size()}; }

    MyData(std::string_view id) : _id(id.begin(), id.end()) {}
};

You could, of course switch to std::[unordered_]set. If you do, I would strongly recommend implementing a custom comparer rather than overloading operators in your class, as that would violate the principle of least surprise of two different MyData's would compare equal because they share the same ID.

For a std::set, this is easy

struct MyDataCompare {
    bool operator()(const MyData& a, const MyData& b) const {
        return a.id() < b.id();
    }
};

std::set<MyData, MyDataCompare> dataCache

std::unordered_set is a bit more work

struct MyDataHash {
    std::size_t operator()(const MyData& d) const {
        return std::hash<std::string>{}(d.id());
    }
};
struct MyDataEqual {
    bool operator()(const MyData& a, const MyData& b) const {
        return a.id() == b.id();
    }
};
std::unordered_set<MyData, MyDataHash, MyDataEqual> dataCache

I am less fond of this, as it kind of hides the key-value lookup nature of your cache.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.