2

I've got a data type that consists of actual data and an id:

struct Data {
    int value;
    int DataId;
};

I've got a collection of these data from which I have to frequently get an element based on its id and modify its value (but not it's ID). I thus chose a std::set in order to have an efficient sort, with the following comparator:

class Comparator {
    /// dummy type for declaring is_transparent type
    struct SIsTransparentTag;

   public:
    // tag (falsely) indicating that operator() is transparent, i.e. accepts
    //  any type of argument: this is a workaround to allow std::set::find to
    //  use a int key instead of a full Data object.
    //  https://en.cppreference.com/w/cpp/container/set/find
    //  https://www.fluentcpp.com/2017/06/09/search-set-another-type-key/
    using is_transparent = SIsTransparentTag;

    // comparator between two Data
    bool operator()(const Data &lhs, const Data &rhs) const {
        return lhs.DataId < rhs.DataId;
    }
    // comparator between a Data and an id
    bool operator()(const Data &lhs, const int id) const {
        return lhs.DataId < id;
    }
    // comparator between an id and a Data
    bool operator()(const int id, const Data &rhs) const {
        return id < rhs.DataId;
    }
};

Then I wrote a read-write getter:

Data* GetRWDataFromID(std::set<Data,Comparator> &collection,int id)
{
    auto it = collection.find(id);
    if (it == std::end(collection))
    {
        return nullptr;
    }
    // UB as soon as I will modify Data!!!
    return const_cast<Data *>(&(*it));
}

Playground

std::set::find() only returns an iterator to a non-mutable element. However, I only want to be able to modify the value member, thus never changing the set order.

This design is broken because removing constness of an object and modifying it is undefined behavior: https://en.cppreference.com/w/cpp/language/const_cast.

How can I fix my design while keeping an efficient search (typically logarithmic) and letting the id attached to a data (when I get a data, I want to get its id from it, not from some book-keeping; thus I'd like to avoid a std::map)?

A bonus would be to have DataId non-mutable once a Data object is created (for this one, I imagine I can make it private and use read-only getters?).

8
  • Modifying an object after const_cast has removed constness is UB only if the object was const to begin with. But that is not the case here. You simply have a const reference/pointer to a non-const (mutable) object, so removing the constness from the reference/pointer is not UB. Commented Mar 21 at 22:51
  • @RemyLebeau: How do you know that the object was not originally const qualified? Isn't that an implementation detail of std::set if the values are stored as const or not? Commented Mar 21 at 22:56
  • 7
    Why can't you use a std::map? You get a Data from somewhere and then you can add it to your map like my_map[data.id] = data.value; Commented Mar 21 at 23:07
  • Using at least C++17, you can extract a std::set node, modify it, and insert it back. Commented Mar 21 at 23:37
  • @NathanOliver absolutely, this is screaming out for a std::map Commented Mar 21 at 23:56

4 Answers 4

4

Just use std::map directly.

when I get a data, I want to get its id from it, not from some book-keeping; thus I'd like to avoid a std::map

Inside std::map, it store the key and value as a std::pair, so you do not need any book-keeping to get both key and value.

Further reading: more information about std::map

Sign up to request clarification or add additional context in comments.

Comments

3

You might (ab)use of mutable:

struct Data {
    mutable int value;
    int DataId;
};

then you can return a const Data* and modify its value. Caveat is that you no longer have direct possibility to have "const" value.

Comments

0

One way to do this is to make your data value mutable like this:

struct Data {  
    mutable int value;  
    int DataId;  
};

Then return a const* as normal:

const Data* GetRWDataFromID(std::set<Data,Comparator> &collection,int id)
{
    auto it = collection.find(id);
    if (it == std::end(collection))
    {
        return nullptr;
    }
    // UB as soon as I will modify Data!!!
    return &(*it);
}

Now you can still modify the mutable element.

Alternatively you can use a different container and manage the order yourself.

Something a bit like this:

// some utility functions to insert and remove sorted
// unique values from a vector

auto insert_unique(std::vector<Data>& v, Data&& d)
    -> std::pair<std::vector<Data>::iterator, bool>
{
    auto lb = std::lower_bound(std::begin(v), std::end(v), d, Comparator());

    if(lb != std::end(v) && lb->DataId == d.DataId)
        return {lb, false};

    return {v.insert(lb, std::forward<Data>(d)), true};
}

auto insert_unique(std::vector<Data>& v, Data const& d)
    { auto copy = d; return insert_unique(v, std::move(copy)); }

auto remove_unique(std::vector<Data>& v, Data const& d)
    -> std::pair<std::vector<Data>::iterator, bool>
{
    auto lb = std::lower_bound(std::begin(v), std::end(v), d, Comparator());

    if(lb != std::end(v) && lb->DataId == d.DataId)
        return {v.erase(lb), true};

    return {std::end(v), false};
}

int main()
{
    std::vector<Data> v;

    Data d{1, 2};

    insert_unique(v, d);

    // do a binary search
    auto lb = std::lower_bound(std::begin(v), std::end(v), d, Comparator());

    if(lb != std::end(v)) // was it found?
        lb->value = 9;
}

Of course that extra code can be wrapped up in a "uniquely sorted vector" class.

Comments

0

This answer does not give a direct answer but provides some hints towards a better performance for querying if it is a bottleneck for you. It also supposes that the collection is immutable, ie. you can look for items in the collection after every items have been put into.

Actually, std::set::find is logarithmic in the size of the container, but you can almost reach a lookup in constant time if the collection can't grow. The idea is to store all the Data objects in a std::vector (which should be optimal in terms of space) and to store the keys in a succinct data structure such as Elias Fano that provides rank/select operation in nearly constant time (and with a little extra memory usage explained in the link above).

Then, from one given key k, you can find what is the position of the corresponding entry (if any) in the vector, which can be done with rank/select operations.

Here is a sketch of a IndexedVector class that uses the succinct library:

template<typename T, auto getkey>
class IndexedVector
{
public:

    using key_type = decltype(getkey(T{}));

    template<typename Range>
    IndexedVector (Range range, key_type maxKey=std::numeric_limits<key_type>::max()) 
    {
        succinct::elias_fano::elias_fano_builder bvb (maxKey, range.size());

        // We build both the vector of items and the Elian Fano structure holding the keys
        for (auto const& item : range)
        {
            bvb.push_back (getkey(item)); 
            items_.push_back (item);
        }

        succinct::elias_fano tmp(&bvb);
        bitmap_.swap (tmp);
    }

    std::optional<T> find (key_type k)
    {
        if (bitmap_[k]) {  return items_[bitmap_.rank(k+1)-1]; }
        else            {  return {};                          }
    }

private:
    std::vector<T>       items_;   // holds the items
    succinct::elias_fano bitmap_;  // holds the bitset containing the keys
};

Demo

With this kind of approach, one can observe a speedup (compared to std::set::find) ranging from 3x to 10x in function of the number of items in the collection.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.