2

Given a vector of unordered_map<u_int,int>, I would like to check if the vector contains any duplicated values. Two unordered_maps are considered duplicated if all of their keys and their corresponding values are equal. I know the comparison operator exists for unordered_maps, but I would like to avoid the pairwise comparison of each element with each other. One classical solution is to insert the values of the vector into a set, then to compare the number of elements in the set and the vector. However, the problem here is that the object to be inserted into the set must have the comparison operators overloaded. In case of the unordered_set, the hash function to be used must be overloaded for the complex object. In order to overload, I need to derive a class from the std::unordered_map. Then I need to overload either the comparison operator or the hash function. Another solution that I could think of is to concatenate all of the key value pairs into a string, then sort the string by the keys and detect the duplicates on those strings. I wonder what would be the best solution for this problem.
Example data:

using namespace std;
typedef unordered_map<u_int,int> int_map;
int_map a = { {1,1}, {2,4}, {3,5} };
int_map b = { {1,1}, {2,-1}, {4,-2} };
int_map c = { {1,1}, {3,5} };

vector<unordered_map<u_int,int>> my_vec;

my_vec.push_back(a);
my_vec.push_back(b);
my_vec.push_back(c);

The contents of my_vec is:

 { { 1 => 1, 2 => 4, 3 => 5 }, 
 { 1 => 1, 2 => -1, 4 => -2 }, 
 { 1 => 1, 3 => 5 } }

Please feel free to ask/commend/edit if the question is not clear enough. Any help would be appreciated. Thank you in advance!

7
  • What would be the answer for your example? 1=>1? Commented Aug 22, 2018 at 10:13
  • The answer here is false since all of the keys and their corresponding values are not identical in any of the two elements. Commented Aug 22, 2018 at 10:17
  • 5
    Note that you can specify a custom hasher and/or equality comparator as template arguments to both std::set and std::unordered_map, so inheritance is unnecessary. Commented Aug 22, 2018 at 10:24
  • 3
    (publicly) inheriting from std containers is almost never a good idea. Its lots of text and I find it hard to follow, but I think all you need is a custom comparator. "the problem here is that the object to be inserted into the set must have the comparison operators overloaded" - thats not true, you can use any comparator you like, its the second template parameter of std::set Commented Aug 22, 2018 at 10:36
  • 1
    @anilbey hash functor is the second template argument of unordered set. Comparison functor is the second template argument of the ordered set. Commented Aug 22, 2018 at 11:34

2 Answers 2

1

you can something similar to the following :

typedef unordered_map<u_int,int> int_map;

struct my_map_comparator
{
    bool operator()(const int_map& a, const int_map& b) 
    { 
      a_hash = compute_hash_for_a(all keys of a)
      b_hash = compute_hash_for_b(all keys of b)

      return a_hash == b_hash; 
    }
};

std::unordered_set<int_map,std::hash<int_map>, my_map_comparator> map_list();
Sign up to request clarification or add additional context in comments.

7 Comments

Could you give me some ideas about how to compute the hash for multiple keys?
if u are using boost .. boost hash combine exist .. otherwise look here stackoverflow.com/questions/17016175/…
One detail: since the order is not preserved in unordered_maps, the maps a and d below will result in different hash values unless I sort them. Maybe I should use std::map instead. unordered_map<u_int,int> a = { {1,1}, {2,4}, {3,5} }; unordered_map<u_int,int> d = { {2,4}, {1,1}, {3,5} };
Actually, if I hash an unordered map twice. Then I am sure I will get different hash values because the ordering of the keys will be different each time I iterate over the keys.
@anilbey Just to be extremely pedantic: It's not at all guaranteed that "the ordering of the keys will be different each time". The unordered does not mean 'elements are guaranteed never to be in the same order'... It means 'elements are not guaranteed to be in any specific/repeatable order'.
|
1

If you can get a good hash function for std::unordered_map then you should do it like this probably:

bool has_distinct_values(const std::vector<std::unordered_map<u_int, int>> v)
{
  std::unordered_map<int, std::list<int>> hash_to_indexes_map; 
  for(auto i = 0u; i < v.size(); ++i)
  {
    auto empl_result = hash_to_index_map.emplace(custom_hash(v[i]), {i});
    if (!empl_result.second)
    {  
       for (auto index : empl_result.first->second)
       {
         if (v[index] == v[i]) return false;
       }
       empl_result.first->second.push_back(i);
    }
  }
  return true;
}

The algorithm is straightforward: map hashes to list indexes, doing pairwise map comparison whenever hashes are equal. This way you avoid copying the entire maps, get O(N) (depending mostly on the quality of the hash function you provide) time complexity and generally are good to go.

4 Comments

But even if I have a good hash function, each time I hash the unordered_map, I am sure I will get a different hash value because the keys are not ordered. e.g. each time I iterate over the keys, I see a different order.
Well, since the map is unordered then ordering should not really affect the hashes you get. In other words if two maps are considered equal (by explicit check with == op) then they must have equal hashes. A valid hash function needs to produce equal hashes for equal elements, a good hash function should produce different hashes for different elements.
I cannot think of an efficient way of implementing a hash function achieving this without having to sort the keys. If the sorting is inevitable, then I prefer storing the data in a map in the first place (instead of unordered_map). What do you think?
You would then need to order the maps (e.g. implement comparison operator of some sort). Equality is quite straightforward to define for unordered_maps, I don't see however a universally good approach to ordering them. As for the hash function... You could use any hash on key / value pairs from the map and then accumulate them in a way that does not depend on the ordereing (e.g. product or sum have that property).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.