How to detect duplicates in a vector of unordered_map?

Question

Given a vector of unordered_map<u_int,int>, I would like to check if the vector contains any duplicated values. Two unordered_maps are considered duplicated if all of their keys and their corresponding values are equal. I know the comparison operator exists for unordered_maps, but I would like to avoid the pairwise comparison of each element with each other. One classical solution is to insert the values of the vector into a set, then to compare the number of elements in the set and the vector. However, the problem here is that the object to be inserted into the set must have the comparison operators overloaded. In case of the unordered_set, the hash function to be used must be overloaded for the complex object. In order to overload, I need to derive a class from the std::unordered_map. Then I need to overload either the comparison operator or the hash function. Another solution that I could think of is to concatenate all of the key value pairs into a string, then sort the string by the keys and detect the duplicates on those strings. I wonder what would be the best solution for this problem.
Example data:

using namespace std;
typedef unordered_map<u_int,int> int_map;
int_map a = { {1,1}, {2,4}, {3,5} };
int_map b = { {1,1}, {2,-1}, {4,-2} };
int_map c = { {1,1}, {3,5} };

vector<unordered_map<u_int,int>> my_vec;

my_vec.push_back(a);
my_vec.push_back(b);
my_vec.push_back(c);

The contents of my_vec is:

 { { 1 => 1, 2 => 4, 3 => 5 }, 
 { 1 => 1, 2 => -1, 4 => -2 }, 
 { 1 => 1, 3 => 5 } }

Please feel free to ask/commend/edit if the question is not clear enough. Any help would be appreciated. Thank you in advance!

The answer here is false since all of the keys and their corresponding values are not identical in any of the two elements. — anilbey
– anilbey, Commented Aug 22, 2018 at 10:17
Note that you can specify a custom hasher and/or equality comparator as template arguments to both std::set and std::unordered_map, so inheritance is unnecessary. — Angew is no longer proud of SO
– Angew is no longer proud of SO, Commented Aug 22, 2018 at 10:24
(publicly) inheriting from std containers is almost never a good idea. Its lots of text and I find it hard to follow, but I think all you need is a custom comparator. "the problem here is that the object to be inserted into the set must have the comparison operators overloaded" - thats not true, you can use any comparator you like, its the second template parameter of std::set — 463035818_is_not_an_ai
– 463035818_is_not_an_ai, Commented Aug 22, 2018 at 10:36
@anilbey hash functor is the second template argument of unordered set. Comparison functor is the second template argument of the ordered set. — eerorika
– eerorika, Commented Aug 22, 2018 at 11:34

AdityaG · Accepted Answer · 2018-08-22 14:32:21Z

1

you can something similar to the following :

typedef unordered_map<u_int,int> int_map;

struct my_map_comparator
{
    bool operator()(const int_map& a, const int_map& b) 
    { 
      a_hash = compute_hash_for_a(all keys of a)
      b_hash = compute_hash_for_b(all keys of b)

      return a_hash == b_hash; 
    }
};

std::unordered_set<int_map,std::hash<int_map>, my_map_comparator> map_list();

answered Aug 22, 2018 at 14:32

AdityaG

4361 gold badge5 silver badges19 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

anilbey Over a year ago

Could you give me some ideas about how to compute the hash for multiple keys?

AdityaG Over a year ago

if u are using boost .. boost hash combine exist .. otherwise look here stackoverflow.com/questions/17016175/…

anilbey Over a year ago

One detail: since the order is not preserved in unordered_maps, the maps a and d below will result in different hash values unless I sort them. Maybe I should use std::map instead. unordered_map<u_int,int> a = { {1,1}, {2,4}, {3,5} }; unordered_map<u_int,int> d = { {2,4}, {1,1}, {3,5} };

anilbey Over a year ago

Actually, if I hash an unordered map twice. Then I am sure I will get different hash values because the ordering of the keys will be different each time I iterate over the keys.

underscore_d Over a year ago

@anilbey Just to be extremely pedantic: It's not at all guaranteed that "the ordering of the keys will be different each time". The unordered does not mean 'elements are guaranteed never to be in the same order'... It means 'elements are not guaranteed to be in any specific/repeatable order'.

|

Brandlingo · Accepted Answer · 2022-07-04 15:02:41Z

1

If you can get a good hash function for std::unordered_map then you should do it like this probably:

bool has_distinct_values(const std::vector<std::unordered_map<u_int, int>> v)
{
  std::unordered_map<int, std::list<int>> hash_to_indexes_map; 
  for(auto i = 0u; i < v.size(); ++i)
  {
    auto empl_result = hash_to_index_map.emplace(custom_hash(v[i]), {i});
    if (!empl_result.second)
    {  
       for (auto index : empl_result.first->second)
       {
         if (v[index] == v[i]) return false;
       }
       empl_result.first->second.push_back(i);
    }
  }
  return true;
}

The algorithm is straightforward: map hashes to list indexes, doing pairwise map comparison whenever hashes are equal. This way you avoid copying the entire maps, get O(N) (depending mostly on the quality of the hash function you provide) time complexity and generally are good to go.

edited Jul 4, 2022 at 15:02

Brandlingo

3,2421 gold badge29 silver badges39 bronze badges

answered Aug 22, 2018 at 19:53

paler123

9666 silver badges18 bronze badges

4 Comments

anilbey Over a year ago

But even if I have a good hash function, each time I hash the unordered_map, I am sure I will get a different hash value because the keys are not ordered. e.g. each time I iterate over the keys, I see a different order.

paler123 Over a year ago

Well, since the map is unordered then ordering should not really affect the hashes you get. In other words if two maps are considered equal (by explicit check with == op) then they must have equal hashes. A valid hash function needs to produce equal hashes for equal elements, a good hash function should produce different hashes for different elements.

anilbey Over a year ago

I cannot think of an efficient way of implementing a hash function achieving this without having to sort the keys. If the sorting is inevitable, then I prefer storing the data in a map in the first place (instead of unordered_map). What do you think?

paler123 Over a year ago

You would then need to order the maps (e.g. implement comparison operator of some sort). Equality is quite straightforward to define for unordered_maps, I don't see however a universally good approach to ordering them. As for the hash function... You could use any hash on key / value pairs from the map and then accumulate them in a way that does not depend on the ordereing (e.g. product or sum have that property).

Collectives™ on Stack Overflow

How to detect duplicates in a vector of unordered_map?

2 Answers 2

7 Comments

4 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

7 Comments

4 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related