
I have integer keys that must be associated with a std::vector<T>, where T is an iterator-like object (you can safely assume each element of the vector is a pointer).

The obvious candidate from the STL is std::unordered_map<int, std::vector<T>>, mainly because I should not get hash collisions since my keys are just integers. In my particular case each key maps to a vector with potentially "many" elements, which are pushed back. To give an example with a std::unordered_map<int, std::vector<double>>:

#include <unordered_map>
#include <vector>

constexpr size_t map_size = 360;
constexpr size_t vec_size = 1000;

std::unordered_map<int, std::vector<double>> grid_unmap_int;
for (size_t i = 0; i < map_size; ++i)
  {
    for (size_t j = 0; j < vec_size; ++j)
      {
        grid_unmap_int[i].push_back(j);
      }
  }

If I compare the performance of the insertion using the code above with a std::map<int, std::vector<double>>, I get quite different results, which seem to depend on the chosen compiler. In particular, the std::unordered_map turns out to be much faster with clang. Why does the unordered_map behave so poorly with GCC? I'm afraid it's related to the fact that my values are vectors with no known size...

Here's a link to quick_bench: https://quick-bench.com/q/fgQ9XuZ9tmeXcKYEGuJOW165FtE

Here are the associated results (-std=c++17 and -O3):

Output with GCC 9.4: [benchmark screenshot]

Output with CLANG 14.0: [benchmark screenshot]

  • @Someprogrammerdude: Yes, that's the question. Both compilers are using libstdc++ (their screenshot shows the clang version wasn't set to use libc++), so it's the same C++ implementation of those classes, compiled by different compilers, unless header versions differ. And they're asking why those compilers make differently-performing asm for the x86-64 cloud instances Quick-Bench runs them on. At least I hope they realize that's what the question boils down to. Commented Dec 3, 2023 at 12:16
  • Yes, it's implementation-dependent, but what I'd like to understand is indeed the reason for that slowdown. Commented Dec 3, 2023 at 12:16
  • The reallocation was actually intentional, because in my case I don't even have an estimate for the capacity. Commented Dec 3, 2023 at 12:16
  • I'm curious whether the quick-bench empty-loop baseline ran at the same speed on both runs; if so, GCC's std::map was over 3x faster than clang's. It would be worth trying on an idle desktop, where we don't have to worry about different runs being on different cloud hardware or under different levels of competing load (which quick-bench tries to factor out by only showing performance relative to an empty loop tested on the same instance). Commented Dec 3, 2023 at 12:23
  • Is it intentional that your benchmark appends to the same bucket inside the inner loop, giving the compiler a chance to hoist the map/unordered_map lookup and just grow a std::vector? Or at least reuse the hash, if it's not fully hoisting. Commented Dec 3, 2023 at 23:59

1 Answer


If you lower vec_size to, say, 1, you'll see that the difference is much less significant. So the actual cause should be related to the std::vector::push_back() code.

I assume clang effectively caches the result of grid_unmap_int[i], hoisting the lookup out of the inner loop. operator[] is non-const, so it may modify the map, and whether the compiler can prove such a hoist is safe can make a big difference.
