
Hi, I am working with std::unordered_map and dealing with a lot of inserts, where the number of inserts is known ahead of time. I thought to optimize memory use by calling the reserve method, assuming it pre-allocates one big chunk of memory for all future inserts. However, I have noticed that it makes no difference in how many times the map tries to allocate memory.

See the example below. The number of allocations is the same for the map that reserved space for the inserts and the one that didn't.

Is there any way to reduce the number of allocations?

#include <string>
#include <unordered_map>
#include <iostream>
#include <vector>
    
using namespace std;
    
inline size_t AllocationCounter = 0;
    
template <typename T>
struct my_alloc {
    using value_type = T;
    
    my_alloc() = default;
    
    template <typename U>
    constexpr my_alloc(const my_alloc<U>& Other) noexcept  {  }
    
    T* allocate(std::size_t n) {
        AllocationCounter++;
        return  static_cast<T*>(::operator new (n * sizeof(T)));
    }
    
    template <typename... Args>
    void construct(T* p, Args&&... args) {
        new (p) T(std::forward<Args>(args)...);
    }
    
    void deallocate(T* p, std::size_t n) noexcept {
        AllocationCounter--;
        ::operator delete(p); // must pair with ::operator new, not delete
    }
    
    void destroy(T* p) noexcept {
        p->~T();
    }
};
    
void with_reserve()
{
    unordered_map<size_t,size_t, std::hash<size_t>, std::equal_to<size_t>, my_alloc<std::pair<const size_t, size_t>>> m1;
    
    m1.reserve(1000); /// <------------------ RESERVATION ?
       
    for (size_t i = 0; i < 1000; i++)
    {
        m1.insert({i,i});
    }
    
    printf("With reserve - %zu Allocations\n", AllocationCounter);
}
    
void without_reserve()
{
    unordered_map<size_t,size_t, std::hash<size_t>, std::equal_to<size_t>, my_alloc<std::pair<const size_t, size_t>>> m1;
    
    for (size_t i = 0; i < 1000; i++)
    {
        m1.insert({i,i});
    }
    
    printf("Without reserve - %zu Allocations\n", AllocationCounter);
}
    
int main()
{
    with_reserve();
    
    without_reserve();     
}

Output:

With reserve - 1001 Allocations
Without reserve - 1001 Allocations

Expected: the number of allocations with reserve to be less than without.

  • please also include the output and expected output in the question Commented Jan 28 at 12:43
  • Maybe don't use the standard library containers; Boost has unordered_flat_map, whose reserve straight up allocates a big chunk of memory. There's also the standard library's flat_map, which you can technically reserve, but it has O(log n) lookup. Commented Jan 28 at 13:01
  • @AhmedAEK regular std::map has O(log n) lookup too. But std::unordered_map has O(1). Commented Jan 28 at 13:24
  • 1
    a sorted std::vector<std::pair<Key,Value>> has expensive insertion and deletion but you can allocate all at once and lookup is as fast as in the map (if not faster because it fits in cache). Commented Jan 28 at 13:28
  • 1
    Inside STL: The unordered_map, unordered_set, unordered_multimap, and unordered_multiset Commented Jan 28 at 15:23

2 Answers


No. std::unordered_map::reserve only allocates buckets, not nodes.

Sets the number of buckets to the number needed to accommodate at least count elements without exceeding maximum load factor and rehashes the container, i.e. puts the elements into appropriate buckets considering that total number of buckets has changed. Effectively calls rehash(std::ceil(count / max_load_factor())).

Each bucket holds a linked-list of nodes, which are allocated as elements are inserted.


3 Comments

firstly it says "sets the number", not "allocates"; secondly, doesn't the memory for a bucket also accommodate the stored item itself? I mean, isn't it of type pair<Key,Value>?
@Boris if you reserve twice then the second won't reallocate, but the first will. Buckets are not nodes, they are lists of nodes. "Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its key. Keys with the same hash code appear in the same bucket. This allows fast access to individual elements, since once the hash is computed, it refers to the exact bucket the element is placed into." Emphasis added
@Boris The representation is more like vector<list<pair<Key, Value>>>, where each vector element is a "bucket" and all the keys in each bucket hash to the same vector index.

Use a custom allocator, not just for profiling but to avoid heap allocations. If your std::unordered_map has exclusively insertions and lookups, but no removals, then you can use a trivial arena type allocator that is backed by a single buffer big enough to fit both the bucket vector as well as all nodes.

Doing so will - for all STL implementations I'm aware of - greatly accelerate both inserts and the final teardown. The allocator used for both the nodes and the bucket list is derived from the element allocator passed to the map itself.

I.e. the following allocator is suitable, and can even be simply allocated in the same scope as the map itself, including on the stack. In the normal case, this results in zero heap allocations.

struct arena
{
    arena() = delete;
    arena(arena&&) = delete;
    arena(const arena&) = delete;
    constexpr arena(void* buffer, size_t size) noexcept: base(buffer), top(buffer), bytes_remaining(size)
    {
    }
    void* const base;
    // Current top of buffer.
    void* top;
    size_t bytes_remaining;
};

template<size_t preallocated_size = 4096>
struct inplace_arena : public arena
{
    inplace_arena() noexcept: arena(_storage, sizeof(_storage))
    {
    }

private:
    std::byte _storage[preallocated_size];
};

template<class T>
struct arena_allocator
{
    typedef T value_type;

    arena_allocator() = delete;

    arena_allocator(arena& storage) noexcept: _storage(storage)
    {
    }

    template<class U>
    constexpr arena_allocator(const arena_allocator<U>& other) noexcept: _storage(other._storage)
    {
    }

    [[nodiscard]] T* allocate(std::size_t n)
    {
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
            throw std::bad_array_new_length();

        const auto allocation_size = n * sizeof(T);
        if (auto from_arena =
                static_cast<T*>(std::align(alignof(T), allocation_size, _storage.top, _storage.bytes_remaining)))
        {
            _storage.top = reinterpret_cast<std::byte*>(_storage.top) + allocation_size;
            _storage.bytes_remaining -= allocation_size;
            return from_arena;
        }
        else
        {
            if (auto from_heap = static_cast<T*>(std::malloc(allocation_size)))
            {
                return from_heap;
            }
        }

        throw std::bad_alloc();
    }

    void deallocate(T* p, std::size_t /* n */) noexcept
    {
        if (p >= _storage.base && p < _storage.top)
        {
            // Pass, no point in recycling.
        }
        else
        {
            std::free(p);
        }
    }

    arena& _storage;
};

template<class T, class U>
bool operator==(const arena_allocator<T>& a, const arena_allocator<U>& b)
{
    return &a._storage == &b._storage;
}

template<class T, class U>
bool operator!=(const arena_allocator<T>& a, const arena_allocator<U>& b)
{
    return !(a == b);
}

Use it like

using element_type = std::pair<const key, value>;
// Enough storage for the nodes, the "past the end" unique sentinel, the links between nodes and the bucket list itself.
// Somewhat implementation specific, use a debugger to verify if it's large enough.
inplace_arena<(sizeof(element_type) + sizeof(void*) * 3) * (expected_element_count + 1)> elements_storage;
using element_allocator = arena_allocator<element_type>;
std::unordered_map<key, value, std::hash<key>, std::equal_to<key>, element_allocator> elements{element_allocator{elements_storage}};
elements.reserve(expected_element_count);

Even when using a custom allocator, you should still call .reserve() - growing the bucket list, and the rehash that potentially results, is still something to avoid at all costs.


Expected: the number of allocations with reserve to be less than without.

No, but only because you are counting wrong. The bucket list was reallocated repeatedly, and you had already decremented the counter each time that happened, so there were O(log n) extra allocations that you didn't count.

2 Comments

Indeed, as can be seen in the fixed code (godbolt.org/z/5x8G4Gb7h), reserve does save allocations. I had missed the fact that in STL implementations reserve only affects the bucket list, not the item lists inside the buckets, which cannot be pre-allocated.
@Boris well yes, the item lists can't bypass the allocator. But the allocator can still bypass the heap. Just give it a try, next time you encounter a situation where you feel like those map type containers are too slow in a specific task. With a simple enough allocator, the whole insertion method can end up inlined.
