
I need to create a multi-tree in a critical part of my code. The standard way is to use something like:

struct node{ // or class
   ... data ...
   std::unordered_set<node*> children;
};

node* root;

One important thing is that, once a node is created, it won't be deleted until the whole tree is deleted.

The obvious way is to use new and delete to manage this data on the heap. Done.

I am wondering if it is worth exploring the performance of using a std::vector instead of new/delete directly:

struct node{ // or class
   ... data ...
   std::unordered_set<uint32_t> children;
};

std::vector<node> tree;
uint32_t root; // probably always 0, the first element of the vector

Deleting the tree would then simply be done with tree.clear();
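
For concreteness, a minimal sketch of what I have in mind (the add_node helper and the value placeholder are just for illustration, not my actual code):

#include <cstdint>
#include <unordered_set>
#include <vector>

struct node {
    int value = 0;                                // placeholder for "... data ..."
    std::unordered_set<std::uint32_t> children;   // indices into `tree`
};

std::vector<node> tree;

// Append a new node and register it as a child of `parent`.
std::uint32_t add_node(std::uint32_t parent) {
    tree.push_back(node{});   // may reallocate and move the existing nodes
    auto idx = static_cast<std::uint32_t>(tree.size() - 1);
    if (idx != 0)
        tree[parent].children.insert(idx);
    return idx;
}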

Is there any chance that it would be faster for big (or huge) trees?

  • 3
    Why use raw pointers? With std::unordered_set<std::unique_ptr<node>> children, deleting the tree is as simple as children.clear(). Commented Aug 4, 2020 at 7:21
  • 3
    A vector needs to reallocate as it grows, but it will still require fewer allocations than allocating every single node. If you can roughly predict the number of nodes, you can reduce the need for reallocation, so yes, it might be worth measuring. Commented Aug 4, 2020 at 7:23
  • Why do you use a std::unordered_set? This is a quite heavy and slow data structure. How many children can you have in each node in the worst case? Is this number bounded? Commented Aug 4, 2020 at 7:29
  • The specification itself does not talk about heap and stack; those are just one way to build a C++ implementation that fulfills the requirements of the spec. In common implementations, both the default new (if not overloaded or placement new) and std::vector will create the elements on the heap. Commented Aug 4, 2020 at 7:30
  • A researcher I knew who was building huge trees for some models was concerned about the deletion time (i.e., walking the tree to delete each node); he ended up creating the whole data structure in a separate subprocess and killing it when he was done, and the OS cleared it very fast. That does not help with the allocation side of your question, but I believe your idea of one big allocation could be more efficient. Commented Aug 4, 2020 at 7:46

2 Answers

2

I'm guessing you have a tree on the order of 10 million+ elements (or at least in the ballpark of 100k+) to be concerned about this.

It then all depends on how your data and tree are used. Spatial locality of the data is usually the concern, but it's not obvious whether that applies in your case. If the tree is traversed a lot after being built and only modified a little, then it makes sense to organize the data so that it can be iterated contiguously (as a std::vector usually is). In that case you do not want pointers like that, or rather you do not want single-element new/delete for sure, as that is almost guaranteed to bog you down (you cannot count on the elements being placed contiguously in memory) - but it all depends on your use case(s).

A simple and radically different approach would be:

std::vector< std::pair<PathInTree, Node> > tree;

Then keep it sorted such that the Nodes appear in the order in which you iterate the tree. This is a somewhat extreme solution and has drawbacks, including that insertion can become expensive if there is a lot of random insertion going on. I also left out how to implement PathInTree, but it should usually be some kind of sequence (like in a filesystem structure), as sketched below.
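
One possible sketch, assuming PathInTree is simply the sequence of child indices leading from the root (a hypothetical representation chosen for illustration): lexicographic order on such paths then matches a pre-order depth-first traversal.

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

struct Node { int value = 0; };                    // placeholder for the real payload
using PathInTree = std::vector<std::uint32_t>;     // e.g. {0, 2, 1}: root -> 3rd child -> its 2nd child

std::vector<std::pair<PathInTree, Node>> tree;

// Sorting by path puts every parent immediately before its subtree,
// so a plain linear scan of `tree` visits the nodes in pre-order DFS.
void sort_for_traversal() {
    std::sort(tree.begin(), tree.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
}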


1 Comment

Indeed the context is random data, and the size is at least in the millions. I don't think ordering the nodes is an option. After the tree is built, the data is processed only once and the tree is destroyed. Using Node instead of Node* or unique_ptr is an option; I never did it, just tried it. At some point I need to write something like for(auto& node: nodes) {...}. The problem here is that node is then const while I need to change its content in the loop.
1

Considering that each instance of std::unordered_set<uint32_t> is still going to result in at least one more allocation on the heap (internally), the actually achievable savings have a hard bound: you are going to save less than 50% of the necessary heap allocations.

One important thing is that, once a node is created, it won't be deleted until the whole tree is deleted.

Re-allocations still happen in a std::vector as it grows. In contrast to immovable allocations on the heap, this additionally involves moving the already allocated nodes; depending on what other attributes your node elements have, that may be even costlier. You will also need to make sure that your node class is compatible with move semantics.
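
If you can roughly predict the node count, a single reserve up front sidesteps most of that growth cost; a small sketch (prepare_tree and the value placeholder are illustrative, not from your code):

#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

struct node {
    int value = 0;                                // placeholder for the real data
    std::unordered_set<std::uint32_t> children;   // indices into `tree`
};

std::vector<node> tree;

// Reserving once up front avoids repeated growth and the associated
// moves of already stored nodes. `expected_nodes` is only an estimate.
void prepare_tree(std::size_t expected_nodes) {
    tree.reserve(expected_nodes);
}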

With std::unordered_set also allocating on the heap, you won't be able to avoid the resulting memory fragmentation either.

All in all, unless you expect to be limited by the performance of heap allocations, or you have a use case for order-independent tree traversal, compacting node instances is likely not going to be worth the effort.


A potentially reasonable optimization would be to swap std::unordered_set<node*> for std::unordered_set<node>. Unless you need that indirection in that place (external pointers?), that saves you just as much as your own attempt, without sacrificing debug capabilities.

Implementing the necessary hash function and equality operator for node is usually easy.
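
A minimal sketch of what that can look like, assuming node carries some member that identifies it (the id field below is hypothetical); hash and equality only need to inspect that key, and the set is shown standalone here just to keep the sketch short:

#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

struct node {
    std::uint32_t id = 0;   // hypothetical identifying key
    // ... other data ...

    bool operator==(const node& other) const { return id == other.id; }
};

namespace std {
template <>
struct hash<node> {
    std::size_t operator()(const node& n) const noexcept {
        return std::hash<std::uint32_t>{}(n.id);   // delegate to the key's hash
    }
};
} // namespace std

std::unordered_set<node> children;   // node can now be stored directly in the set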


If performance is a real concern, and you also have a (reasonable) upper bound on the number of children per node, you may consider replacing std::unordered_set<uint32_t> children by std::array<uint32_t, n> children instead, with std::find() used to detect duplicate entries.

The reason for that is quite simple: it ensures that node becomes TriviallyCopyable, which reduces re-allocations to a plain memcpy internally. For up to 16 children or so, this is likely going to be neutral in terms of memory consumption, but still faster.
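
A sketch of that layout under the assumption of at most 16 children per node (the bound, the add_child helper, and the value placeholder are all illustrative):

#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

constexpr std::size_t kMaxChildren = 16;   // assumed upper bound, purely illustrative

struct node {
    int value = 0;                                         // placeholder for the real data
    std::array<std::uint32_t, kMaxChildren> children{};    // indices into the tree vector
    std::uint32_t child_count = 0;

    // Insert a child index unless it is already present or the node is full.
    bool add_child(std::uint32_t idx) {
        auto end = children.begin() + child_count;
        if (std::find(children.begin(), end, idx) != end) return false;  // already a child
        if (child_count == kMaxChildren) return false;                   // no room left
        children[child_count++] = idx;
        return true;
    }
};

// With only trivially copyable members, vector growth degenerates to a memcpy.
static_assert(std::is_trivially_copyable_v<node>);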

2 Comments

My topic here is not stack vs heap allocation. It is about generating a huge number of small chunks on the heap vs allocating one huge chunk (with reallocations, internal container data management, etc.).
@Jacques You are still not going to be able to save more than 50% of the allocations. And allocating a single huge chunk isn't going to save as much as you would hope for either, if you can't also eliminate the fragmentation resulting from the unavoidable smaller ones.
