How to read a sparse Octree sequentially, like a 3D array?

Question

Is it possible to read a sparse Octree like a regular 3D grid:

```
for (z)
  for (y)
    for (x)
```

...and still benefit from skipping large empty areas?

I need this for a rolling cache that always retains corner data from previously visited cells. This is not possible with regular octree traversal, because it goes deep into every octant and overfills the cache needed for sequential octants. It is possible with sequential grid traversal.

Caching example:

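```cpp
struct ReuseCell {
    Array<uint32, FixedAllocation<4>> vertices;  // vertex indices shared with neighbouring cells
    ...
};

...

// Two z-slices of cells: the slice being processed and the previous one.
Array<std::vector<ReuseCell>, FixedAllocation<2>> cache;

...

ReuseCell& get_reuse_cell(Int3_64 pos) {
    uint32 j = pos.Z & 1;             // parity of z selects the slice
    uint32 i = pos.Y * maxX + pos.X;  // row-major index within the slice
    return cache[j][i];
}
```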
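For readers without the engine's Array/FixedAllocation types, here is a minimal self-contained sketch of the same two-slice rolling cache using std::array and std::vector. RollingCache, begin_slice and the UINT32_MAX "no vertex yet" sentinel are hypothetical. Because the parity of z alone selects the slice, slice z-1 stays intact while slice z is being filled, and starting slice z+1 recycles the storage of slice z-1.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the question's ReuseCell: four vertex indices,
// initialised to a sentinel meaning "no vertex created yet".
struct ReuseCell {
    std::array<uint32_t, 4> vertices{{UINT32_MAX, UINT32_MAX, UINT32_MAX, UINT32_MAX}};
};

// Rolling cache of two z-slices, each stored row-major in (x, y).
class RollingCache {
public:
    RollingCache(uint32_t maxX, uint32_t maxY)
        : maxX_(maxX),
          layers_{std::vector<ReuseCell>(maxX * maxY), std::vector<ReuseCell>(maxX * maxY)} {}

    // Same indexing as get_reuse_cell: z parity picks the slice, (x, y) the cell.
    ReuseCell& get(uint32_t x, uint32_t y, uint32_t z) {
        return layers_[z & 1][y * maxX_ + x];
    }

    // Call before processing slice z so stale data from slice z-2 is cleared.
    void begin_slice(uint32_t z) {
        for (ReuseCell& c : layers_[z & 1]) c = ReuseCell{};
    }

private:
    uint32_t maxX_;
    std::array<std::vector<ReuseCell>, 2> layers_;
};
```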

Engineer · Accepted Answer · 2021-09-16 14:52:37Z

To skip all those unwanted nodes, I think what you're looking for is a sparse but contiguously allocated octree, which can be easily indexed by (x, y, z). You also seem to want sequential ordering.

For this, with a maximal tree (what you call a regular tree), we use something like Morton ordering to index rapidly and sequentially into the 1D array representing the tree, using just a few arithmetic or bit-shift operations. This works fine when we've allocated the full tree, because Morton indexing assumes that the full range of indices for a tree of that size is available in the array.
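A minimal sketch of 3D Morton encoding and decoding by bit interleaving, assuming each coordinate fits in 10 bits (a 1024³ maximal grid) so the code fits in a uint32_t. The helper names are hypothetical; the shift-and-mask constants are the widely used "part 1 by 2" bit tricks.

```cpp
#include <array>
#include <cstdint>

// Spread the low 10 bits of v so that two zero bits separate consecutive bits.
uint32_t part1by2(uint32_t v) {
    v &= 0x000003ff;
    v = (v ^ (v << 16)) & 0xff0000ff;
    v = (v ^ (v << 8)) & 0x0300f00f;
    v = (v ^ (v << 4)) & 0x030c30c3;
    v = (v ^ (v << 2)) & 0x09249249;
    return v;
}

// Inverse of part1by2: gather every third bit back into the low 10 bits.
uint32_t compact1by2(uint32_t v) {
    v &= 0x09249249;
    v = (v ^ (v >> 2)) & 0x030c30c3;
    v = (v ^ (v >> 4)) & 0x0300f00f;
    v = (v ^ (v >> 8)) & 0xff0000ff;
    v = (v ^ (v >> 16)) & 0x000003ff;
    return v;
}

// Bit i of x, y, z lands at bit 3i, 3i+1, 3i+2 of the Morton index.
uint32_t mortonEncode3(uint32_t x, uint32_t y, uint32_t z) {
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2);
}

// Recover {x, y, z} from a Morton index.
std::array<uint32_t, 3> mortonDecode3(uint32_t m) {
    return {compact1by2(m), compact1by2(m >> 1), compact1by2(m >> 2)};
}
```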

However, once we shorten the tree array to only as many nodes as we actually use (say 40%, or even 99%), Morton indices can point past the end of the array, and indexing with them goes out of bounds. At first this appears to be a catch-22. The good news: it's really not.

What we have to do is build our tree into a maximal tree structure first, one that can be Morton-indexed, then extract all relevant nodes into a compact array and reach them through a level of array-index indirection.

1. While building the maximal tree, mark which nodes are actually used (as opposed to merely "present"). Record the Morton indices of those nodes, in order, in a separate vector (vector.push_back()). This vector is the reverse-mapped LUT (compact index to Morton index) in array form, accessible via vector.data().
2. Create the forward-mapped lookup table (for example std::unordered_map<uint32, uint32>), which maps a Morton index (computed from (x, y, z)) to the index at which that node is stored in the compact array. This is simply the vector above inverted: its values become keys, and its indices become values.
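A minimal sketch of those two steps, assuming the maximal-tree build leaves a hypothetical `used` flag per Morton index (here a std::vector<bool> indexed by Morton index); SparseMapping and buildMapping are illustrative names. Scanning Morton indices in increasing order keeps the compact array in Morton order, so it preserves locality.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// The two lookup tables produced by the extract step.
struct SparseMapping {
    std::vector<uint32_t> sparseToMorton;                   // reverse LUT: compact slot -> Morton index
    std::unordered_map<uint32_t, uint32_t> mortonToSparse;  // forward LUT: Morton index -> compact slot
};

// `used[m]` is assumed to be true when the node at Morton index m holds data.
SparseMapping buildMapping(const std::vector<bool>& used) {
    SparseMapping m;
    for (uint32_t morton = 0; morton < used.size(); ++morton) {
        if (!used[morton]) continue;
        // The compact slot is the order in which used nodes are found.
        m.mortonToSparse.emplace(morton, static_cast<uint32_t>(m.sparseToMorton.size()));
        m.sparseToMorton.push_back(morton);
    }
    return m;
}
```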

We then read / write our sparse / compact tree array as follows (assume a quadtree for simplicity):

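```cpp
//...up here we do reverse and forward mapping...

std::vector<Node> sparseTree(nodesUsed);            // compact array holding only the used nodes

// Morton index of (x, y): interleave the bits (mortonEncode2D is a hypothetical helper);
// the row-major pos.Y * maxX + pos.X is not a Morton index.
uint32 mortonIndex = mortonEncode2D(pos.X, pos.Y);
auto it = mortonToSparse.find(mortonIndex);         // forward LUT; find() does not insert on a miss
if (it != mortonToSparse.end()) {
    Node& node = sparseTree[it->second];            // read or write the node in place
    // ... use node ...
}
// else: nothing stored here (empty space)
```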
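A self-contained sketch of the quadtree read path, assuming 16-bit x and y interleaved into a 32-bit Morton code; Node, part1by1, mortonEncode2D and findNode are hypothetical names. Empty space yields nullptr rather than an out-of-bounds index.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical node payload.
struct Node { int value = 0; };

// Spread the low 16 bits of v so that one zero bit separates consecutive bits.
uint32_t part1by1(uint32_t v) {
    v &= 0x0000ffff;
    v = (v ^ (v << 8)) & 0x00ff00ff;
    v = (v ^ (v << 4)) & 0x0f0f0f0f;
    v = (v ^ (v << 2)) & 0x33333333;
    v = (v ^ (v << 1)) & 0x55555555;
    return v;
}

// 2D Morton code: x occupies the even bits, y the odd bits.
uint32_t mortonEncode2D(uint32_t x, uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}

// Read through the forward LUT; returns nullptr where nothing is stored.
Node* findNode(std::vector<Node>& sparseTree,
               const std::unordered_map<uint32_t, uint32_t>& mortonToSparse,
               uint32_t x, uint32_t y) {
    auto it = mortonToSparse.find(mortonEncode2D(x, y));
    return it == mortonToSparse.end() ? nullptr : &sparseTree[it->second];
}
```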

Note on performance

Iterating over a 1D array in Morton order (or along a Hilbert curve) is one of the fastest data-access arrangements, since it preserves data locality for the CPU cache.

I never use doubly or triply nested loops to iterate over (x, y, z). Even with constant upper bounds on those loops, the nesting alone makes me question whether the compiler will unroll them, and if they remain nested, walks can be slow due to the extra conditional branches.

Instead, always iterate in 1D over the tree array:

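```cpp
for (uint32 sparseIndex = 0; sparseIndex < sparseTreeLength; sparseIndex++)
{
    uint32 mortonIndex = sparseToMorton[sparseIndex]; // reverse LUT

    // De-interleave the Morton bits (mortonDecodeX/Y are hypothetical helpers);
    // % maxX and / maxX would only be correct for a row-major index.
    uint32 x = mortonDecodeX(mortonIndex); // even bits
    uint32 y = mortonDecodeY(mortonIndex); // odd bits

    ...
}
```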
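A self-contained sketch of that 1D walk for the quadtree case, decoding each stored Morton index back to (x, y) by de-interleaving its bits. compact1by1 and visitSparse are hypothetical names, and a real walk would process node sparseIndex instead of collecting coordinates.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Gather the even bits of v back into the low 16 bits (inverse of bit spreading).
uint32_t compact1by1(uint32_t v) {
    v &= 0x55555555;
    v = (v ^ (v >> 1)) & 0x33333333;
    v = (v ^ (v >> 2)) & 0x0f0f0f0f;
    v = (v ^ (v >> 4)) & 0x00ff00ff;
    v = (v ^ (v >> 8)) & 0x0000ffff;
    return v;
}

// Walk the compact array in storage order and recover each node's (x, y).
std::vector<std::pair<uint32_t, uint32_t>>
visitSparse(const std::vector<uint32_t>& sparseToMorton) {
    std::vector<std::pair<uint32_t, uint32_t>> coords;
    coords.reserve(sparseToMorton.size());
    for (uint32_t sparseIndex = 0; sparseIndex < sparseToMorton.size(); ++sparseIndex) {
        uint32_t morton = sparseToMorton[sparseIndex];  // reverse LUT
        uint32_t x = compact1by1(morton);               // even bits
        uint32_t y = compact1by1(morton >> 1);          // odd bits
        coords.emplace_back(x, y);
    }
    return coords;
}
```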

You've just unravelled the octree into a grid and then traversed the grid the usual way? – Stack Exchange Broke The Law, Commented Sep 16, 2021 at 16:13
'maximal tree structure' > this isn't very efficient memory wise and processing wise (2 big loops instead of 1). – trshmanx, Commented Sep 17, 2021 at 6:51
@trshmanx I see now that this is nothing like the solution you are looking for. – Engineer, Commented Sep 17, 2021 at 14:52

DMGregory · Accepted Answer · 2021-09-17 01:35:13Z

One strategy you can use is to make your leaf nodes "chunks" of cells. For example, Minecraft uses 16x16x16 cell chunks if I recall correctly.

This gives you a coarser granularity for caching and a slightly shallower octree to manage, while keeping most of the benefits of skipping large empty areas. The chunks can also map to individual meshes/blocks of vertex data that you can LOD or cull as a whole for rendering efficiency.
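With 16³ chunks as the octree leaves, each cell coordinate splits into a chunk coordinate (used to find the leaf) and a local offset inside that chunk's dense array. A minimal sketch, assuming power-of-two chunks and 32-bit signed cell coordinates; ChunkAddress, toChunk and localIndex are hypothetical names. It relies on >> being an arithmetic shift for negative values, which C++20 guarantees and mainstream compilers already did before.

```cpp
#include <cstdint>

// Assumed chunk edge of 16 cells (a power of two, so shifts and masks work).
constexpr int kChunkBits = 4;
constexpr int32_t kChunkSize = 1 << kChunkBits;

struct ChunkAddress {
    int32_t cx, cy, cz;  // which chunk (octree leaf)
    int32_t lx, ly, lz;  // cell within that chunk, each in [0, 16)
};

// Shift floors toward negative infinity; the mask gives the matching local offset.
ChunkAddress toChunk(int32_t x, int32_t y, int32_t z) {
    return {x >> kChunkBits, y >> kChunkBits, z >> kChunkBits,
            x & (kChunkSize - 1), y & (kChunkSize - 1), z & (kChunkSize - 1)};
}

// Index into the chunk's dense 16x16x16 cell array (x fastest).
int32_t localIndex(const ChunkAddress& a) {
    return a.lx + kChunkSize * (a.ly + kChunkSize * a.lz);
}
```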

Next, make your cache an array of references to, say, 2x2x4 adjacent chunks, acting as a sliding window into your octree. You can fill this cache at the starting corner of your search with 2 octree traversals, each taking all (up to 8) leaf nodes under the final parent node into your cache in one fell swoop.

Following Engineer's lead, I'll demonstrate this with a 2D analogue:

You can access all 30x30x60 cells in the interior of this window, along with all of their neighbours across the outer faces (in case you need any adjacency-based logic), using only the chunks in cache, with no need to go back to the octree for each adjacent lookup. Iterating down your row in a 30x30 "pipe" like this also gives you better data locality than scanning along the entire width of the octree in a single-cell line, then looping back to visit the same neighbourhoods again one cell over.

When you want to shift this sliding window down the row as your iteration proceeds, you need just 1 additional octree traversal to bring the next 2x2x2 chunks into cache, rather than 32x32x32 traversals to look up individual cells. You can also fast-forward the window past swaths of empty chunks.
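A sketch of the sliding window as a fixed 2 (z) x 2 (y) x 4 (x) array of chunk pointers that advances two chunks along x per step. `fetch` is a hypothetical stand-in for an octree query returning the chunk at a chunk coordinate, or nullptr for empty space. For simplicity it calls fetch once per chunk; in the scheme described above, the eight chunks of each new 2x2x2 block would come from the children of a single node found by one traversal, which assumes the window's x origin stays aligned to even chunk coordinates.

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <utility>

struct Chunk { int32_t cx, cy, cz; };  // hypothetical; a real chunk would also hold its 16^3 cells

class ChunkWindow {
public:
    using Fetch = std::function<const Chunk*(int32_t, int32_t, int32_t)>;

    ChunkWindow(Fetch fetch, int32_t x0, int32_t y0, int32_t z0)
        : fetch_(std::move(fetch)), x0_(x0), y0_(y0), z0_(z0) {
        for (int i = 0; i < 4; ++i) loadColumn(i, x0_ + i);
    }

    // Chunk at window-relative position: i along x in [0, 4), j (y) and k (z) in [0, 2).
    const Chunk* at(int i, int j, int k) const { return slots_[k][j][i]; }

    // Advance 2 chunks along x: keep the front half, fetch the next 2x2x2 block.
    void slide() {
        for (int k = 0; k < 2; ++k)
            for (int j = 0; j < 2; ++j) {
                slots_[k][j][0] = slots_[k][j][2];
                slots_[k][j][1] = slots_[k][j][3];
            }
        x0_ += 2;
        loadColumn(2, x0_ + 2);
        loadColumn(3, x0_ + 3);
    }

private:
    // Fill window column i (all 2x2 chunks at chunk x = cx).
    void loadColumn(int i, int32_t cx) {
        for (int k = 0; k < 2; ++k)
            for (int j = 0; j < 2; ++j)
                slots_[k][j][i] = fetch_(cx, y0_ + j, z0_ + k);
    }

    Fetch fetch_;
    int32_t x0_, y0_, z0_;
    std::array<std::array<std::array<const Chunk*, 4>, 2>, 2> slots_{};
};
```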

Because you're doing so few full octree traversals, you can also cache the ancestor chain leading to the latest search, and start your next search from lower in the chain if you like.
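A minimal sketch of resuming a search from the previous result, assuming a pointer-based octree whose nodes know their bounds and parent; OctNode, addChild and findFrom are hypothetical names. The search climbs the cached node's ancestor chain only until the target point is inside, then descends normally, so nearby lookups touch few nodes.

```cpp
#include <array>
#include <cstdint>
#include <memory>
#include <utility>

// Node covering the cube [x, x+size) x [y, y+size) x [z, z+size); size is a power of two.
struct OctNode {
    int32_t x = 0, y = 0, z = 0, size = 1;
    OctNode* parent = nullptr;
    std::array<std::unique_ptr<OctNode>, 8> child;  // nullptr = empty octant

    bool contains(int32_t px, int32_t py, int32_t pz) const {
        return px >= x && px < x + size && py >= y && py < y + size &&
               pz >= z && pz < z + size;
    }
    // Octant bits (assumed layout): bit 0 = upper x half, bit 1 = upper y, bit 2 = upper z.
    int octantOf(int32_t px, int32_t py, int32_t pz) const {
        int32_t h = size / 2;
        return (px >= x + h) | ((py >= y + h) << 1) | ((pz >= z + h) << 2);
    }
};

// Build helper: create octant `oct` of `p` and return it.
OctNode* addChild(OctNode& p, int oct) {
    auto c = std::make_unique<OctNode>();
    int32_t h = p.size / 2;
    c->x = p.x + ((oct & 1) ? h : 0);
    c->y = p.y + ((oct & 2) ? h : 0);
    c->z = p.z + ((oct & 4) ? h : 0);
    c->size = h;
    c->parent = &p;
    p.child[oct] = std::move(c);
    return p.child[oct].get();
}

// Deepest existing node containing the point, starting from the previous result
// `hint`. Returns nullptr if the point lies outside the root.
const OctNode* findFrom(const OctNode* hint, int32_t px, int32_t py, int32_t pz) {
    const OctNode* n = hint;
    while (n && !n->contains(px, py, pz)) n = n->parent;  // climb the ancestor chain
    if (!n) return nullptr;
    while (n->size > 1) {                                  // descend as usual
        const OctNode* c = n->child[n->octantOf(px, py, pz)].get();
        if (!c) break;  // empty octant: n is the deepest node covering the point
        n = c;
    }
    return n;
}
```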

However, I think you might find that using an algorithm designed for iterating over data in octree format natively might work better in the long run than trying to shoehorn it into a sequential access model. To help you find such an algorithm, we'd need to know what you're currently using this sequential iteration for.

Comments are not for extended discussion; this conversation has been moved to chat. – Almo ♦, Commented Sep 16, 2021 at 18:47

Engineer · Accepted Answer · 2021-09-17 17:32:34Z

Regarding your updated question: a per-layer rolling cache may not work for an octree, and caching is not actually necessary to solve the underlying problem, which is efficient access to vertices.

Try a Morton- or Hilbert-ordered sparse spatial hashmap of vertices. This should give you amortised O(1) reads across all non-co-located vertices, and it should also eliminate the need for caching.

The buckets making up this hash should correspond to spatial (x, y, z) locality within your octree. Thus, unlike with the regular std::*map containers, as you read within a local volume of the tree you will typically also be reading within the same bucket (or 4-8 neighbouring buckets if you are reading across an octree boundary). This roughly approximates the rolling, per-layer cache of the original Transvoxel concept, and it will be far better for random access and depth-first walks. If these buckets sit near one another within a single contiguously allocated structure, you're golden. Read more here:

...for a normal hash table, a good hash function distributes keys as evenly as possible across the available buckets, in an effort to keep lookup time short. The result of this is that keys which are very close (lexicographically speaking) to each other, are likely to end up in distant buckets. But in a spatial hash we are dealing with locations in space, and locality is very important to us (especially for collision detection), so our hash function will not change the distribution of the inputs.

This concept is well known in GIS applications, so you may wish to search GIS libraries for a suitable C++ implementation. (I think my fellow users and I have already speculated enough on your questions, so it's best you seek the correct implementation yourself.) You may also want to look into perfect spatial hashing.

Just a thought: it doesn't matter much how large the spatial hash is. What matters is the bucket size and the buckets' proximity to one another within the larger hashmap allocation.

Another benefit of such a map is that it implicitly disallows duplicate vertices at the same location, since the key is formed from the (x, y, z) coordinates. Thus only one vertex can be keyed at each location.
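A minimal sketch of such a vertex map: the bucket key is the Morton code of the 8x8x8 block containing a vertex (so nearby vertices share a bucket), and each bucket holds (exact Morton key, vertex index) pairs, which gives the duplicate rejection described above. It assumes non-negative integer positions below 1024 per axis; VertexSpatialHash and getOrInsert are hypothetical names. Using std::unordered_map for the outer table is a simplification: it does not place neighbouring buckets next to each other in memory, which the answer notes would help further.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Spread the low 10 bits of v so that two zero bits separate consecutive bits.
uint32_t part1by2(uint32_t v) {
    v &= 0x000003ff;
    v = (v ^ (v << 16)) & 0xff0000ff;
    v = (v ^ (v << 8)) & 0x0300f00f;
    v = (v ^ (v << 4)) & 0x030c30c3;
    v = (v ^ (v << 2)) & 0x09249249;
    return v;
}

uint32_t morton3(uint32_t x, uint32_t y, uint32_t z) {
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2);
}

class VertexSpatialHash {
public:
    // Index of the vertex at (x, y, z); stores `newIndex` only if the spot is empty.
    uint32_t getOrInsert(uint32_t x, uint32_t y, uint32_t z, uint32_t newIndex) {
        auto& bucket = buckets_[morton3(x >> 3, y >> 3, z >> 3)];  // coarse 8^3 block
        uint32_t key = morton3(x, y, z);                           // exact position
        for (const auto& e : bucket)
            if (e.first == key) return e.second;  // reuse the existing vertex
        bucket.emplace_back(key, newIndex);
        return newIndex;
    }
    std::size_t bucketCount() const { return buckets_.size(); }

private:
    std::unordered_map<uint32_t, std::vector<std::pair<uint32_t, uint32_t>>> buckets_;
};
```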

Depthwise caching

You could optimise by selecting a maximum node size to cache within, and temporarily caching all the vertices in that node within a smaller hash (potentially?) as you begin walking down the tree. But I doubt this would gain you much. I would strongly suggest trying the simplest solution (above) first and seeing how it performs before taking additional steps to optimise, since optimisation is all caching is.

I would then suggest doing this chunk-wise, as DMGregory suggests: one octree per chunk, allowing for a sliding window.

Could you elaborate on this? Maybe with some code snippets and a)b)c) plan please? I read your text twice and it just doesn't ring the bell for me :(. Especially how will I reuse vertices and what do I store in map? – trshmanx, Commented Sep 17, 2021 at 15:45
