
I am writing a container that uses alloca internally to allocate data on the stack. Risks of using alloca aside, assume that I must use it for the domain I am in (it's partly a learning exercise around alloca and partly to investigate possible implementations of dynamically-sized stack-allocated containers).

According to the man page for alloca (emphasis mine):

The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.

Using implementation-specific features, I have managed to force inlining in such a way that the caller's stack is used for this function-level "scoping".

However, that means that the following code will allocate a huge amount of memory on the stack (compiler optimisations aside):

for (int iteration = 0; iteration < 10000; ++iteration) {
    // the ctor parameter is the number of
    // instances of T to allocate on the stack;
    // it's not normally known at compile-time
    my_container<T> instance(32);
}

Without knowing the implementation details of this container, one might expect any memory it allocates to be freed when instance goes out of scope. This is not the case and can result in a stack overflow / high memory usage for the duration of the enclosing function.

One approach that came to mind was to explicitly free the memory in the destructor. Short of reverse engineering the resulting assembly, I haven't found a way of doing that yet (also see this).

The only other approach I have thought of is to have a maximum size specified at compile-time, use that to allocate a fixed-size buffer, have the real size specified at runtime and use the fixed-size buffer internally. The issue with this is that it's potentially very wasteful (suppose your maximum were 256 bytes per container, but you only needed 32 most of the time).

Hence this question; I want to find a way to provide these scope semantics to the users of this container. Non-portable is fine, so long as it's reliable on the platform it's targeting (for example, some documented compiler extension that only works on x86_64 is fine).

I appreciate this could be an XY problem, so let me restate my goals clearly:

  • I am writing a container that must always allocate its memory on the stack (to the best of my knowledge, this rules out C VLAs).
  • The size of the container is not known at compile-time.
  • I would like to maintain the semantics of the memory as if it were held by an std::unique_ptr inside of the container.
  • Whilst the container must have a C++ API, using compiler extensions from C is fine.
  • The code need only work on x86_64 for now.
  • The target operating system can be Linux-based or Windows, it doesn't need to work on both.
  • The statement "container that must always allocate its memory on the stack" does not compute, insofar as C++ goes. The container itself may be allocated on the stack (automatic scope) or the heap (dynamic scope), which is controlled entirely by whatever instantiates the container. But the container itself has absolutely no influence on that, whatsoever. Perhaps you're asking how to declare a class that can only be declared in automatic scope. This cannot be done in C++. Commented Mar 18, 2018 at 2:07
  • You could write an allocator based on alloca instead of sbrk, like you'd normally do with malloc. Commented Mar 18, 2018 at 2:08
  • Space allocated on the stack is freed when the function returns. Since that isn't what you want, why are you determined that you want to allocate space on the stack? Commented Mar 18, 2018 at 2:13
  • @SamVarshavchik: The container could be allocated on a pile of £20 notes as far as C++ cares. Commented Mar 18, 2018 at 2:16
  • @LightnessRacesinOrbit I like the sound of that. Commented Mar 18, 2018 at 2:20

1 Answer


I am writing a container that must always allocate its memory on the stack (to the best of my knowledge, this rules out C VLAs).

The normal implementation of C VLAs in most compilers is on the stack. Of course ISO C++ doesn't say anything about how automatic storage is implemented under the hood, but it's (nearly?) universal for C implementations on normal machines (that do have a call+data stack) to use that for all automatic storage including VLAs.

If your VLA is too large, you get a stack overflow rather than a fallback to malloc / free.

Neither C nor C++ specify alloca; it's only available on implementations that have a stack like "normal" machines, i.e. the same machines where you can expect VLAs to do what you want.

All of these conditions hold for all the major compilers on x86-64 (except that MSVC doesn't support VLAs).


If you have a C++ compiler that supports C99-style VLAs as an extension (like GNU C++), a smart compiler may reuse the same stack memory for a VLA with loop scope; see the sketch below.
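
A minimal sketch of what that looks like, assuming a GNU-compatible compiler (g++ or clang++) that accepts C99-style VLAs in C++ as an extension; this is not standard C++, and the function is purely illustrative:

// Compile with g++ or clang++; VLAs in C++ are a GNU extension, not ISO C++.
#include <cstddef>
#include <cstring>

void fill_and_use(std::size_t runtime_size) {
    for (int i = 0; i < 10000; ++i) {
        char buffer[runtime_size];              // VLA with loop scope
        std::memset(buffer, 0, runtime_size);   // use it so it isn't optimised away
        // buffer's automatic storage is released at the end of each iteration,
        // so stack usage does not accumulate across the 10000 iterations.
    }
}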


have a maximum size specified at compile-time, use that to allocate a fixed-size buffer ... wasteful

For a special case like you mention, you could maybe have a fixed-size buffer as part of the object (size as a template param), and use that if it's big enough. If not, dynamically allocate. Maybe use a pointer member to point to either the internal or external buffer, and a flag to remember whether to delete it or not in the destructor. (You need to avoid delete on an array that's part of the object, of course.)

#include <cstddef>        // for std::size_t

template <typename T, std::size_t internalsize>
class my_container {
    // optional, if you do anything that's easier with a power-of-2 size:
    // static_assert(!(internalsize & (internalsize - 1)), "internalsize not a power of 2");
    T *data;              // points at internaldata or at a separate heap allocation
    T internaldata[internalsize];
    unsigned used_size;
    int allocated_size;   // intended for small containers: use int instead of size_t
    // bool needs_delete; // not needed: negative allocated_size means internal
};

allocated_size only needs to be checked when the container grows, so I made it a signed int: a negative value means the storage is internal, so the sign bit does the job of the needs_delete flag instead of needing an extra boolean member.

Normally a container uses 3 pointers instead of a pointer + 2 integers, but if you don't grow/shrink often, the pointer + 2 integers layout saves space (on x86-64, where int is 32 bits and pointers are 64 bits) and allows this sign-bit overloading.

A container that grows large enough to need dynamic allocation should keep using that dynamic space even after it shrinks: that makes it cheaper to grow again and avoids copying back into the internal storage. Only copy back if the caller uses a function to release unused excess storage.

A move constructor should probably keep the allocation as-is, but a copy constructor should copy into the internal buffer if possible instead of allocating new dynamic storage.
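
For concreteness, here is a hedged sketch of how a constructor and destructor could use the sign of allocated_size as the internal-vs-heap flag described above. It assumes T is default-constructible (a real implementation would more likely use raw, suitably aligned storage), and the class name and everything beyond the members shown earlier are illustrative:

#include <cstddef>

template <typename T, std::size_t internalsize>
class my_container_sketch {
    T *data;
    T internaldata[internalsize];
    unsigned used_size = 0;
    int allocated_size;        // negative: internal buffer, positive: heap

public:
    explicit my_container_sketch(std::size_t n) {
        if (n <= internalsize) {
            data = internaldata;
            allocated_size = -static_cast<int>(internalsize);  // sign = "don't delete"
        } else {
            data = new T[n];                                    // fall back to the heap
            allocated_size = static_cast<int>(n);
        }
    }

    ~my_container_sketch() {
        if (allocated_size > 0)    // only delete what was actually new[]ed
            delete[] data;
    }

    // Copy/move are deleted here for brevity; see the notes above for how
    // they should behave in a real implementation.
    my_container_sketch(const my_container_sketch&) = delete;
    my_container_sketch& operator=(const my_container_sketch&) = delete;
};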


Comments

Looks like my understanding of VLAs was off, as you and David C pointed out. I thought it'd fall back to heap allocation. I just ran some tests locally and on my system at least, it behaves as you described. Are there any guarantees of this?
@OMGtechy: Guarantee that the memory can be reclaimed / reused as soon as it goes out of scope? No, I'd only expect reuse if it's the same VLA inside a loop. Compilers have to know how to do that at -O0 to make sane code for loops containing declarations of regular array and scalar variables, and it's no different for VLAs. But if you had 2 different scopes containing 2 different VLAs (even if they had the same size) you might or might not find the compiler optimizes by reusing the space, and maybe only at -O2. Even if they're the two sides of an if/else, so they can't both be in use at once.
How could you use a VLA to implement a container though? The VLA will go out of scope when the constructor terminates. Or are there C++ extensions that allow VLAs as members or something like that?
I guess you could use a macro to expand your my_container definition into a VLA local helper object in the same scope and a call to the my_container constructor which takes a pointer to the VLA and uses it as storage (see the sketch after these comments).
@BeeOnRope: However the OP was doing it with alloca, which is also deallocated on leaving the constructor scope. (I think they mentioned macros in the question.) The first section of the answer is just to explain how VLAs do work, not how you can actually use them to solve this problem. Problems like that are why I wrote the 2nd part of the answer, suggesting a workaround that gives you normal C++ objects but avoids dynamic allocation for small sizes. (i.e. mostly solving the same problem without variable-size stack allocation).
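
To illustrate the macro idea from the comment above: a rough sketch in which the non-owning view type, the macro name, and the example function are all hypothetical, and the VLA again relies on the GNU extension in C++:

#include <cstddef>

// A non-owning view over caller-provided storage.
template <typename T>
class my_container_view {
    T *data;
    std::size_t size_;
public:
    my_container_view(T *storage, std::size_t n) : data(storage), size_(n) {}
    std::size_t size() const { return size_; }
    T &operator[](std::size_t i) { return data[i]; }
};

// Expands to two declarations in the *caller's* scope, so the VLA lives
// exactly as long as the enclosing block.
#define MAKE_CONTAINER(type, name, n)   \
    type name##_storage[(n)];           \
    my_container_view<type> name((name##_storage), (n))

void example(std::size_t runtime_size) {
    MAKE_CONTAINER(int, values, runtime_size);  // VLA + view in this scope
    values[0] = 42;
}   // values_storage (and thus the container's memory) is released here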
