
I have a Python function that consumes a large amount of memory. When the function finishes, I want to release the memory and return the freed pages to the operating system by calling malloc_trim(0).

However, the memory isn’t released immediately — it only gets freed after a generation 2 garbage collection.
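The function's code isn't shown, so this is an assumption: a common cause of this pattern is that the large objects are reachable from a reference cycle. CPython frees an acyclic object as soon as its reference count drops to zero. Objects in a cycle stay alive until the cyclic garbage collector finds them. In CPython 3.12, objects that survived earlier collections have also been promoted to generation 2, so only a full collection reclaims them. The following sketch uses a hypothetical Node class and weakref.ref as a liveness probe to show the difference:

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object that owns a large buffer."""

    def __init__(self, size: int) -> None:
        self.payload = bytearray(size)  # the "heavy" part
        self.other = None  # optional link used to build a cycle


def acyclic_freed_on_del() -> bool:
    """True if an object outside any cycle is freed immediately on del."""
    node = Node(1_000_000)
    probe = weakref.ref(node)  # a weak reference does not keep node alive
    del node  # refcount hits zero: CPython frees it right here
    return probe() is None


def cyclic_needs_gc() -> tuple[bool, bool]:
    """Return (alive after del, freed after gc.collect()) for a 2-object cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # stop an automatic collection from running mid-demo
    try:
        a, b = Node(1_000_000), Node(0)
        a.other, b.other = b, a  # reference cycle a <-> b
        probe = weakref.ref(a)
        del a, b  # refcounts never reach zero because of the cycle
        alive_after_del = probe() is not None
        gc.collect()  # the cyclic collector finds and frees the cycle
        freed_after_collect = probe() is None
        return alive_after_del, freed_after_collect
    finally:
        if was_enabled:
            gc.enable()


print(acyclic_freed_on_del())  # True
print(cyclic_needs_gc())  # (True, True)
```

If the real function's objects behave like the cyclic case, breaking the cycles (or not creating them) lets reference counting release the memory without waiting for any collection.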

My first approach

I tried calling the following manually after the function returns:

gc.collect()
malloc_trim(0)

This works, but the Python documentation warns:

The effect of calling gc.collect() while the interpreter is already performing a collection is undefined.

So this approach could potentially cause undefined behavior, which makes it unsafe for production use.

My second approach

Instead, I tried waiting for a natural generation 2 collection to occur, and then calling malloc_trim.

I did this by registering a GC callback:

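import ctypes
import gc

libc = ctypes.CDLL("libc.so.6")  # assumes glibc on Linux; malloc_trim is glibc-only
malloc_trim = libc.malloc_trim

def gc_trim_callback(phase, info):
    if phase == "stop" and info["generation"] == 2:  # a full collection just finished
        malloc_trim(0)
        gc.callbacks.remove(gc_trim_callback)  # one-shot: unregister after trimming

gc.callbacks.append(gc_trim_callback)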

This works correctly when a generation 2 collection actually occurs — the callback triggers and malloc_trim runs.

The problem

The issue is that sometimes, after the heavy function finishes, the process goes idle before any generation 2 collection happens.

If the user doesn’t perform any further actions, the process can stay idle for hours, days, or even months, holding onto the unused memory indefinitely. That’s not acceptable for my use case.

I’ve thought of two very bad solutions for triggering a generation 2 collection “naturally”:

  1. Adjusting the GC thresholds with gc.set_threshold() so that generation 2 collections happen more often; but this feels hacky and can hurt performance.

  2. Creating many artificial circular references after the function returns, then freeing them to trigger a generation 2 collection; this is also hacky and inefficient.

So, in short, I can't trigger a generation 2 collection directly because it may cause undefined behavior, and I can only make one happen through manipulations I want to avoid. Can you think of a better solution than what I have?

Asked by pixeldrift (new contributor).
  • Which Python version are you using? Non-standard configurations? Commented Nov 25 at 9:45
  • @cards Python version is 3.12.3. There are no non-standard configurations. Commented Nov 25 at 9:59
  • Could you please provide the code of that memory-consuming function? Commented Nov 25 at 10:52
  • Whether memory is released often depends on exactly what your function does and how it releases the objects it creates. Without those details, it is difficult for me to guess what is going wrong. Commented Nov 25 at 12:09
  • It may simply not be possible to return the memory to the OS. Memory is usually allocated to processes in large contiguous chunks, and can only be returned if the entire chunk is unused. Dynamic memory allocation often results in fragmentation that precludes this. Commented Nov 25 at 16:39

1 Answer


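The documentation saying that calling gc.collect() during a collection 'is undefined' does not mean it is unsafe; it means your request can be ignored. In CPython, if a collection is already in progress (for example, when gc.collect() is called from a __del__ method or a GC callback), the call returns 0 without collecting. Calling it from ordinary code after your function returns normally just runs a full collection. For history, see issue #49174.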
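To illustrate the point above, here is a sketch of the question's first approach that binds malloc_trim through ctypes. This is an assumption: the question doesn't show how malloc_trim is obtained, and the symbol exists only in glibc. It is paired with a small probe that calls gc.collect() from inside a GC callback, where a collection is already running. The helper names are hypothetical. On CPython 3.12, the nested call returns 0 without collecting.

```python
import ctypes
import ctypes.util
import gc
from typing import Callable, List, Optional


def _load_malloc_trim() -> Optional[Callable[[int], int]]:
    """Return glibc's malloc_trim via ctypes, or None if this libc lacks it."""
    libname = ctypes.util.find_library("c")
    if libname is None:
        return None
    try:
        libc = ctypes.CDLL(libname)
        trim = libc.malloc_trim  # glibc-only symbol; AttributeError elsewhere
    except (OSError, AttributeError):
        return None
    trim.argtypes = [ctypes.c_size_t]
    trim.restype = ctypes.c_int
    return trim


malloc_trim = _load_malloc_trim()


def collect_and_trim() -> Optional[int]:
    """Full collection from ordinary code, then return free heap pages to the OS.

    Returns malloc_trim's result (1 if memory was released, 0 if not),
    or None when malloc_trim is unavailable on this platform.
    """
    gc.collect()  # called outside any collection, so it is not ignored
    if malloc_trim is None:
        return None
    return malloc_trim(0)


def nested_collect_result() -> int:
    """Call gc.collect() from a GC callback, i.e. while a collection is running."""
    results: List[int] = []

    def callback(phase: str, info: dict) -> None:
        if phase == "start" and not results:
            results.append(gc.collect())  # a collection is already in progress

    gc.callbacks.append(callback)
    try:
        gc.collect()
    finally:
        gc.callbacks.remove(callback)
    return results[0]


print(nested_collect_result())  # 0 on CPython: the nested request is ignored
print(collect_and_trim())
```

The nested call does not crash or corrupt anything; the interpreter simply skips it.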

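I can think of two solutions. The first is to join all background threads and delete every reference to the heavy resources. If those resources aren't part of reference cycles, reference counting frees them as soon as the last reference disappears, so you don't need to call gc.collect() at all. The second is to run the heavy function in a worker process, which is easy to do with the multiprocessing module; when the worker exits, the operating system reclaims all of its memory.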
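Here is a minimal sketch of the worker-process option. It assumes Linux and the fork start method (malloc_trim is glibc-only, so the question is already Linux-specific). heavy_function is a hypothetical stand-in for the real function. Only its small return value crosses the pipe back to the parent, and the child's whole heap is released when it exits.

```python
import multiprocessing as mp
from multiprocessing.connection import Connection


def heavy_function(n: int) -> int:
    """Hypothetical stand-in for the memory-hungry function in the question."""
    data = [list(range(100)) for _ in range(n)]  # lots of small objects
    return sum(len(row) for row in data)  # only a small result leaves the worker


def _worker(conn: Connection, n: int) -> None:
    """Child-process entry point: compute, send the result, exit."""
    try:
        conn.send(heavy_function(n))
    finally:
        conn.close()  # if heavy_function raised, the parent's recv() gets EOFError


def run_in_worker(n: int) -> int:
    """Run heavy_function(n) in a short-lived child process and return its result."""
    ctx = mp.get_context("fork")  # assumption: Linux/Unix, parent has no extra threads
    recv_conn, send_conn = ctx.Pipe(duplex=False)  # one-way: child -> parent
    proc = ctx.Process(target=_worker, args=(send_conn, n))
    proc.start()
    send_conn.close()  # the parent only reads
    try:
        result = recv_conn.recv()  # read before join() so a large result can't block the child
    finally:
        recv_conn.close()
        proc.join()  # the child's entire heap goes back to the OS when it exits
    return result


print(run_in_worker(1_000))  # 100000
```

Fork is used here so the sketch runs however it is launched. In a program that already runs threads, the "spawn" or "forkserver" start methods are safer. Those methods re-import the main module in the child, however, so the launching code must sit under `if __name__ == "__main__":`.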
