If I have a server written in golang that processes requests containing user queries, is there any viable method to track and limit the memory used by any given user request?

The server process has no simple way to predict the memory required to service a given user query. Memory demands depend on the shape and quantity of the data the query needs to process, and are not knowable before execution begins. So actual memory use must be tracked and attributed to the specific subtasks of a particular request. The accounting could then be used to make over-quota tasks cooperatively abort.

To do this, some mechanism is required to efficiently track which task each allocation is associated with, and to query and aggregate those per-task totals.

For example, say my server is limited to 1 GiB of RAM in a containerised context. Some user queries may only need 50 MiB of RAM to execute; others may need 500 MiB, and others would need 3000 MiB. My goal is to efficiently determine when (a) any given user query has allocated more than X MiB in private memory associated specifically with its task, and (b) all tasks are collectively approaching a cumulative resource limit.

The goal is to prevent OOMs and maintain service robustness without having to grossly over-allocate memory for the worst possible case.
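As a rough illustration of detecting (b), the runtime does expose process-wide heap statistics even though it offers nothing per-request; a minimal sketch, assuming a fixed container budget (the `nearLimit` helper and its threshold are hypothetical, and the stats lag behind live allocation):

```go
package main

import (
	"fmt"
	"runtime"
)

// nearLimit reports whether the live heap exceeds a fraction of the
// container budget. This only answers the collective question (b);
// nothing here attributes memory to an individual request.
func nearLimit(budget uint64, frac float64) bool {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms) // briefly stops the world; call sparingly
	return float64(ms.HeapAlloc) >= frac*float64(budget)
}

func main() {
	const budget = 1 << 30 // 1 GiB container limit
	fmt.Println("approaching limit:", nearLimit(budget, 0.8))
}
```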

GOMEMLIMIT is helpful but insufficient, as it is a soft limit and is process-wide.

I'm looking for things like:

  • Tagging objects with an owner, with transitive propagation of the tag
  • Allocation of objects into dedicated memory pools or arenas while still using the GC (not implementing my own bad malloc in giant byte arrays)
  • Hierarchical contextual allocators, where all allocations done by a particular thread/goroutine count toward a particular memory bucket

A lazy, cooperative approach is fine. Memory stats may be somewhat delayed (e.g. as of the last GC sweep) and somewhat inaccurate. If I needed hard limits I would probably split the process into one-shot workers for each query, running each in its own short-lived control group. But that is excessively expensive and inefficient, especially with a runtime like Go.
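To make the cooperative model concrete, here is a minimal sketch of the kind of manually-charged per-request accounting I mean. The `MemBudget` type and its methods are hypothetical, and every allocation site has to remember to charge the budget itself, which is exactly the limitation discussed below:

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// ErrOverBudget is returned when a task would exceed its memory quota.
var ErrOverBudget = errors.New("request over memory budget")

// MemBudget tracks bytes charged by a single request's subtasks. It only
// knows about bytes the code explicitly charges; the runtime offers no
// hook to account allocations automatically.
type MemBudget struct {
	limit int64
	used  atomic.Int64
}

func NewMemBudget(limit int64) *MemBudget { return &MemBudget{limit: limit} }

// Charge records n bytes against the budget, failing if the quota would
// be exceeded so the task can cooperatively abort.
func (b *MemBudget) Charge(n int64) error {
	if b.used.Add(n) > b.limit {
		b.used.Add(-n) // roll back the failed charge
		return ErrOverBudget
	}
	return nil
}

// Release returns n bytes to the budget when a buffer is discarded.
func (b *MemBudget) Release(n int64) { b.used.Add(-n) }

func main() {
	b := NewMemBudget(64 << 20) // 64 MiB per-query quota
	if err := b.Charge(50 << 20); err != nil {
		fmt.Println("abort:", err)
		return
	}
	// A second large charge pushes the task over quota.
	if err := b.Charge(50 << 20); err != nil {
		fmt.Println("abort:", err) // prints "abort: request over memory budget"
	}
}
```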

As far as I can tell nothing like this exists for Go. A program has no efficient way to determine an allocation's real size, nor any way to be notified when an object is GC'd.

As far as I can see, a Go program can't really track object memory manually without runtime assistance. In addition to the prohibitive overhead and concurrency impacts of attempting it, the program would need to impose the same allocation-management logic on every library it used. And it can't always see the real memory usage of its objects due to interning, the backing arrays of slices, etc. It'd be impossibly slow and clumsy.

(My immediate use case is helping Thanos and Prometheus run under lower memory caps by letting them soft-limit query memory use, allowing them to abort queries that use too much memory instead of OOMing. Coarse-grained limits like series and sample counts are ineffective and require significant over-allocation for worst-case input data.)

6 Replies

The answer is a simple "No". The language offers nothing here; you have to design your own way of signaling resource needs back to the point where you can reject the request.

In a project experiencing high load here at my $dayjob, we've implemented a special type of object (dubbed a "gradual throttle" here) which tracks abstract resource usage (though for the task at hand it estimates memory usage). Basically, a client's PUT request (for concreteness) is handled as follows: its body size is taken, and that many bytes are "requested" from the throttle for that particular client. Then, if the body is compressed, its decompressed size is estimated, and more memory is requested. And so on: for each processing stage, we estimate the amount of memory it will need and request it. As soon as the throttle fails a resource allocation request, we abort the client's request. The throttle even has a mechanism which allows it to see that one or more client requests are about to terminate, so it may tolerate an immediate shortage in its allotted pool of resources and not fail a request it otherwise would.
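The staged scheme above could be sketched roughly as follows, assuming a shared byte pool with non-blocking reservation (the `Throttle` type and method names are illustrative, not our actual implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// Throttle is a rough sketch of the "gradual throttle" idea: a shared
// pool of abstract resource units (here, bytes) from which each
// processing stage reserves its estimated cost as it becomes known.
type Throttle struct {
	mu    sync.Mutex
	avail int64
}

func NewThrottle(capacity int64) *Throttle { return &Throttle{avail: capacity} }

// Request tries to reserve n units; it fails immediately rather than
// blocking, so the caller can abort the client's request.
func (t *Throttle) Request(n int64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if n > t.avail {
		return false
	}
	t.avail -= n
	return true
}

// Release returns units when a stage (or the whole request) finishes.
func (t *Throttle) Release(n int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.avail += n
}

func main() {
	th := NewThrottle(1 << 30) // 1 GiB shared pool

	// Stage 1: reserve the raw request body size.
	body := int64(10 << 20)
	if !th.Request(body) {
		fmt.Println("reject: pool exhausted")
		return
	}
	// Stage 2: the body is compressed; reserve the estimated decompressed size.
	if est := int64(80 << 20); !th.Request(est) {
		th.Release(body)
		fmt.Println("abort: decompression would exceed pool")
		return
	}
	fmt.Println("request admitted")
}
```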

@kostix Interesting approach. Unfortunately for me, the underlying data means the request cost is very poorly correlated with the request size. A short query like `group by (__name__) ({__name__=~".+"})` may cause enormous memory use throughout the stack due to various unfortunate implementation details in the Prometheus/Thanos stack. Whereas a very long and complex query that touches only lower-cardinality data series may use a fraction of the memory.

Effective estimation would require the stack to have a much more sophisticated query planner and a means of quickly and efficiently estimating the cardinality of selectors in a request. This would add considerable load and latency into requests, and isn't really supported by the storage engines and protocols in use anyway.

That's why a reactive model is probably necessary here. If Go supported tagging objects into memory pools, placement-allocation into dedicated regions, etc., there would be some options (albeit at some cost). But the runtime seems designed around the idea that memory is ambient and effectively infinite. (This is also frustrating when analysing memory use, because there's no way to tag allocations to associate them with a particular program subroutine in pprof heap profiles etc.; the profiler only tracks memory by allocating call path.)

OK, I see, makes sense.

Maybe you can try to explore some of the more exotic solutions. Note that while Go has no support for stuff like "tagging objects into memory pools, <…> placement-allocation into dedicated regions" you've mentioned, there's still some wiggle room: each time your own code is about to allocate an object (usually indirectly, but in Go you always know where this happens or might happen), you may use something which performs custom allocation of that object. For instance, you can start at the ill-fated #51317 and explore what people say in there and what it links to. Of particular interest is the PR 12788 to Grafana, which refers to an interesting piece of code using pools to unpack protobuf packets. Another interesting example off the top of my head is github.com/cockroachdb/swiss, which was the basis for the new Swiss Tables implementation of maps in Go 1.24; this package implements a Go map "in userspace", and it can be configured to use a custom allocator.

Of course, none of this affects any of the stdlib code or the code of any 3rd-party packages, but maybe you could use something like this for your own memory-usage-critical code, or maybe it will give your imagination a spark to explore this field further ;-)

Regarding „<…> frustrating when analysing memory use, because there's no way to tag allocations to associate them with a particular program subroutine in pprof heap profiles <…>” — can't you try to use profiler labels?

@kostix Profiler labels are, unfortunately (at time of writing), not supported for memory analysis. If they were it'd solve 90% of the problem, and be absolutely wonderful.

The docs aren't too clear on this either. A while ago I spent quite some time trying to work out what I was doing wrong with my test case for profiler labels, before finding out that they're simply ignored for memory profiling.
