
We are running a JDK 17 Spring Boot application on our production server with the following configuration:

  • JDK vendor: Amazon Corretto (17.0.6)
  • K8s version: 1.17
  • Max pod memory: 5GB
  • Min pod memory: 5GB
  • Xmx: 2GB
  • Xms: 2GB

The problem we are running into is that every 24 hours the application gets OOM-killed by K8s (exit code 137). A few observations so far:

  • There is no leak in heap memory; this is confirmed from multiple heap dumps as well as GC logs.
  • No leak is observed in native memory either, based on native memory dumps. The maximum reserved memory seen in the native memory dump is 3.5GB. Dumps were taken at periodic intervals, and no increase in any non-heap area was observed.
  • We have verified that the RSS of the application grows gradually to 5GB, at which point the OOM kill happens.

We have tried tweaking Xmx/Xms and a few other GC parameters (disabling adaptive IHOP, among others), but nothing has helped so far. Possibly there is some leak, and it is visible in the growing RSS, but it is not reflected in the native memory dump.
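One way to attribute slow RSS growth like this to a specific JVM-internal region is the JVM's Native Memory Tracking in diff mode (a sketch; `<pid>` is a placeholder for the JVM's process id). Note that NMT only covers memory the JVM itself manages — allocations made by native libraries through plain malloc will not appear there, which is one way RSS can grow without the native dump reflecting it:

```shell
# Start the JVM with NMT enabled (adds a small overhead):
#   java -XX:NativeMemoryTracking=summary -Xms2g -Xmx2g -jar app.jar

# Record a baseline, then diff after a few hours of RSS growth:
jcmd <pid> VM.native_memory baseline
jcmd <pid> VM.native_memory summary.diff
```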

2 Comments
  • For nodes running Linux kernel 5.0 or later (e.g. Ubuntu 18.04.3) there is a known issue where kubelet falsely reports a pod cgroup OOM as a system OOM. To verify whether it was a false positive, check the node's kernel log for 'oom-kill' messages. If you see 'memcg=/kubepods/' then it is a standard container OOM and can be disregarded as a system OOM. The fix is expected to land in K8s 1.21. Commented Aug 14, 2023 at 7:50
  • Apart from the above comment, please go through this article: Java Container crashes with “Error 137 (out of memory)”, which may help resolve your issue. Commented Aug 18, 2023 at 6:20
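The check suggested in the first comment can be run directly on the node; the exact record format varies by kernel version, so the patterns below are a sketch:

```shell
# On the node that hosted the killed pod, inspect the kernel log:
dmesg -T | grep -i 'oom-kill'

# A record containing memcg=/kubepods/ means the pod hit its own cgroup
# memory limit (a container OOM, not a system-wide OOM):
dmesg -T | grep -c 'memcg=/kubepods/'
```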

1 Answer


Xmx controls the maximum heap size, but the heap is not the only memory region a JVM manages (see e.g. here or here). If memory dumps do not reveal which memory region is growing that much, consider a bug in the JVM itself.

To verify, tweak more parameters or switch to another JVM implementation.
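To illustrate the point about non-heap regions: direct buffers, for example, live outside the Java heap, so they raise RSS without appearing in a heap dump. A minimal self-contained sketch (not taken from the application in question):

```java
import java.nio.ByteBuffer;

public class OffHeapDemo {
    public static void main(String[] args) {
        // allocateDirect reserves memory outside the Java heap; -Xmx does not
        // cap it. Its limit is -XX:MaxDirectMemorySize, which defaults to
        // roughly the maximum heap size when not set explicitly.
        ByteBuffer buf = ByteBuffer.allocateDirect(16 * 1024 * 1024); // 16 MB off-heap
        buf.putLong(0, 42L);
        System.out.println("capacity=" + buf.capacity() + ", first long=" + buf.getLong(0));
    }
}
```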


2 Comments

I am not able to figure out why RSS is increasing if nothing unusual shows up either in native/heap dump. How do I figure what else is hogging process memory?
If memory dumps do not reveal more information, consider the JVM to be buggy.
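One generic way to approach the question in the comment above, assuming a Linux node and shell access to the container (the pid below is a placeholder): list the process's largest resident mappings and compare them against the regions the JVM reports; growing anonymous mappings that match no JVM region are the prime suspects.

```shell
PID=12345                        # placeholder: the JVM's process id
grep VmRSS /proc/$PID/status     # total resident set size

# Largest mappings by RSS (3rd column of pmap -x output):
pmap -x $PID | sort -k3 -n | tail -20
```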
