
I have a Java application configured with -Xmx4096m. The application is deployed in a Kubernetes pod with a memory limit of 8192Mi. After doing some analysis with the command jcmd 8 VM.native_memory summary, the output (converted to MB) is as follows:

Native Memory Tracking:

Total: reserved=6009MB, committed=4797MB
-                 Java Heap (reserved=4096MB, committed=4094MB)
                            (mmap: reserved=4096MB, committed=4094MB)

-                     Class (reserved=1097MB, committed=82MB)
                            (classes #14238)
                            (malloc=3MB #35561)
                            (mmap: reserved=1094MB, committed=79MB)

-                    Thread (reserved=254MB, committed=254MB)
                            (thread #253)
                            (stack: reserved=253MB, committed=253MB)
                            (malloc=0.8MB #1506)
                            (arena=0.3MB #501)

-                      Code (reserved=254MB, committed=65MB)
                            (malloc=11MB #17281)
                            (mmap: reserved=244MB, committed=54MB)

-                        GC (reserved=226MB, committed=226MB)
                            (malloc=42MB #192857)
                            (mmap: reserved=184MB, committed=184MB)

-                  Compiler (reserved=1.1MB, committed=1.1MB)
                            (malloc=1MB #2175)
                            (arena=0.1MB #6)

-                  Internal (reserved=50MB, committed=50MB)
                            (malloc=50MB #178156)
                            (mmap: reserved=0.03MB, committed=0.03MB)

-                    Symbol (reserved=16MB, committed=16MB)
                            (malloc=13MB #129700)
                            (arena=3MB #1)

-    Native Memory Tracking (reserved=8.5MB, committed=8.5MB)
                            (malloc=0.03MB #339)
                            (tracking overhead=8.5MB)

-               Arena Chunk (reserved=0.2MB, committed=0.2MB)
                            (malloc=0.2MB)

-                   Unknown (reserved=8MB, committed=0MB)
                            (mmap: reserved=8MB, committed=0MB)

This never increases, but the top command shows about 7.9 GB of memory usage for the java process, roughly 3 GB more than the ~4.8 GB that NMT reports as committed. The application then gets killed when it reaches the 8 GB limit with:

Last State:     Terminated
Reason:       OOMKilled

So my question is: where is this memory going? In other words, what should the next debugging steps be to find the root cause of the issue?
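(For reference, NMT output like the above is only available when the JVM was started with native memory tracking enabled, along these lines; app.jar is just a placeholder for the real application:)

# Enables the summary-level native memory tracking that jcmd ... VM.native_memory queries
java -XX:NativeMemoryTracking=summary -Xmx4096m -jar app.jar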

  • Do you use any native libraries through JNI? Commented Jan 28 at 7:50
  • Which JVM and version are you using? Commented Jan 28 at 7:52
  • @Joan Again, which version are you using? "openjdk 8" is too unspecific; much changed regarding container support during the lifetime of Java 8. Also, which JVM options are you using in total? Commented Jan 28 at 15:10
  • @davidalayachew already checked it, nothing new there :( Commented Feb 13 at 8:51
  • Wild guess: check at the OS level, not at the level of the JVM process. This quite old question stackoverflow.com/q/131303/2846138 lists several ways to check whether the 6009MB reported in your question matches the amount of memory the OS records for your JVM process(es). Commented Feb 13 at 13:26

2 Answers


This looks terribly similar to issues I encountered with containerized JVMs. The container would OOM regularly after some time under heavy load. This exclusively happened in containers.

I upgraded the JVM to the then latest LTS version and used the -XX:+UseContainerSupport JVM parameter (introduced in Java 10) to alleviate the problem.
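For reference, a minimal sketch of a container-aware startup, assuming a JVM that has container support (JDK 10+, or a late JDK 8 update where it was backported); app.jar is only a placeholder for the real application:

# Size the heap from the container's cgroup limit instead of the host's physical RAM;
# 50.0 is an example value, not a tuned recommendation.
java -XX:+UseContainerSupport -XX:MaxRAMPercentage=50.0 -jar app.jar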

NB: This of course assumes that the application runs without issues on a physical box with 8G of RAM, and that the host on which the container runs does not run out of RAM for some other reason. Troubleshooting OOMKilled events is complex.

Update:

You can troubleshoot this with pmap: run it over time and see which mappings increase in size. Another thing you could monitor is the output of ls /proc/$pid/fd | wc -l (where $pid is the java process PID) and check whether it increases over time. This counts all open files and sockets, as well as a number of other resources used by the java process.

You could create a loop in the shell that issues the commands and then sleeps for 10 minutes:

while true
do
    # Timestamp each snapshot, both on screen and in the output files
    date | tee -a /tmp/pmaps /tmp/number_fd
    # Find the PID of the java process inside the pod
    _pid=$(kubectl exec <pod> -t -- ps -ef | grep -m 1 java | awk '{print $2}')
    # Memory mappings of the java process
    kubectl exec <pod> -t -- pmap $_pid >> /tmp/pmaps
    # Number of open file descriptors (files, sockets, ...)
    kubectl exec <pod> -t -- ls /proc/$_pid/fd | wc -l >> /tmp/number_fd
    sleep 600 # every 10 minutes
done

Come back a few hours later and you should have an idea of what increases. Ctrl+c to break the loop.
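After a few iterations, a quick way to read the trend from the collected files (assuming the loop above and the /tmp/pmaps and /tmp/number_fd files it writes) is something like:

# Each pmap snapshot ends with a "total ...K" line; listing those shows whether
# the overall mapped size grows from snapshot to snapshot.
grep total /tmp/pmaps

# Timestamps and file descriptor counts, one pair per snapshot; a steadily
# rising count suggests leaking files or sockets.
cat /tmp/number_fd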


2 Comments

This functionality (tuning the JVM heap size based on cgroups v1 and, later, cgroups v2 memory limits) was backported to Java 8 several times (e.g. baeldung.com/java-docker-jvm-heap-size and bugs.openjdk.org/browse/JDK-8307634). It only helps set sensible defaults based on the cgroup/container limits instead of the amount of physical memory. However, the user already manages this with -Xmx4096m, so why would it matter?
@JohannesB Indeed, also nice to know it has been backported, thanks. I read your comment and will update my answer.

If you get OOMKilled even though NMT shows usage well below the container limit, you could (the flags are combined in an example after this list):

  • Limit Direct Memory: -XX:MaxDirectMemorySize=512m
  • Reduce Stack Size: -Xss512k
  • Enable GC Logging: -Xlog:gc*:file=/tmp/gc.log:time,level,tags
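For instance, the flags can be handed to the JVM through the standard JAVA_TOOL_OPTIONS environment variable in the container, so the startup command itself does not need to change. A minimal sketch, reusing the values from the list above (they are examples, not tuned recommendations; the -Xlog syntax assumes a JDK 9+ JVM):

# Picked up automatically by the JVM at startup
export JAVA_TOOL_OPTIONS="-Xmx4096m -XX:MaxDirectMemorySize=512m -Xss512k -Xlog:gc*:file=/tmp/gc.log:time,level,tags"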

The following script can assist in issue analysis:

#!/usr/bin/env bash

# Set Java process ID (auto-detect if only one Java process is running)
PID=$(pgrep -f "java" | head -n 1)
if [[ -z "$PID" ]]; then
    echo "No Java process found. Exiting."
    exit 1
fi

echo "Analyzing memory usage for Java process ID: $PID"

echo_section() { echo -e "\n\e[1;34m[$1]\e[0m"; }

echo_section "1. Off-Heap Memory (Direct Buffers, JNI, Metaspace)"
jcmd $PID GC.class_stats | grep -i direct
jcmd $PID VM.system_properties | grep java.nio

echo_section "2. Thread Count and Stack Size"
jcmd $PID Thread.print | grep "java.lang.Thread.State" | wc -l

echo_section "3. JVM Pointer & Metadata Overhead"
jcmd $PID VM.info | grep -i UseCompressedOops

echo_section "4. Kubernetes cgroup Memory Usage"
if [[ -f /sys/fs/cgroup/memory/memory.usage_in_bytes ]]; then
    cat /sys/fs/cgroup/memory/memory.usage_in_bytes
else
    echo "cgroup memory file not found. Are you inside a container?"
fi

echo_section "5 Check Native Memory Leaks (Libraries, Malloc)"
lsof -p $PID | grep deleted
pmap $PID | sort -k2 -nr | head -20
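If native memory tracking is enabled on the JVM, as in the question, its diff mode can additionally show which NMT category grows between two points in time. A small sketch using the same $PID as above:

# Record a baseline now ...
jcmd $PID VM.native_memory baseline
# ... and compare against it later to see which categories grew
jcmd $PID VM.native_memory summary.diff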
