High CPU usage on UDP DatagramSocket threads in Java

Question

I'm running a multi-threaded Java server application that, among other things, receives UDP packets from 3 different multicast sources (ports) on 3 different threads.

It's running on a recent dual-socket Red Hat box (8 cores in total: 2 CPUs × 4 cores, no hyperthreading).

The top command shows CPU usage at 250-300%. Pressing Shift-H in top (per-thread view) shows 2 threads at around 99% usage and 1 at 70%. A quick jstack thread analysis shows that those threads are my UDP handling threads.

I am a bit surprised by this level of CPU usage, given the CPU speed versus the UDP message rate (about 300 msg/s, with a payload of about 250 bytes), and I'm investigating it. Interestingly, the third thread (the one with lower CPU usage) also has a lower data rate (50-100 msg/s).

I've added some debug code to measure where most of the time is spent, and it appears to be in the receive() method of DatagramSocket:

_running    = true;
_buf        = new byte[300];                          // payloads are about 250 bytes
_packet     = new DatagramPacket(_buf, _buf.length);  // reused for every datagram

while(_running) {
    try {
        long t0 = System.nanoTime();
        _inSocket.receive(_packet);                   // blocks until a datagram arrives
        long t1 = System.nanoTime();
        this.handle(_packet);                         // parsing plus a few computations
        long t2 = System.nanoTime();
        long waitingAndReceiveTime = t1-t0;           // time blocked + time receiving
        long handleTime = t2-t1;                      // time spent in handle() only
        _logger.info("{} : {} : update : {} : {}", t1, _port, waitingAndReceiveTime, handleTime);
    }
    catch(Exception e) {
        _logger.error("Exception while receiving multicast packet", e);
    }
}

handleTime averages about 4,000 ns, which is extremely fast and cannot be responsible for the CPU usage. waitingAndReceiveTime is much higher, ranging from around 30,000 ns to several ms. I understand the method is blocking, so this time includes both the time spent blocking and the time spent receiving.
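The blocked part of receive() should not burn CPU. One way to test that directly is to record the thread's CPU time alongside System.nanoTime().

Below is a minimal sketch. It assumes a HotSpot-style JVM where ThreadMXBean.getCurrentThreadCpuTime() is supported and enabled. The class and method names (ReceiveCpuTime, timedReceive, measureOneReceive) are hypothetical. The sketch uses a loopback unicast socket rather than the multicast sockets from the question. If a thread really sits blocked in the kernel, its CPU time should stay far below its wall-clock time.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class ReceiveCpuTime {

    // Obtained once at class initialization, so its setup cost is not measured below.
    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    /** Wall-clock vs. CPU time (ns) spent by the calling thread in one receive() call. */
    public static final class Timing {
        public final long wallNanos;
        public final long cpuNanos;

        Timing(long wallNanos, long cpuNanos) {
            this.wallNanos = wallNanos;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Times one blocking receive(); cpuNanos is meaningless if thread CPU timing is disabled. */
    public static Timing timedReceive(DatagramSocket socket, DatagramPacket packet) throws IOException {
        long wall0 = System.nanoTime();
        long cpu0 = MX.getCurrentThreadCpuTime();
        socket.receive(packet); // blocks until a datagram arrives
        long cpu1 = MX.getCurrentThreadCpuTime();
        long wall1 = System.nanoTime();
        return new Timing(wall1 - wall0, cpu1 - cpu0);
    }

    /** Loopback demo: a helper thread sends one datagram after delayMillis, so receive() must block. */
    public static Timing measureOneReceive(long delayMillis) throws Exception {
        try (DatagramSocket receiver = new DatagramSocket(0, InetAddress.getLoopbackAddress());
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(5_000); // safety net so the demo cannot hang forever
            Thread delayedSender = new Thread(() -> {
                try {
                    Thread.sleep(delayMillis);
                    byte[] payload = "hello".getBytes(StandardCharsets.US_ASCII);
                    sender.send(new DatagramPacket(payload, payload.length,
                            InetAddress.getLoopbackAddress(), receiver.getLocalPort()));
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            delayedSender.start();
            byte[] buf = new byte[300];
            Timing timing = timedReceive(receiver, new DatagramPacket(buf, buf.length));
            delayedSender.join();
            return timing;
        }
    }

    public static void main(String[] args) throws Exception {
        Timing t = measureOneReceive(200);
        System.out.printf("wall = %d us, cpu = %d us%n", t.wallNanos / 1_000, t.cpuNanos / 1_000);
    }
}
```

Dropping timedReceive() into the real receive loop would show whether the ~99% per-thread figure reported by top is actually accumulated inside receive() or somewhere else.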

I have several questions:

1. Am I right to suspect that something is strange?
2. Since receive() is blocking, it should not "waste" CPU cycles, so the waiting part should not be responsible for the high CPU usage, right?
3. Is there a way to measure the time spent blocking separately from the time spent actually receiving the datagram inside receive()?
4. What could be responsible for this high CPU usage?
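On question 3: DatagramSocket.receive() does not expose the waiting and copying phases separately. One possible workaround, assuming switching to NIO is acceptable, is a non-blocking DatagramChannel registered with a Selector. The time spent in select() is the wait, and the following receive() only copies a datagram that is already queued.

The names SplitReceiveTiming, receiveOnce and demo are hypothetical. The demo uses loopback unicast. A multicast version would additionally need DatagramChannel.join(group, networkInterface).

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class SplitReceiveTiming {

    /** Nanoseconds spent waiting for readiness vs. receiving the queued datagram. */
    public static final class Split {
        public final long waitNanos;
        public final long receiveNanos;
        public final int bytes;

        Split(long waitNanos, long receiveNanos, int bytes) {
            this.waitNanos = waitNanos;
            this.receiveNanos = receiveNanos;
            this.bytes = bytes;
        }
    }

    /** Waits in select() (the blocking phase), then does a non-blocking receive() (the copy phase). */
    public static Split receiveOnce(DatagramChannel channel, Selector selector, ByteBuffer buf) throws IOException {
        buf.clear();
        long t0 = System.nanoTime();
        int ready = selector.select(5_000); // block until readable, or give up after 5 s
        long t1 = System.nanoTime();
        selector.selectedKeys().clear();
        if (ready == 0) throw new IOException("timed out waiting for a datagram");
        if (channel.receive(buf) == null) throw new IOException("channel was not actually readable");
        long t2 = System.nanoTime();
        return new Split(t1 - t0, t2 - t1, buf.position());
    }

    /** Loopback demo: one datagram is sent after delayMillis, so select() has to wait. */
    public static Split demo(long delayMillis) throws Exception {
        try (DatagramChannel channel = DatagramChannel.open();
             Selector selector = Selector.open();
             DatagramChannel sender = DatagramChannel.open()) {
            channel.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            channel.configureBlocking(false); // required before registering with a Selector
            channel.register(selector, SelectionKey.OP_READ);
            InetSocketAddress target = (InetSocketAddress) channel.getLocalAddress();
            Thread delayedSender = new Thread(() -> {
                try {
                    Thread.sleep(delayMillis);
                    sender.send(ByteBuffer.wrap("hello".getBytes(StandardCharsets.US_ASCII)), target);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            delayedSender.start();
            Split split = receiveOnce(channel, selector, ByteBuffer.allocate(300));
            delayedSender.join();
            return split;
        }
    }

    public static void main(String[] args) throws Exception {
        Split s = demo(200);
        System.out.printf("waited %d us, received %d bytes in %d us%n",
                s.waitNanos / 1_000, s.bytes, s.receiveNanos / 1_000);
    }
}
```

Because the channel is non-blocking, a spurious wakeup shows up as a null from receive() (treated as an error in this sketch) rather than as a hidden block.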

EDIT: I experimented with the interrupt coalescing parameters, setting rx-usecs to 0 and rx-frames to 10. I can now see the following:

UDP messages do arrive in groups of 10. In each group, the first message has a LONG waitingAndReceiveTime (>= 1 ms), and the following 9 have a much shorter one (~2,000 ns). handleTime is unchanged.
CPU usage is reduced! It drops to about 55% for the first 2 threads.

I still have no idea how to solve this.

It does simple handling of the packet: basically parsing the string and a few computations. Nothing too taxing, as the low average time spent in it (4,000 ns) shows.
– Bastien, Commented Apr 4, 2018 at 5:40
Try profiling with VisualVM: docs.oracle.com/javase/8/docs/technotes/guides/visualvm/…
– Scary Wombat, Commented Apr 4, 2018 at 5:45
1. You're right to suspect something is strange. 2. Yes, receive() is blocking, so it shouldn't be able to drive the CPU to 100%. 4. I suspect it's not Java code that's responsible for (all of) the usage, but native code (or some significant Java code that was left out of the question).
– Kayaman, Commented Apr 4, 2018 at 5:48
As per the Javadoc, "This method blocks until a datagram is received". I can investigate further, but that would be quite misleading...
– Bastien, Commented Apr 5, 2018 at 1:44

Albos Hajdari · Accepted Answer · 2018-04-19 11:28:36Z

NOT REALLY AN ANSWER BUT:

One thing I can assure you: it's not the Java code. I wrote a multithreaded UDP server in Python and it does the same thing; the CPU usage jumps to 100% within 3 to 4 seconds. I'm guessing it really has something to do with UDP itself, since I also wrote a multithreaded TCP server and it barely reaches 10% CPU usage.

Here's the code:

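import socket
import threading
import time

NUM_THREADS = 3  # illustrative: number of worker threads sharing one socket


def threaded(s):
    """Echo every datagram received on s back to its sender."""
    while True:
        try:
            data, addr = s.recvfrom(128)  # blocks until a datagram arrives
            if data == b"1":
                time.sleep(5)  # deliberately slow handling for the message "1"
            s.sendto(data, addr)
            print(data)
        except OSError:
            break  # the socket was closed or failed


def main():
    server_ip = "127.0.0.1"
    server_port = 11000
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((server_ip, server_port))

    # Start a fixed number of worker threads that all block in recvfrom() on the same socket.
    workers = [threading.Thread(target=threaded, args=(s,), daemon=True) for _ in range(NUM_THREADS)]
    try:
        for w in workers:
            w.start()
        for w in workers:
            w.join()
    finally:
        s.close()


if __name__ == "__main__":
    main()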

Note: if you find the answer, please tell me. Good luck.

Franci Kopač · Accepted Answer · 2023-02-16 19:42:20Z


I have been banging my head against this for 11 years now. I wrote a home automation app that uses UDP to talk to the individual room controllers. It wastes a full processor core (100% usage) receiving UDP datagrams, no matter what I do. I did a full profiling investigation, and it looks like this is simply how UDP receive is implemented in Java. I have updated Java countless times since (it's literally been 11 years), and recently I also migrated the app from a PC to a Raspberry Pi 4. Even on the Raspberry Pi, it still uses one whole core just to receive UDP packets. It might be a simple bug that has gone undetected all these years, or there may be a reason why UDP packets can't be received more efficiently. I'm not a professional developer, so I would not dare to file a proper bug report with Oracle.


4 Comments

Andreas Violaris Over a year ago

This does not really answer the question. If you have a different question, you can ask it by clicking Ask Question. To get notified when this question gets new answers, you can follow this question. Once you have enough reputation, you can also add a bounty to draw more attention to this question. - From Review

Franci Kopač Over a year ago

@AndreasViolaris: It is an answer, but not a solution. I'll try to recap it, if that helps: the OP's problem looks like the result of a bug or suboptimal implementation in Java, which has been there a LONG time and seems to be platform-independent.

Andreas Violaris Over a year ago

In the Late Answers review queue, reviewers are advised to recommend deletion of answers that fail to address the issue at hand. During my review, I noticed that your answer contained speculation without evidence to support it. Based on this, I feel that your answer falls into this category and was rightly flagged for deletion. While your willingness to help is highly appreciated, Stack Overflow relies heavily on evidence-based solutions and is not intended as a discussion forum. It is therefore important to provide well-reasoned answers based on facts rather than mere speculation.

Andreas Violaris Over a year ago

In any case, my suggestion was just that, a suggestion, and it's completely acceptable to have a different viewpoint. Ultimately, it's up to the moderators to assess whether your answer complies with the platform's guidelines and decide whether it should be deleted or not.
