
What are the sources of latency in the process of sending/receiving TCP/UDP packets in Linux 2.6?

I want to know the latency sources in "ping-pong" latency tests.

There are some rather good papers on Ethernet latency, but they only cover latency sources in the wire and the switch (and rather cursorily, only for a specific switch).

What processing steps does a packet go through?

Papers with a deep latency analysis of ordinary ping (ICMP) would be useful too.

I'm relying on the community :)

2 Answers


Short answer: for exact latency measurements inside the kernel you should use perf probe and perf script.

More details: let's look at the following example. First we want to see which functions are used in a TCP ping-pong test (I'm using netperf).

On the receive flow: [flame graph of the receive-side call stack]

On the transmit flow: [flame graph of the transmit-side call stack]

So let's track some functions of the transmit flow (it is long, so I'll show only the major functions in the TCP flow). We can use perf probe to place a probe on the entry and return point of each function:

perf probe --add sock_sendmsg='sock_sendmsg'
perf probe --add sock_sendmsg_exit='sock_sendmsg%return'
perf probe --add inet_sendmsg='inet_sendmsg'
perf probe --add inet_sendmsg_exit='inet_sendmsg%return'
perf probe --add tcp_sendmsg_exit='tcp_sendmsg%return'
perf probe --add tcp_sendmsg='tcp_sendmsg'
perf probe --add tcp_sendmsg_locked='tcp_sendmsg_locked'
perf probe --add tcp_sendmsg_locked_exit='tcp_sendmsg_locked%return'
perf probe --add sk_stream_alloc_skb='sk_stream_alloc_skb'
perf probe --add sk_stream_alloc_skb_exit='sk_stream_alloc_skb%return'
perf probe --add tcp_push_exit='tcp_push%return'
perf probe --add tcp_push='tcp_push'
perf probe --add tcp_send_mss='tcp_send_mss'
perf probe --add tcp_send_mss_exit='tcp_send_mss%return'
perf probe --add __tcp_push_pending_frames='__tcp_push_pending_frames'
perf probe --add __tcp_push_pending_frames_exit='__tcp_push_pending_frames%return'
perf probe --add tcp_write_xmit_exit='tcp_write_xmit%return'
perf probe --add tcp_transmit_skb_exit='tcp_transmit_skb%return'
perf probe --add tcp_transmit_skb='tcp_transmit_skb'
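Before recording, it is worth checking that the probes were actually created; perf probe --list prints the dynamic probe events that are currently defined:

# Should list every probe added above (entry and %return variants)
perf probe --list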

Now we can record these:

perf record -e probe:* -aR taskset -c 7 netperf -t TCP_RR -l 5 -T 7,7

And run perf script for the latency report:

perf script -F time,event --ns

Output (1 iteration):

525987.403094082:                   probe:sock_sendmsg:
525987.403095586:                   probe:inet_sendmsg:
525987.403096192:                    probe:tcp_sendmsg:
525987.403098203:             probe:tcp_sendmsg_locked:
525987.403099296:                   probe:tcp_send_mss:
525987.403100002:              probe:tcp_send_mss_exit:
525987.403100740:            probe:sk_stream_alloc_skb:
525987.403101697:       probe:sk_stream_alloc_skb_exit:
525987.403103079:                       probe:tcp_push:
525987.403104284:      probe:__tcp_push_pending_frames:
525987.403105575:               probe:tcp_transmit_skb:
525987.403110178:               probe:tcp_transmit_skb:
525987.403111640:          probe:tcp_transmit_skb_exit:
525987.403112876:          probe:tcp_transmit_skb_exit:
525987.403114351:            probe:tcp_write_xmit_exit:
525987.403114768: probe:__tcp_push_pending_frames_exit:
525987.403115191:                  probe:tcp_push_exit:
525987.403115718:        probe:tcp_sendmsg_locked_exit:
525987.403117576:               probe:tcp_sendmsg_exit:
525987.403118082:              probe:inet_sendmsg_exit:
525987.403118568:              probe:sock_sendmsg_exit:

Now it is pretty easy to see where the latency is spent. For instance, between the sock_sendmsg() call and the inet_sendmsg() call there are 1504 nanoseconds (ns), or 1.504 microseconds (us), of latency. Also, we can see that sk_stream_alloc_skb() takes 957 ns. In total, the whole process (sock_sendmsg() entry to exit) takes ~24.5 us. Keep in mind that this is not the latency you would see in netperf, since the packet is physically transmitted somewhere in the middle of the flow.
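If you prefer not to compute the deltas by hand, a small pipeline over the same perf script output will print them (a rough sketch; the awk program is only an illustration and assumes the time/event field layout shown above):

# Print the time spent between consecutive probe events, in nanoseconds
perf script -F time,event --ns | awk '
{
    sub(/:$/, "", $1); split($1, ts, ".")   # "525987.403094082:" -> seconds, nanoseconds
    t = ts[1] * 1000000000 + ts[2]          # timestamp in ns
    if (prev) printf "%8d ns until %s\n", t - prev, $2
    prev = t
}'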

You can use the same method to track any piece of code in the kernel.
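For example, the same entry/%return pattern works on the receive path. The function names below are only an illustration (they vary between kernel versions), so first check with perf probe --funcs which symbols are probeable on your kernel:

# Verify the symbol exists and is probeable on this kernel
perf probe --funcs | grep -w tcp_v4_rcv

# Probe entry and return of the IPv4 TCP receive handler
perf probe --add tcp_v4_rcv='tcp_v4_rcv'
perf probe --add tcp_v4_rcv_exit='tcp_v4_rcv%return'

# Remove all dynamic probes when you are done
perf probe --del '*'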

Hope this helps.

P.S. This was done on kernel 4.14, not 2.6. perf itself only appeared in 2.6.31 (and perf probe a bit later), so on an older 2.6 kernel this may not work as well, or at all.


2 Comments

Thanks! It is OK to use modern kernels now. How did you get the flame graphs? The classic flame graph, as far as I know, uses a "sum": it is not a drawing of functions over an absolute timeline, but a drawing of time percentages over several executions, and it is used with perf in profiling mode. So, did you make a flame-graph-like drawing of the perf script output with an absolute timeline? And how will I see the latency of accessing the hardware in the graphs (the driver has to create a descriptor in the hw/dma queue and signal the new descriptor to the network controller over ports/PIO/MMIO)?
[1] These flame graphs were created using this (excellent) manual: brendangregg.com/flamegraphs.html. It is perf-based and requires recorded data in order to work. I always start a performance debug with them because they perfectly illustrate what is happening at runtime. The rest of the debug (perf probe, record and script) is done in the next iteration, not on the same data. [2] Regarding HW access, each driver has its own path. For the mlx5 driver, the TX path is here: elixir.free-electrons.com/linux/latest/source/drivers/net/…
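A minimal sketch of that perf-based flame graph workflow (assuming Brendan Gregg's FlameGraph scripts are cloned into ./FlameGraph; the duration and paths are just placeholders):

# Sample stack traces system-wide for 10 seconds, then fold them into a flame graph SVG
perf record -g -a -- sleep 10
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > flame.svg

Note that, as discussed above, this aggregates samples into percentages; it does not show functions on an absolute timeline.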

Even though it's quite old, this guide details the Linux networking stack.

http://www.cs.unh.edu/cnrg/people/gherrin/linux-net.html

If you read through it, you should be able to spot the places where latency can be added by the kernel,

e.g. when a frame is buffered temporarily in the kernel.
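One such buffering point is the qdisc (traffic control) queue in front of the NIC. As a quick illustration (eth0 is a placeholder for your interface), the qdisc statistics show how much data is sitting in that kernel queue:

# "backlog" is the amount of data currently queued in the kernel for this device
tc -s qdisc show dev eth0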

