TCP Series — TCP Receive Buffer and Receive Window

Brunda K
Brunda’s Tech Notes
4 min read · Oct 16, 2023


In this blog post, I’d like to explain two terms and how they differ: the TCP receive buffer and the TCP receive window.

Is there a difference between the two? How do we obtain these values? Read on to find out.

The TCP receive buffer holds TCP data that has arrived but has not yet been consumed by the application via read/recv system calls.

If we split the receive buffer into the part that is already filled with unread data (used) and the part that is still free (available), then the TCP receive window is a fraction of the available part. This is the memory set aside for incoming TCP segments. The receive window is carried in a field of the TCP header and is communicated to the peer. It is dynamic: it shrinks when the receiving application does not process data fast enough, and it can grow as needed, with the growth determined by the OS.

Communicating the size of the TCP receive window to the peer lets the peer know how much data it can send before the TCP receive buffer becomes full. The peer can then adjust how much data it sends.

The receive buffer is allocated per socket. Utilities like netstat and ss report socket queue and memory usage (example: ss -m). ss uses the term recv-q for TCP data that has not yet been read by the L7 application. In addition to the TCP data, the Linux kernel keeps some metadata in the receive buffer. ss shows the unread TCP data (recv-q) plus this metadata as skmem_r, and the total receive buffer length as skmem_rb.

The current receive window is only a fraction of the free bytes remaining in the receive buffer; the entire free space is never advertised. The skmem output of ss does not show the current receive window. It has to be obtained with other utilities such as tcpdump.

recv-q = TCP data yet to be read by the L7 application.

skmem_r = recv-q + metadata

skmem_rb = skmem_r + free_space

free_space = skmem_rb - skmem_r

current receive window = some fraction of the free space

The receive buffer length (shown as skmem_rb by ss) is determined by Linux’s auto-tuning, which takes various parameters into account (one of which is the rate at which the application reads data). The upper limit that skmem_rb can reach is the third (max) value of net.ipv4.tcp_rmem; the first two values are the minimum and the default size.
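To make the three tcp_rmem values concrete, here is a minimal sketch that parses them. The helper names (parse_tcp_rmem, read_tcp_rmem) are made up for this example, and the sample string is only an illustrative value, not output captured for this post. The /proc path exists on Linux only.

```python
from pathlib import Path
from typing import NamedTuple


class TcpRmem(NamedTuple):
    minimum: int   # smallest receive buffer the kernel will use
    default: int   # initial receive buffer size for a new TCP socket
    maximum: int   # ceiling for auto-tuning (upper bound for skmem_rb)


def parse_tcp_rmem(text: str) -> TcpRmem:
    """Parse the three whitespace-separated integers of net.ipv4.tcp_rmem."""
    low, default, high = (int(value) for value in text.split())
    return TcpRmem(low, default, high)


def read_tcp_rmem(path: str = "/proc/sys/net/ipv4/tcp_rmem") -> TcpRmem:
    """Read the live setting (Linux only)."""
    return parse_tcp_rmem(Path(path).read_text())


# Illustrative value only; real systems may be configured differently.
sample = parse_tcp_rmem("4096\t131072\t6291456\n")
print(f"auto-tuning may grow the buffer up to {sample.maximum} bytes")
```

On a Linux machine, read_tcp_rmem() returns the values currently in effect, which is the same information that sysctl net.ipv4.tcp_rmem prints.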

Example output from ss -mat command:
ESTAB      89701   0               127.0.0.1:9191             127.0.0.1:47176  
skmem:(r90981,rb131072,t0,tb2626560,f3227,w0,o0,bl0,d8)
This shows that skmem_r is 90981 bytes and skmem_rb is 131072 bytes. The first number after ESTAB (89701) is recv-q.
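The relationships listed earlier can be applied to this output directly. The sketch below parses the skmem group and derives the metadata and free space, following the model used in this post (skmem_r = recv-q + metadata). The function names are made up for this example, and the parser assumes the skmem:(...) layout shown above.

```python
import re


def parse_skmem(text: str) -> dict:
    """Turn 'skmem:(r90981,rb131072,...)' into {'r': 90981, 'rb': 131072, ...}."""
    match = re.search(r"skmem:\(([^)]*)\)", text)
    if match is None:
        raise ValueError("no skmem:(...) group found")
    fields = {}
    for item in match.group(1).split(","):
        key, value = re.fullmatch(r"([a-z]+)(\d+)", item.strip()).groups()
        fields[key] = int(value)
    return fields


def receive_buffer_summary(recv_q: int, skmem_line: str) -> dict:
    """Apply the post's model: skmem_r = recv-q + metadata, free = rb - r."""
    mem = parse_skmem(skmem_line)
    return {
        "skmem_r": mem["r"],
        "skmem_rb": mem["rb"],
        "metadata": mem["r"] - recv_q,
        "free_space": mem["rb"] - mem["r"],
    }


# Values taken from the example ss output above.
summary = receive_buffer_summary(
    89701, "skmem:(r90981,rb131072,t0,tb2626560,f3227,w0,o0,bl0,d8)"
)
print(summary)
```

For this socket the metadata comes to 90981 - 89701 = 1280 bytes and the free space to 131072 - 90981 = 40091 bytes. The advertised receive window would be some fraction of those 40091 bytes.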

When Linux auto-tuning is in place, the kernel determines the length of the receive buffer. It is possible to override this by setting SO_RCVBUF on a socket. If SO_RCVBUF is set to a value ‘x’, the kernel allocates 2x as the receive buffer, with the extra space covering its own bookkeeping overhead. Setting SO_RCVBUF disables auto-tuning for that socket and is therefore not recommended.
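The doubling is easy to observe by setting SO_RCVBUF and reading it back. This is a small sketch, with a helper name made up for this example; the value read back depends on the OS, and on Linux the request is also capped by net.core.rmem_max, so the result is not guaranteed to be exactly twice the request.

```python
import socket


def set_and_read_rcvbuf(requested: int) -> int:
    """Set SO_RCVBUF on a fresh TCP socket and return what the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


requested = 16 * 1024
effective = set_and_read_rcvbuf(requested)
print(f"requested={requested} effective={effective}")
```

On Linux the reported value is typically double the requested one. Keep in mind that a socket configured this way no longer benefits from receive buffer auto-tuning.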

In order to understand how the TCP window size changes during a connection, I wrote a sample TCP server that does the following:

  • reads 16K bytes of data in a loop.
  • sends the data back to the client.

The server code is available at https://github.com/brkarana/examples/blob/master/tcpserver.c
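For readers who prefer not to build the C program, the same read-and-echo loop can be sketched in Python. This is not the code from the repository above, only an approximation of the behaviour described in the two bullets; the function names are made up, and the sketch handles a single connection.

```python
import socket
import threading

CHUNK = 16 * 1024  # read size described in the post


def echo_once(listener: socket.socket) -> None:
    """Accept one connection and echo data back until the peer closes."""
    conn, _addr = listener.accept()
    with conn:
        while True:
            data = conn.recv(CHUNK)   # read up to 16K bytes
            if not data:              # empty read: peer closed its side
                break
            conn.sendall(data)        # send the same bytes back


def start_echo_server(host: str = "127.0.0.1", port: int = 0):
    """Start the echo loop in a background thread; port 0 picks a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    worker = threading.Thread(target=echo_once, args=(listener,), daemon=True)
    worker.start()
    return listener, worker
```

Calling start_echo_server(port=7070) gives a server that a client such as nc can connect to. Slowing down the loop, for example with a sleep before recv, is one way to make the receive window shrink in a capture.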

The client is the nc utility, which sends an 8 MB file. I used tcpdump to capture this communication.

Let’s look at the tcpdump snapshot to see how the TCP window size changes during the conversation.

Here,

  • nc runs on 127.0.0.1:54370
  • tcpServer runs on 127.0.0.1:7070

The first 3 lines show the TCP handshake with SYN, SYN-ACK and ACK packets exchanged. The column Calculated Window Size shows the receive window size used by both the client and the server.
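A note on why the window size is "calculated": the window field in the TCP header is only 16 bits wide, so when both sides negotiate the window scale option during the handshake, the real window is the header value shifted left by the sender's scale factor. The sketch below shows that arithmetic. The field value and shift used here are hypothetical, chosen only to illustrate the calculation; they are not read from this capture.

```python
def calculated_window(window_field: int, shift: int) -> int:
    """Scale the 16-bit TCP window field by the negotiated shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= 14:
        raise ValueError("window scale shift must be between 0 and 14")
    return window_field << shift


# Hypothetical values: a header field of 512 with a shift of 7.
example = calculated_window(512, 7)
print(example)
```

The window carried in the SYN and SYN-ACK segments themselves is not scaled, since the scale factor is still being negotiated at that point.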

tcpServer announces a window size of 65483, which drops to zero at 0.017 milliseconds. The TCP Window Full message seen at row 16 is the sender (nc) determining that the data it has sent would fill the receiver’s (tcpServer’s) window. The sender knows that the receiver’s window is 65536 because the receiver advertised this value in row #14. After this, the sender sends two TCP segments of 32K each, which have not yet been acknowledged. This indicates that the receive buffer is now full at the receiver’s end. As expected, the receiver sends a TCP zero window message, as seen in row #17. The receiver recovers quickly and sends a TCP window update, as seen in row #18.
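The window-full condition is simple arithmetic: the sender may have at most one advertised window of unacknowledged data in flight. A minimal sketch of that check follows, assuming "32K" means 32768 bytes and ignoring the congestion window, which also limits a real sender. The function name is made up for this example.

```python
def usable_window(advertised_window: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still send before filling the peer's window."""
    return max(0, advertised_window - bytes_in_flight)


advertised = 65536            # window advertised by tcpServer in row #14
in_flight = 2 * 32 * 1024     # two unacknowledged segments of 32K each
remaining = usable_window(advertised, in_flight)
print(remaining)
```

With both segments unacknowledged the usable window is 0, so the sender has to stop and wait for an ACK or a window update.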

Takeaways:

  • The unread TCP data and the memory advertised as the current receive window are both part of the same receive buffer.
  • The length of the receive buffer is determined by Linux’s auto tuning mechanism. It is not recommended to disable this.
  • The current receive window is dynamic and changes throughout the TCP session. It can become zero if there is no room left in the receive buffer. This is called a TCP zero window.
  • If the receiver does not recover from a TCP zero window fast enough, the client perceives added latency in receiving a response.
