Specifying UDP receive buffer size at runtime in Linux

In Linux, one can set the system's default and maximum receive buffer sizes for network sockets, say UDP, using the following commands:
sysctl -w net.core.rmem_max=<value>
sysctl -w net.core.rmem_default=<value>
But I wonder: is it possible for an application (say, in C) to override the system's defaults by specifying the receive buffer size per UDP socket at runtime?

You can increase the value from the default, but you can't increase it beyond the maximum value. Use setsockopt to change the SO_RCVBUF option (here sockfd is an existing socket descriptor):
int n = 1024 * 1024;  // request a 1 MiB receive buffer
if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &n, sizeof(n)) == -1) {
    // deal with failure, or ignore if you can live with the default size
}
Note that this is the portable solution; it should work on any POSIX platform for increasing the receive buffer size. Linux has had autotuning for a while now (since 2.6.7, and with reasonable maximum buffer sizes since 2.6.17), which automatically adjusts the receive buffer size based on load. On kernels with autotuning, it is recommended that you not set the receive buffer size using setsockopt, as that will disable the kernel's autotuning. Using setsockopt to adjust the buffer size may still be necessary on other platforms, however.
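For example, here's a minimal self-contained sketch (my own illustration; the 1 MiB figure is arbitrary) that requests a larger buffer on a UDP socket and reads back what the kernel actually granted. Note that getsockopt() reports the kernel's doubled bookkeeping value, and requests above net.core.rmem_max are silently capped:

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    int sockfd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    int n = 1024 * 1024;  /* request a 1 MiB receive buffer */

    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &n, sizeof(n)) == -1)
        perror("setsockopt(SO_RCVBUF)");

    socklen_t len = sizeof(n);
    getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &n, &len);
    printf("effective SO_RCVBUF: %d bytes\n", n);  /* doubled, capped at rmem_max */
    return 0;
}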

Related

How to tune Linux network buffer size

I'm reading "Kafka: The Definitive Guide"; on page 35 (the networking section) it says:
... The first adjustment is to change the default and maximum amount of
memory allocated for the send and receive buffers for each socket. This will significantly increase performance for large transfers. The relevant parameters for the send and receive buffer default size per socket are net.core.wmem_default and
net.core.rmem_default......
In addition to the socket settings, the send and receive buffer sizes for TCP sockets must be set separately using the net.ipv4.tcp_wmem and net.ipv4.tcp_rmem parameters.
Why should we set both net.core.wmem_default and net.ipv4.tcp_wmem?
Short answer: r/wmem_default are used for setting static socket buffer sizes, while tcp_r/wmem are used for controlling the TCP send/receive window size and buffers dynamically.
More details:
By tracking the usages of r/wmem_default and tcp_r/wmem (kernel 4.14) we can see that r/wmem_default are only used in sock_init_data():
void sock_init_data(struct socket *sock, struct sock *sk)
{
    sk_init_common(sk);
    ...
    sk->sk_rcvbuf = sysctl_rmem_default;
    sk->sk_sndbuf = sysctl_wmem_default;
This initializes the socket's send and receive buffers, which may later be overridden in sock_setsockopt():
int sock_setsockopt(struct socket *sock, int level, int optname,
                    char __user *optval, unsigned int optlen)
{
    struct sock *sk = sock->sk;
    ...
    sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
    ...
    sk->sk_rcvbuf = max_t(int, val * 2, SOCK_MIN_RCVBUF);
Usages of tcp_rmem are found in these functions: tcp_select_initial_window() in tcp_output.c and __tcp_grow_window(), tcp_fixup_rcvbuf(), tcp_clamp_window() and tcp_rcv_space_adjust() in tcp_input.c. In all usages this value is used for controlling the receive window and/or the socket's receive buffer dynamically, meaning it would take the current traffic and the system parameters into consideration.
A similar search for tcp_wmem shows that it is only used for dynamic changes to the socket's send buffer, in tcp_init_sock() (tcp.c) and tcp_sndbuf_expand() (tcp_input.c).
So when you want the kernel to better tune your traffic, the most important values are tcp_r/wmem. The socket's buffer size is usually overridden by the user, so the default value doesn't really matter much. For exact tuning operations, try reading the comments in tcp_input.c marked "tuning"; there's a lot of valuable information there.
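To see the two defaults side by side, here's a small sketch (my own illustration, assuming stock sysctl values): a fresh UDP socket gets its SO_RCVBUF from net.core.rmem_default via sock_init_data(), while a fresh TCP socket is re-initialized by tcp_init_sock() from the middle (default) field of net.ipv4.tcp_rmem:

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

static void print_rcvbuf(const char *label, int type, int proto)
{
    int fd = socket(AF_INET, type, proto);
    int n;
    socklen_t len = sizeof(n);

    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &n, &len);
    printf("%-40s %d bytes\n", label, n);
}

int main(void)
{
    print_rcvbuf("UDP (from net.core.rmem_default):", SOCK_DGRAM, IPPROTO_UDP);
    print_rcvbuf("TCP (from net.ipv4.tcp_rmem default):", SOCK_STREAM, IPPROTO_TCP);
    return 0;
}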
Hope this helps.

How to set the maximum TCP receive window size in Linux?

I want to limit the rate of every TCP connection. Can I set the maximum TCP receive window size in Linux?
With iptables + tc I can only limit IP packets, and the parameters net.core.rmem_max and net.core.wmem_max didn't work well.
man tcp:
Linux supports RFC 1323 TCP high performance extensions. These include Protection Against Wrapped Sequence Numbers (PAWS), Window Scaling and Timestamps. Window scaling allows the use of large (> 64K) TCP windows in order to support links with high latency or bandwidth. To make use of them, the send and receive buffer sizes must be increased. They can be set globally with the /proc/sys/net/ipv4/tcp_wmem and /proc/sys/net/ipv4/tcp_rmem files, or on individual sockets by using the SO_SNDBUF and SO_RCVBUF socket options with the setsockopt(2) call.
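Putting the man-page advice into practice, here's a sketch of the receiving side (the function name, address, and window size are made-up examples): setting SO_RCVBUF before connect() or listen() clamps the window the kernel offers to the peer, which bounds the connection's throughput to roughly window / RTT:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int rate_limited_connect(const char *ip, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int win = 64 * 1024;  /* ~64 KiB window => throughput <= 64 KiB / RTT */

    /* Must happen BEFORE connect()/listen(): the window scaling factor
     * is negotiated during the handshake and cannot change afterwards. */
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &win, sizeof(win));

    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port   = htons(port) };
    inet_pton(AF_INET, ip, &addr.sin_addr);
    connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    return fd;
}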

TCP receiving window size higher than net.core.rmem_max

I am running iperf measurements between two servers, connected through 10Gbit link. I am trying to correlate the maximum window size that I observe with the system configuration parameters.
In particular, I have observed that the maximum window size is 3 MiB. However, I cannot find the corresponding values in the system files.
By running sysctl -a I get the following values:
net.ipv4.tcp_rmem = 4096 87380 6291456
net.core.rmem_max = 212992
The first line tells us that the maximum receiver window size is 6 MiB, since the third field of tcp_rmem is the maximum buffer size. However, TCP actually allocates twice the requested size, so the maximum receiver window size should be 3 MiB, exactly as I have measured it. From man tcp:
Note that TCP actually allocates twice the size of the buffer requested in the setsockopt(2) call, and so a succeeding getsockopt(2) call will not return the same size of buffer as requested in the setsockopt(2) call. TCP uses the extra space for administrative purposes and internal kernel structures, and the /proc file values reflect the larger sizes compared to the actual TCP windows.
However, the second value, net.core.rmem_max, states that the maximum receiver window size cannot be more than 208 KiB. And this is supposed to be the hard limit, according to man tcp:
tcp_rmem
max: the maximum size of the receive buffer used by each TCP socket. This value does not override the global net.core.rmem_max. This is not used to limit the size of the receive buffer declared using SO_RCVBUF on a socket.
So how come I observe a maximum window size larger than the one specified in net.core.rmem_max?
NB: I have also calculated the bandwidth-delay product, window_size = bandwidth × RTT, which is about 3 MiB (10 Gbit/s × 2 ms RTT), thus verifying my traffic capture.
A quick search turned up:
https://github.com/torvalds/linux/blob/4e5448a31d73d0e944b7adb9049438a09bc332cb/net/ipv4/tcp_output.c
in void tcp_select_initial_window()
if (wscale_ok) {
    /* Set window scaling on max possible window
     * See RFC1323 for an explanation of the limit to 14
     */
    space = max_t(u32, sysctl_tcp_rmem[2], sysctl_rmem_max);
    space = min_t(u32, space, *window_clamp);
    while (space > 65535 && (*rcv_wscale) < 14) {
        space >>= 1;
        (*rcv_wscale)++;
    }
}
max_t takes the higher of its two arguments, so the bigger value takes precedence here.
One other reference to sysctl_rmem_max is made where it is used to limit the argument to SO_RCVBUF (in net/core/sock.c).
All other tcp code refers to sysctl_tcp_rmem only.
So without looking deeper into the code you can conclude that a bigger net.ipv4.tcp_rmem will override net.core.rmem_max in all cases except when setting SO_RCVBUF (whose check can be bypassed using SO_RCVBUFFORCE).
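As a sketch of that last point (Linux-specific; the helper name is mine): SO_RCVBUFFORCE lets a process with the CAP_NET_ADMIN capability exceed net.core.rmem_max, where a plain SO_RCVBUF request would be silently capped:

#include <stdio.h>
#include <sys/socket.h>

/* Try to force a receive buffer past net.core.rmem_max. */
int force_rcvbuf(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUFFORCE, &bytes, sizeof(bytes)) == -1) {
        perror("setsockopt(SO_RCVBUFFORCE)");  /* EPERM without CAP_NET_ADMIN */
        return -1;
    }
    return 0;
}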
net.ipv4.tcp_rmem takes precedence over net.core.rmem_max, according to https://serverfault.com/questions/734920/difference-between-net-core-rmem-max-and-net-ipv4-tcp-rmem:
It seems that the tcp setting will take precedence over the common max setting
But I agree with what you say, this seems to conflict with what's written in man tcp, and I can reproduce your findings. Maybe the documentation is wrong? Please find out and comment!

How to find the socket buffer size of linux

What's the default socket buffer size on Linux? Is there any command to see it?
If you want to see your buffer sizes in the terminal, you can take a look at:
/proc/sys/net/ipv4/tcp_rmem (for read)
/proc/sys/net/ipv4/tcp_wmem (for write)
They contain three numbers, which are the minimum, default and maximum memory size values (in bytes), respectively.
To get the buffer size in a C/C++ program, the flow is the following:
#include <sys/socket.h>
#include <netinet/in.h>

int n;
socklen_t m = sizeof(n);  /* getsockopt() expects a socklen_t length */
int fdsocket = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP); // example: a UDP socket
getsockopt(fdsocket, SOL_SOCKET, SO_RCVBUF, (void *)&n, &m);
// now the variable n holds the receive buffer size (in bytes)
Whilst, as has been pointed out, it is possible to see the current default socket buffer sizes in /proc, it is also possible to check them using sysctl (note: whilst the name includes ipv4, these sizes also apply to IPv6 sockets, as the IPv6 tcp_v6_init_sock() code just calls the IPv4 tcp_init_sock() function):
sysctl net.ipv4.tcp_rmem
sysctl net.ipv4.tcp_wmem
However, the default socket buffer sizes are just set when the sock is initialised; the kernel then sizes them dynamically (unless they have been set with setsockopt() using SO_SNDBUF/SO_RCVBUF, which disables autotuning). The actual size of the buffers for currently open sockets may be inspected using the ss command (part of the iproute/iproute2 package), which can also provide a bunch more info on sockets, like congestion control parameters etc. E.g., to list the currently open TCP (t option) sockets and associated memory (m) information:
ss -tm
Here's some example output:
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.56.102:ssh 192.168.56.1:56328
skmem:(r0,rb369280,t0,tb87040,f0,w0,o0,bl0,d0)
Here's a brief explanation of skmem (socket memory) - for more info you'll need to look at the kernel sources (i.e. sock.h):
r: sk_rmem_alloc     # memory currently allocated for the receive queue
rb: sk_rcvbuf        # current receive buffer size
t: sk_wmem_alloc     # memory currently allocated for the transmit queue
tb: sk_sndbuf        # current transmit buffer size
f: sk_forward_alloc  # memory pre-allocated by the socket for future use
w: sk_wmem_queued    # persistent transmit queue size
o: sk_omem_alloc     # memory used for socket options etc.
bl: sk_backlog       # backlog queue
d: sk_drops          # packets dropped
I'm still trying to piece together the details, but to add to the answers already given, these are some of the important commands:
cat /proc/sys/net/ipv4/udp_mem
cat /proc/sys/net/core/rmem_max
cat /proc/sys/net/ipv4/tcp_rmem
cat /proc/sys/net/ipv4/tcp_wmem
ss -m # see `man ss`
References & help pages:
Man pages
man 7 socket
man 7 udp
man 7 tcp
man ss
https://www.linux.org/threads/how-to-calculate-tcp-socket-memory-usage.32059/
For pipes: the atomic write size (PIPE_BUF) is 4096 bytes and the default maximum capacity is 65536 bytes; sendfile uses 16 pipe buffers of 4096 bytes each. To query the number of bytes currently queued for reading on a descriptor, use ioctl(fd, FIONREAD, &buff_size).

What is NetBSD's FIONSPACE ioctl equivalent in Linux?

I'm using Linux 2.6.38 (fc14). What is the ioctl flag to get the amount of free space on a socket file descriptor (say, a TCP socket)? I found NetBSD has FIONREAD, FIONWRITE and FIONSPACE for such related purposes. But, I could only use FIONREAD in Linux.
SIOCOUTQ is the Linux equivalent of FIONWRITE. I don't believe there is a direct FIONSPACE equivalent: instead, you can subtract the value returned by SIOCOUTQ from the socket send buffer size, which can be obtained with getsockopt(s, SOL_SOCKET, SO_SNDBUF, ...).
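For instance, a sketch of that subtraction (my own helper, not a kernel API; SIOCOUTQ is Linux-specific and lives in <linux/sockios.h>). The result is approximate, since the SO_SNDBUF value returned by getsockopt() is the kernel's doubled bookkeeping figure:

#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>

/* Approximate free space in the socket's send buffer. */
int send_space(int fd)
{
    int sndbuf, queued;
    socklen_t len = sizeof(sndbuf);

    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) == -1)
        return -1;
    if (ioctl(fd, SIOCOUTQ, &queued) == -1)   /* unsent + unacked bytes */
        return -1;
    return sndbuf - queued;
}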
For information, regarding what @HKK says above, from man socket(7):
SO_SNDBUF
    Sets or gets the maximum socket send buffer in bytes. The kernel doubles this value (to allow space for bookkeeping overhead) when it is set using setsockopt(2), and this doubled value is returned by getsockopt(2). The default value is set by the /proc/sys/net/core/wmem_default file and the maximum allowed value is set by the /proc/sys/net/core/wmem_max file. The minimum (doubled) value for this option is 2048.
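A quick sketch to observe the doubling described above (the 4096-byte request is an arbitrary example):

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int req = 4096, got;
    socklen_t len = sizeof(got);

    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &req, sizeof(req));
    getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &got, &len);
    printf("requested %d, kernel reports %d\n", req, got);  /* typically 8192 */
    return 0;
}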
