I know that I can get socket send/recv buffer size via getsockopt.
Is there also a way to retrieve the capacity of MSG_ERRQUEUE? I assume it is technically possible to overflow it with, for example, TX timestamps.
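For what it's worth, my reading of the Linux kernel source (sock_queue_err_skb() in net/core/skbuff.c) is that error-queue entries are charged against the same receive-buffer budget as ordinary data, so SO_RCVBUF would be the effective bound rather than a separate MSG_ERRQUEUE capacity. Treat that as an assumption to verify on your kernel. Under that assumption, a minimal sketch is to read SO_RCVBUF and drain the error queue regularly; rcvbuf_size() and drain_errqueue() are hypothetical helper names, not existing APIs.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Returns the SO_RCVBUF value reported by the kernel, or -1 on error.
 * Assumption: the error queue shares this budget with received data. */
int rcvbuf_size(int fd)
{
    int size = 0;
    socklen_t len = sizeof(size);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0)
        return -1;
    return size;
}

/* Hypothetical helper: reads pending error-queue messages without blocking.
 * Returns the number of messages read, or -1 on an unexpected error. */
int drain_errqueue(int fd)
{
    int count = 0;

    for (;;) {
        char data[512];
        char control[512];
        struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
        struct msghdr msg;

        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control;
        msg.msg_controllen = sizeof(control);

        ssize_t n = recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return count;   /* error queue is empty */
            return -1;
        }
        count++;                /* a real program would parse the cmsgs here */
    }
}
```

Calling drain_errqueue() whenever poll() reports POLLERR should keep timestamps from piling up, whatever the exact limit turns out to be.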
Related
I am writing an application that sends parallel ICMP packets, and receives them. To help with the parallelism and synchronization, I have designed multiple writers (and sockets), and a single reader.
Let's say I have 256 writers and one reader. This means I created 257 raw sockets. From what I learned, because raw sockets work below the transport level, the kernel copies every response from the recipients to all raw sockets. Even though I am able to filter or discard them, I don't want the 256 writer sockets to receive all this data from the kernel and spend unnecessary resources (imagine more writers). I don't know if lots of raw sockets are a burden for the kernel and couldn't find any information about that, so I could also use help in that direction.
I want to prevent the writer raw sockets from receiving any data, although letting their buffers fill up so that the kernel drops packets is an option.
What didn't help me:
close vs shutdown socket? (my research shows shutdown doesn't work with connectionless sockets)
create SOCK_RAW socket just for sending data without any recvfrom() (decreasing the receive buffer size to 0 doesn't seem to have the desired effect, and the Unix documentation mentions that the minimum is 256 bytes. The goal is to prevent the kernel from ever considering the writer sockets for received data)
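One option that might fit the writer sockets, assuming they are Linux raw sockets opened with IPPROTO_ICMP, is the ICMP_FILTER socket option described in raw(7): each set bit filters out one ICMP message type. The sketch below sets every bit; block_all_icmp() is a hypothetical helper name, and I have not measured how much work this saves the kernel with hundreds of sockets.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/icmp.h>   /* struct icmp_filter, ICMP_FILTER */

/* Asks the kernel to filter out ICMP types 0..31 on a raw ICMP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int block_all_icmp(int raw_fd)
{
    struct icmp_filter filt;

    filt.data = 0xFFFFFFFFu;   /* one bit per ICMP type; set bit = filtered */
    return setsockopt(raw_fd, SOL_RAW, ICMP_FILTER, &filt, sizeof(filt));
}
```

The filter only has 32 bits, so as far as I understand it ICMP types above 31 are not covered. Echo replies (type 0) are, which is what matters for a ping-style tool. Call it once on each writer socket right after socket().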
I am trying to develop a socket congestion algorithm in a Diameter stack by comparing the socket send buffer size (the default maximum) with the number of bytes actually sitting in the send queue.
getsockopt(ainfo->socket, SOL_SOCKET, SO_SNDBUF, (void *)&n, &m); // maximum send buffer size
retval = ioctl(ainfo->socket, TIOCOUTQ, &bytes_available); // bytes currently in the send queue
Testing scenario:
Start the client and server.
Once handshake is successful, start the traffic from client to server.
Block the packets at server's end using iptables.
Check the netstat output and check the send buffer size with the help of the ss command.
The send buffer size gets stuck after some time at a certain number. For example, the size of the send buffer is 87090 (tb in the ss output), but Send-Q is stuck at some seemingly random number that is much smaller than tb (for example, 54344). Sometimes it increases to 135920. Ideally it should reach somewhere around 80k and then get stuck. Can someone explain this unusual behavior?
Any help is appreciated.
Thanks!
While I was trying to implement benchmark testware using netperf, I happened to read its manual, which raised this question.
In the TCP_STREAM test there are options -s and -S to specify the local (netperf client) and remote (netperf server) socket buffer sizes, respectively. Is that the regular BSD socket buffer size? There are also options to specify the local send message size (-m) and the remote receive message size (-M). Is this the total message size after all TCP/IP encapsulation? Can anybody shed some light on this? It would be great if you could illustrate with a use case why we need these separate parameters, as the BSD socket size appears to be the upper bound here.
The socket buffer sizes (set via -s and -S) control how much data may be outstanding on the connection at one time, by affecting either the receiver's advertised window (which will be based on SO_RCVBUF) or how much data the sender can hold while waiting for acknowledgement (which will be based on SO_SNDBUF).
The send and receive message sizes (-m and -M) control how much data is presented in any one "send" (-m) or requested in any one "recv" (-M) call.
As TCP is a streaming protocol, it is perfectly legal and possible to make a send call with a number of bytes larger than the socket buffer(s). When the socket is blocking (as netperf's are), it simply means the send call will remain there until the last of its bytes has been put into the send socket buffer. On the receive side, one can ask for more than a socket buffer's worth of data in a single receive, but the semantics are such that the call will return with however many bytes happen to be there at the time, if there are any, and will return with however many bytes arrive if the socket buffer was empty at the time of the call (again because netperf uses blocking sockets/calls).
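The receive-side behaviour described above can be sketched in a few lines. This is only an illustration: it uses an AF_UNIX stream socket pair as a stand-in for a TCP connection (my assumption, to keep the example self-contained), and recv_once_after_send() is a hypothetical helper, not anything from netperf.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends `sent` bytes on one end of a stream socket pair, then asks for up to
 * `requested` bytes on the other end with a single blocking recv().
 * Returns what that one recv() call delivered, or -1 on error.
 * Call it with sent > 0, otherwise the recv() would block forever. */
ssize_t recv_once_after_send(size_t sent, size_t requested)
{
    static char out[4096], in[65536];
    int sv[2];
    ssize_t got = -1;

    if (sent == 0 || sent > sizeof(out) || requested > sizeof(in))
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    memset(out, 'x', sent);
    if (send(sv[0], out, sent, 0) == (ssize_t)sent)
        got = recv(sv[1], in, requested, 0);   /* returns what is queued */

    close(sv[0]);
    close(sv[1]);
    return got;
}
```

Asking for 65536 bytes when only 100 have been sent returns 100, which is why the -M value is a request size per call and not a guarantee of how much each recv delivers.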
I'm reading "Kafka: The Definitive Guide". On page 35 (networking section) it says:
... The first adjustment is to change the default and maximum amount of memory allocated for the send and receive buffers for each socket. This will significantly increase performance for large transfers. The relevant parameters for the send and receive buffer default size per socket are net.core.wmem_default and net.core.rmem_default ...
In addition to the socket settings, the send and receive buffer sizes for TCP sockets must be set separately using the net.ipv4.tcp_wmem and net.ipv4.tcp_rmem parameters.
Why should we set both net.core.wmem and net.ipv4.tcp_wmem?
Short answer - r/wmem_default are used for setting static socket buffer sizes, while tcp_r/wmem are used for controlling TCP send/receive window size and buffers dynamically.
More details:
By tracking the usages of r/wmem_default and tcp_r/wmem (kernel 4.14) we can see that r/wmem_default are only used in sock_init_data():
void sock_init_data(struct socket *sock, struct sock *sk)
{
    sk_init_common(sk);
    ...
    sk->sk_rcvbuf = sysctl_rmem_default;
    sk->sk_sndbuf = sysctl_wmem_default;
This initializes the socket's buffers for sending and receiving packets, and the values might later be overridden in sock_setsockopt():
int sock_setsockopt(struct socket *sock, int level, int optname,
                    char __user *optval, unsigned int optlen)
{
    struct sock *sk = sock->sk;
    ...
    sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
    ...
    sk->sk_rcvbuf = max_t(int, val * 2, SOCK_MIN_RCVBUF);
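The val * 2 in that excerpt is visible from user space. Here is a minimal sketch, assuming a Linux system where the requested size is below net.core.wmem_max; sndbuf_after_set() is a hypothetical helper name.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Sets SO_SNDBUF to `requested` bytes on a fresh TCP socket and returns the
 * value the kernel reports back afterwards, or -1 on error. */
int sndbuf_after_set(int requested)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int reported = -1;
    socklen_t len = sizeof(reported);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &reported, &len) < 0)
        reported = -1;
    close(fd);
    return reported;
}
```

On the kernels I am aware of, asking for 32768 reports back twice that, which matches the doubling in sock_setsockopt(). The exact number depends on the kernel's limits, so don't hard-code it.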
Usages of tcp_rmem are found in these functions: tcp_select_initial_window() in tcp_output.c and __tcp_grow_window(), tcp_fixup_rcvbuf(), tcp_clamp_window() and tcp_rcv_space_adjust() in tcp_input.c. In all usages this value is used for controlling the receive window and/or the socket's receive buffer dynamically, meaning it would take the current traffic and the system parameters into consideration.
A similar search for tcp_wmem shows that it is only used for dynamic changes to the socket's send buffer in tcp_init_sock() (tcp.c) and tcp_sndbuf_expand() (tcp_input.c).
So when you want the kernel to better tune your traffic, the most important values are tcp_r/wmem. The socket's buffer size is usually overridden by the user anyway, so the default value doesn't really matter. For exact tuning operations, try reading the comments in tcp_input.c marked as "tuning". There's a lot of valuable information there.
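To compare the two sets of values on a running machine, something like the following sketch could be used. It assumes the usual /proc/sys layout and the three-field "min default max" format of tcp_wmem documented in tcp(7); the function names are hypothetical.

```c
#include <stdio.h>

/* Parses the "min default max" triple used by net.ipv4.tcp_wmem / tcp_rmem.
 * Returns 0 on success, -1 if the text does not hold three integers. */
int parse_mem_triple(const char *text, long *min, long *def, long *max)
{
    return sscanf(text, "%ld %ld %ld", min, def, max) == 3 ? 0 : -1;
}

/* Reads the first line of a /proc/sys file into buf. Returns 0 on success. */
int read_sysctl_line(const char *path, char *buf, int size)
{
    FILE *f = fopen(path, "r");
    char *ok;

    if (!f)
        return -1;
    ok = fgets(buf, size, f);
    fclose(f);
    return ok ? 0 : -1;
}

/* Prints the static default next to the dynamic TCP limits, when readable. */
void print_buffer_sysctls(void)
{
    char line[128];
    long min, def, max;

    if (read_sysctl_line("/proc/sys/net/core/wmem_default", line, sizeof(line)) == 0)
        printf("net.core.wmem_default = %s", line);
    if (read_sysctl_line("/proc/sys/net/ipv4/tcp_wmem", line, sizeof(line)) == 0 &&
        parse_mem_triple(line, &min, &def, &max) == 0)
        printf("net.ipv4.tcp_wmem: min=%ld default=%ld max=%ld\n", min, def, max);
}
```

Inside some containers the net.core files may not be readable, which is why each read is checked separately.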
Hope this helps.
Is it possible to query how many bytes are in a socket's send buffer in linux? I'd like to be able to query SO_SNDBUF with getsockopt to get the buffer size and then [insert technique here] to get the actual usage, which will let me know how much I'm filling up the buffer.
That's not what SO_SNDBUF does. SO_SNDBUF sets or gets the maximum socket send buffer in bytes (quoting socket(7)). You could probably use the SIOCOUTQ or TIOCOUTQ ioctls if you're using TCP or UDP.
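A rough sketch of combining the two calls on Linux follows; sndbuf_usage() is a hypothetical helper name, and SIOCOUTQ comes from <linux/sockios.h>.

```c
#include <linux/sockios.h>   /* SIOCOUTQ */
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Fills *capacity with SO_SNDBUF and *queued with the SIOCOUTQ byte count.
 * Returns 0 on success, -1 on error. */
int sndbuf_usage(int fd, int *capacity, int *queued)
{
    socklen_t len = sizeof(*capacity);

    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, capacity, &len) < 0)
        return -1;
    if (ioctl(fd, SIOCOUTQ, queued) < 0)
        return -1;
    return 0;
}
```

Keep in mind that the two numbers are not measured in quite the same units: per socket(7) the SO_SNDBUF figure includes the kernel's bookkeeping overhead, while the ioctl counts queued data bytes. The ratio is therefore only an approximation, and the queue may stop growing well before it reaches the reported buffer size.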
However, it's highly unlikely this is the right approach. Have you considered using a select-like mechanism to notify you when a socket is writable? Combined with nonblocking behavior it could be the ticket to a clean approach.
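A minimal sketch of that approach, using poll() as the select-like mechanism and MSG_DONTWAIT for per-call nonblocking sends (setting O_NONBLOCK on the descriptor would work as well). send_all_nonblocking() is a hypothetical helper name, and the timeout policy is an arbitrary choice for illustration.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sends all `len` bytes without ever blocking inside send(); whenever the
 * send buffer is full it waits in poll() until the socket is writable.
 * Returns 0 on success, -1 on error or when poll() times out. */
int send_all_nonblocking(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t off = 0;

    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) {
            off += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;               /* real error, e.g. EPIPE */

        /* Buffer full: sleep until the kernel reports room to write. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready < 0 && errno == EINTR)
            continue;
        if (ready <= 0)
            return -1;               /* poll error or timeout */
        /* If the peer is gone, the next send() reports the error. */
    }
    return 0;
}
```

With this shape the application never needs to know how full the buffer is; a long wait in poll() is itself the congestion signal.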