I've set up a UDP socket and call sendto() with a different recipient at each call.
I would like to use writev() in order to benefit from scatter/gather I/O, but writev() does not allow me to specify the recipient addr/port as sendto() does. Any suggestions?
On Linux, there is sendmmsg(2)
The sendmmsg() system call is an extension of sendmsg(2) that allows the caller to transmit multiple messages on a socket using a single system call. (This has performance benefits for some applications.)
The prototype is:
int sendmmsg(int sockfd, struct mmsghdr *msgvec, unsigned int vlen,
             unsigned int flags);
struct mmsghdr {
    struct msghdr msg_hdr;  /* Message header */
    unsigned int  msg_len;  /* Number of bytes transmitted */
};
Since both the address and the I/O vector are specified in struct msghdr, you can both send to multiple destinations and make use of scatter/gather.
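As a rough sketch of the sendmmsg() approach described above (Linux only): the helper below gathers two buffers into one datagram and sends a copy to each of two destinations with a single system call. The function name send_to_two and the header/body split are made up for illustration.

```c
#define _GNU_SOURCE   /* sendmmsg() and struct mmsghdr are Linux/GNU extensions */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send "header + body" as one datagram to each of two
 * destinations using a single sendmmsg() call.
 * Returns the number of messages sent, or -1 on error. */
int send_to_two(int fd,
                const struct sockaddr_in *dst_a,
                const struct sockaddr_in *dst_b,
                const char *header, const char *body)
{
    /* Gather list: the kernel concatenates these buffers into one datagram. */
    struct iovec iov[2] = {
        { .iov_base = (void *)header, .iov_len = strlen(header) },
        { .iov_base = (void *)body,   .iov_len = strlen(body)   },
    };
    struct mmsghdr msgs[2];

    memset(msgs, 0, sizeof msgs);

    /* Message 0 goes to dst_a. */
    msgs[0].msg_hdr.msg_name    = (void *)dst_a;
    msgs[0].msg_hdr.msg_namelen = sizeof *dst_a;
    msgs[0].msg_hdr.msg_iov     = iov;
    msgs[0].msg_hdr.msg_iovlen  = 2;

    /* Message 1 goes to dst_b. It reuses the same gather list here, but it
     * could just as well point at its own iovec array. */
    msgs[1].msg_hdr.msg_name    = (void *)dst_b;
    msgs[1].msg_hdr.msg_namelen = sizeof *dst_b;
    msgs[1].msg_hdr.msg_iov     = iov;
    msgs[1].msg_hdr.msg_iovlen  = 2;

    return sendmmsg(fd, msgs, 2, 0);
}
```

The return value is the number of messages sent, which may be smaller than the number requested, so a real caller would loop over the remainder.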
You can use writev to send a coalesced set of buffers to a single end point if you use connect to specify the end point beforehand. From the (OSX) manpage for connect(2):
datagram sockets may use connect() multiple times to change their association
You cannot use writev to send each buffer to a different endpoint.
A potential downside of using connect/writev instead of n sendto calls is that every writev to a new recipient needs an additional system call (the connect).
If the set of recipients is limited (and known in advance) it may be preferable to use a separate socket per recipient and just connect each socket once.
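A minimal sketch of the connect/writev variant, assuming an IPv4 UDP socket. The helper name connect_and_writev is invented for this example. It re-associates the socket with a destination and then sends two buffers as one datagram.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: point the datagram socket at dst, then send
 * "header + body" as a single datagram with writev().
 * Returns the number of bytes written, or -1 on error. */
ssize_t connect_and_writev(int fd, const struct sockaddr_in *dst,
                           const char *header, const char *body)
{
    struct iovec iov[2] = {
        { .iov_base = (void *)header, .iov_len = strlen(header) },
        { .iov_base = (void *)body,   .iov_len = strlen(body)   },
    };

    /* On a datagram socket, connect() only records the default peer;
     * it can be called again later to switch to another recipient. */
    if (connect(fd, (const struct sockaddr *)dst, sizeof *dst) < 0)
        return -1;

    /* writev() gathers both buffers into one datagram for that peer. */
    return writev(fd, iov, 2);
}
```

Calling the helper twice with different addresses costs two system calls per datagram, which is the overhead mentioned above. With one pre-connected socket per recipient, only the writev() remains.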
I'm reading "Kafka The Definitive Guide", in page 35 (networking section) it says :
... The first adjustment is to change the default and maximum amount of
memory allocated for the send and receive buffers for each socket. This will significantly increase performance for large transfers. The relevant parameters for the send and receive buffer default size per socket are net.core.wmem_default and
net.core.rmem_default......
In addition to the socket settings, the send and receive buffer sizes for TCP sockets must be set separately using the net.ipv4.tcp_wmem and net.ipv4.tcp_rmem parameters.
Why should we set both net.core.wmem and net.ipv4.tcp_wmem?
Short answer - r/wmem_default are used for setting static socket buffer sizes, while tcp_r/wmem are used for controlling TCP send/receive window size and buffers dynamically.
More details:
By tracking the usages of r/wmem_default and tcp_r/wmem (kernel 4.14) we can see that r/wmem_default are only used in sock_init_data():
void sock_init_data(struct socket *sock, struct sock *sk)
{
    sk_init_common(sk);
    ...
    sk->sk_rcvbuf = sysctl_rmem_default;
    sk->sk_sndbuf = sysctl_wmem_default;
This initializes the socket's buffers for sending and receiving packets, and they might later be overridden in sock_setsockopt():
int sock_setsockopt(struct socket *sock, int level, int optname,
                    char __user *optval, unsigned int optlen)
{
    struct sock *sk = sock->sk;
    ...
    sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
    ...
    sk->sk_rcvbuf = max_t(int, val * 2, SOCK_MIN_RCVBUF);
Usages of tcp_rmem are found in these functions: tcp_select_initial_window() in tcp_output.c and __tcp_grow_window(), tcp_fixup_rcvbuf(), tcp_clamp_window() and tcp_rcv_space_adjust() in tcp_input.c. In all usages this value is used for controlling the receive window and/or the socket's receive buffer dynamically, meaning it would take the current traffic and the system parameters into consideration.
A similar search for tcp_wmem shows that it is only used for dynamic changes in the socket's send buffer in tcp_init_sock() (tcp.c) and tcp_sndbuf_expand() (tcp_input.c).
So when you want the kernel to better tune your traffic, the most important values are tcp_r/wmem. Since the socket's buffer size is usually overridden by the user, the default value doesn't really matter. For exact tuning operations, try reading the comments in tcp_input.c marked as "tuning". There's a lot of valuable information there.
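To observe the override in sock_setsockopt() from user space, a small sketch like the following may help. The helper name request_sndbuf is invented for this example. On Linux the requested value is also capped by net.core.wmem_max before it is doubled, so the exact result depends on the system's settings.

```c
#include <sys/socket.h>

/* Hypothetical helper: ask for a send-buffer size with SO_SNDBUF and
 * report what the kernel actually stored in sk_sndbuf.
 * Returns the effective size in bytes, or -1 on error. */
int request_sndbuf(int fd, int requested)
{
    int effective = 0;
    socklen_t len = sizeof effective;

    /* Replaces the default that sock_init_data() took from wmem_default. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof requested) < 0)
        return -1;

    /* Read the value back; on Linux it is normally twice the request
     * because of the "val * 2" in sock_setsockopt(). */
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &effective, &len) < 0)
        return -1;

    return effective;
}
```

Comparing the returned value with the request on a freshly created socket shows how the kernel adjusts the size that was asked for.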
Hope this helps.
Is it possible to obtain the socket ID from the sk_buff struct in the Linux kernel?
I know I could get the socket using this code:
const struct tcphdr *th = tcp_hdr(skb);
struct sock *sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);
struct socket *sock = NULL;
if (sk)
    sock = sk->sk_socket;
Where could I find the ID, and what is the maximum value of this ID?
A socket is a file.
You'll find, inside the struct socket, a struct file *file member.
I recommend that you look at this question, specifically the link "things you never should do in the Kernel" in the accepted answer, because I'm worried about the reason why you're trying to retrieve the file descriptor from a socket structure in the kernel (usually, you want to do the exact opposite).
To retrieve the file descriptor from a given file in the kernel, you'll need to iterate over the fdtable (search for files_fdtable())... this is a tremendous amount of work, especially if there is a huge number of open files.
The upper bound on a file descriptor value is given by the current capacity of the process's fd table, which can be retrieved with something like:
files_fdtable(current->files)->max_fds;
I have a network client which is stuck in recvfrom a server not under my control which, after 24+ hours, is probably never going to respond. The program has processed a great deal of data, so I don't want to kill it; I want it to abandon the current connection and proceed. (It will do so correctly if recvfrom returns EOF or -1.) I have already tried several different programs that purport to be able to disconnect stale TCP channels by forging RSTs (tcpkill, cutter, killcx); none had any effect, the program remained stuck in recvfrom. I have also tried taking the network interface down; again, no effect.
It seems to me that there really should be a way to force a disconnect at the socket-API level without forging network packets. I do not mind horrible hacks, up to and including poking kernel data structures by hand; this is a disaster-recovery situation. Any suggestions?
(For clarity, the TCP channel at issue here is in ESTABLISHED state according to lsof.)
I do not mind horrible hacks
That's all you have to say. I am guessing the tools you tried didn't work because they sniff traffic to get an acceptable ACK number to kill the connection. Without traffic flowing they have no way to get hold of it.
Here are things you can try:
Probe all the sequence numbers
Where those tools failed, you can still do it. Write a simple Python script that uses scapy to send, for each sequence number, a RST segment with the correct 4-tuple (ports and addresses). There are at most 4 billion of them (actually fewer assuming a decent window; you can find out the window for free using ss -i).
Make a kernel module to get hold of the socket
Make a kernel module getting a list of TCP sockets: look for sk_nulls_for_each(sk, node, &tcp_hashinfo.ehash[i].chain)
Identify your victim sk
At this point you have direct access to your socket. So
You can call tcp_reset or tcp_disconnect on it. You won't be able to call tcp_reset directly (since it doesn't have EXPORT_SYMBOL) but you should be able to mimic it: most of the functions it calls are exported
Or you can get the expected ACK number from tcp_sk(sk) and directly forge a RST packet with scapy
Here is the function I use to print established sockets - I scrounged bits and pieces from the kernel to make it some time ago:
#include <net/inet_hashtables.h>

#define NIPQUAD(addr) \
    ((unsigned char *)&addr)[0], \
    ((unsigned char *)&addr)[1], \
    ((unsigned char *)&addr)[2], \
    ((unsigned char *)&addr)[3]
#define NIPQUAD_FMT "%u.%u.%u.%u"

extern struct inet_hashinfo tcp_hashinfo;

/* Decides whether a bucket has any sockets in it. */
static inline bool empty_bucket(int i)
{
    return hlist_nulls_empty(&tcp_hashinfo.ehash[i].chain);
}

void print_tcp_socks(void)
{
    int i = 0;
    struct inet_sock *inet;

    /* Walk hash array and lock each if not empty. */
    printk("Established ---\n");
    for (i = 0; i <= tcp_hashinfo.ehash_mask; i++) {
        struct sock *sk;
        struct hlist_nulls_node *node;
        spinlock_t *lock = inet_ehash_lockp(&tcp_hashinfo, i);

        /* Lockless fast path for the common case of empty buckets */
        if (empty_bucket(i))
            continue;

        spin_lock_bh(lock);
        sk_nulls_for_each(sk, node, &tcp_hashinfo.ehash[i].chain) {
            if (sk->sk_family != PF_INET)
                continue;
            inet = inet_sk(sk);
            printk(NIPQUAD_FMT":%hu ---> " NIPQUAD_FMT
                   ":%hu\n", NIPQUAD(inet->inet_saddr),
                   ntohs(inet->inet_sport), NIPQUAD(inet->inet_daddr),
                   ntohs(inet->inet_dport));
        }
        spin_unlock_bh(lock);
    }
}
You should be able to pop this into a simple "Hello World" module and after insmoding it, in dmesg you will see sockets (much like ss or netstat).
I understand that you want to automate the process in order to run a test. But if you just want to check that the recvfrom error is handled correctly, you could attach GDB to the process and close the fd with a close() call.
Here you could see an example.
Another option is to use scapy to craft proper RST packets (which is not in your list). This is how I tested connection RSTs in a bridged system (IMHO it is the best option); you could also implement a graceful shutdown.
Here is an example of the scapy script.
On Linux, unless I'm mistaken, an application can use the socket call family to send or receive one packet at a time on datagram transports.
I would like to know whether Linux provides a means for the application to send and receive multiple packets in a single call on datagram transports.
Use recvmmsg to receive multiple datagram packets (for example, UDP)
int recvmmsg(int sockfd, struct mmsghdr *msgvec, unsigned int vlen,
unsigned int flags, struct timespec *timeout);
DESCRIPTION
The recvmmsg() system call is an extension of recvmsg(2) that allows
the caller to receive multiple messages from a socket using a single
system call. ...
http://man7.org/linux/man-pages/man2/recvmmsg.2.html
Use sendmmsg to send...
int sendmmsg(int sockfd, struct mmsghdr *msgvec, unsigned int vlen,
unsigned int flags);
DESCRIPTION
The sendmmsg() system call is an extension of sendmsg(2) that allows
the caller to transmit multiple messages on a socket using a single
system call.
http://man7.org/linux/man-pages/man2/sendmmsg.2.html
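For the receiving side, a minimal sketch of recvmmsg() could look like this (Linux only). The names recv_batch, BATCH and DGRAM_MAX are illustrative choices, not part of any API. Each datagram lands in its own buffer, and msg_len reports its length.

```c
#define _GNU_SOURCE   /* recvmmsg() and struct mmsghdr are Linux/GNU extensions */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH     4     /* illustrative maximum batch size */
#define DGRAM_MAX 256   /* illustrative per-datagram buffer size */

/* Hypothetical helper: receive `count` datagrams (at most BATCH) with one
 * recvmmsg() call. bufs[i] receives the payload of datagram i and lens[i]
 * its length. Returns the number of datagrams received, or -1 on error. */
int recv_batch(int fd, unsigned int count,
               char bufs[BATCH][DGRAM_MAX], unsigned int lens[BATCH])
{
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH];

    if (count == 0 || count > BATCH)
        return -1;

    memset(msgs, 0, sizeof msgs);
    for (unsigned int i = 0; i < count; i++) {
        /* One buffer per datagram; a longer iovec list would scatter it. */
        iov[i].iov_base = bufs[i];
        iov[i].iov_len  = DGRAM_MAX;
        msgs[i].msg_hdr.msg_iov    = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* With flags 0 and no timeout, a blocking socket waits until `count`
     * datagrams have arrived. */
    int n = recvmmsg(fd, msgs, count, 0, NULL);

    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}
```

Passing MSG_DONTWAIT or MSG_WAITFORONE in the flags argument instead of 0 makes the call return what is already queued rather than wait for a full batch.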
Older kernels have no such call (recvmmsg() was added in Linux 2.6.33 and sendmmsg() in Linux 3.0). However, depending on what you need, there are alternatives:
Both sendmsg() and recvmsg() can do "scatter/gather" - send or receive a single packet using multiple buffers.
"pktgen" ( http://www.linuxfoundation.org/collaborate/workgroups/networking/pktgen ) is a kernel module which can transmit millions of packets with just a handful of file writes, for testing purposes.
Can anyone explain what this line exactly does:
socketcall(7,255);
I know that the command opens a port on the system, but I don't understand the parameters.
The man page says:
int socketcall(int call, unsigned long *args);
DESCRIPTION
socketcall() is a common kernel entry point for the socket system calls. call determines which socket function to invoke. args points to a block containing the actual arguments, which are passed through to the appropriate call.
User programs should call the appropriate functions by their usual names. Only standard library implementors and kernel hackers need to know about socketcall().
OK, call 7 is sys_getpeername, but if I take a look at the man page:
int getpeername(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
DESCRIPTION
getpeername() returns the address of the peer connected to the socket sockfd, in the buffer pointed to by addr. The addrlen argument should be initialized to indicate the amount of space pointed to by addr. On return it contains the actual size of the name returned (in bytes). The name is truncated if the buffer provided is too small.
The returned address is truncated if the buffer provided is too small; in this case, addrlen will return a value greater than was supplied to the call.
I really don't get it. The function needs 3 parameters. How does the function get them? What does the 255 mean? Does anyone have an idea how the function opens a port?
Although Linux has a system call that is commonly called socketcall, the C library does not expose any C function with that name. Normally the standard wrapper functions such as socket() and getpeername() should be used, which will end up calling the system call, but if for some reason it is necessary to call the system call directly then that can be done with syscall(SYS_socketcall, call, args) or using assembly.
In this case the application or a library that it uses (other than the standard C library) has most likely defined its own function called socketcall(), that is unrelated to the system call. You should check that function or its documentation to see what it does.
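To illustrate how the real system call receives its arguments, here is a hedged sketch that invokes getpeername through syscall(SYS_socketcall, call, args). The helper name peer_port and the constant CALL_GETPEERNAME are made up for this example; the value 7 is the call number mentioned in the question. Not every architecture has socketcall (x86-64 does not, for instance), so the sketch falls back to the ordinary wrapper when SYS_socketcall is undefined.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

#define CALL_GETPEERNAME 7   /* socketcall number for getpeername (see the question) */

/* Hypothetical helper: return the peer's port of a connected IPv4 socket,
 * or -1 on error. */
int peer_port(int fd)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof peer;
    long rc;

#ifdef SYS_socketcall
    /* The three getpeername() arguments are packed into one block of
     * unsigned longs; the kernel unpacks them according to `call`. */
    unsigned long args[3] = {
        (unsigned long)fd,
        (unsigned long)&peer,
        (unsigned long)&len,
    };
    rc = syscall(SYS_socketcall, CALL_GETPEERNAME, args);
#else
    /* No socketcall multiplexer on this architecture: use the wrapper. */
    rc = getpeername(fd, (struct sockaddr *)&peer, &len);
#endif
    if (rc < 0)
        return -1;
    return ntohs(peer.sin_port);
}
```

This shows why a two-argument call can still serve a three-parameter function: the second argument is a pointer to the argument block. It does not explain what a bare 255 would mean in the code from the question, which is another hint that a different socketcall() function is involved there.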