I'm exploring some sample code of Aloop in the kernel-dir:
https://elixir.bootlin.com/linux/latest/source/sound/drivers/aloop.c
I want to put there my own function which gets as parameter a pointer to a buffer of bytes and respectively number of bytes (whatever is coming from the userspace, let's say I'm using mpg321 for playing MP3) and my function then SENDS that data through UDP kernel socket implemented in the same aloop kernel module.
Let's say I've my own callback function which accepts again pointer to a buffer of bytes and number of bytes which bytes again are being RECEIVED through UDP kernel socket in the same aloop kernel module.
The problem is that I cannot see in the code of aloop.c where (and how) to put my extra functionality - my function to send data and my callback which would provide the data to the userspace at the end...
Any recommendations and brainstorming on that are welcome!
Related
While I was trying to implement benchmark testware using netperf I happened to read its manual. Where I got this query
In the TCP_STREAM specific test there are an option to mention -s and -S to specify local(netperf client), remote(netperf server) socket buffer sizes respectively. Is that a regular BSD socket size? There is also an option to specify the local send message size -m and remote receive message size -M; Is this the total message size after all TCP/IP encapsulation? Can anybody throw some light on this. It would be great if you can illustrate using a use-case why we need these separate parameters as the BSD socket size appears to be the upper boundary here.
The socket buffer sizes (set via -s and -S) will control how much data may be outstanding on the connection at one time by affecting either the receiver's advertised window (which will be based on the SO_SNDBUF) or how much data the sender can hold waiting for ACKnowledgement (which will be based on SO_SNDBUF).
The send and receive message sizes (-m and -M) control how much data is presented in any one "send" (-m) or requested in any one "recv" (-M) call.
As TCP is a streaming protocol, it is perfectly legal/possible to make a send call with a number of bytes larger than the socket buffer(s). When the socket is blocking (as netperf uses) it simply means the send call will remain there until the last of its bytes have been put into the send socket buffer. On the receive side, one can as for more than a socket buffer's worth of data in a single receive, but the semantics are such that the call will return with however many bytes happen to be there at the time if there are any, and will return with however many bytes arrive if the socket buffer was empty at the time of the call (again because netperf uses blocking sockets/calls).
I have a driver that builds on the new serdev bus in the linux kernel.
In my driver I receive messages from an external device, all messages ends with a null byte (0x00) and the protocol ensures that there are no null bytes in my data (COBS). Now I try to have the TTY layer hand me full messages by scanning for zeros in my input and if there are none I'll just return zero in the callback that is called from the tty layer when bytes are available.
This kind of works. Or rather it works for some messages. After a while though it locks up and the tty layer keeps sending the same size of received bytes indefinitely. My guess is that this happens when one half of the tty flip buffer is full and the rest of my message is in the other half.
I have two questions:
Am I correct in that the tty layer can "hang" until I read out all data in one half of the flip buffer?
If that is so, is there some way to prevent this from happening? I'd rather not implement my own buffering scheme on top of the tty buffer already available.
Thanks
It looks like (drivers/tty/tty_buffer.c and the function flush_to_ldisc) that it is not possible to do what I attempted to do. When the tty buffer is about to flip over the consumer will have to do a read and buffer any half messages.
That is, returning zero and hoping for a larger chunk of data in your callback next time will only work up until the end of the first part of the buffer then the last bit of data must be read.
This is not a problem in userspace because a read call will have an argument that is the most bytes you want but read is free to return fewer bytes than requested.
I am writing an application server that processes images (large data). I am trying to minimize copies when sending image data back to clients. The processed images I need to send to clients are in buffers obtained from jemalloc. The ways I have thought of sending the data back to the client is:
1) Simple write call.
// Allocate buffer buf.
// Store image data in this buffer.
write(socket, buf, len);
2) I obtain the buffer through mmap instead of jemalloc, though I presume jemalloc already creates the buffer using mmap. I then make a simple call to write.
buf = mmap(file, len); // Imagine proper options.
// Store image data in this buffer.
write(socket, buf, len);
3) I obtain a buffer through mmap like before. I then use sendfile to send the data:
buf = mmap(in_fd, len); // Imagine proper options.
// Store image data in this buffer.
int rc;
rc = sendfile(out_fd, file, &offset, count);
// Deal with rc.
It seems like (1) and (2) will probably do the same thing given jemalloc probably allocates memory through mmap in the first place. I am not sure about (3) though. Will this really lead to any benefits? Figure 4 on this article on Linux zero-copy methods suggests that a further copy can be prevented using sendfile:
no data is copied into the socket buffer. Instead, only descriptors
with information about the whereabouts and length of the data are
appended to the socket buffer. The DMA engine passes data directly
from the kernel buffer to the protocol engine, thus eliminating the
remaining final copy.
This seems like a win if everything works out. I don't know if my mmaped buffer counts as a kernel buffer though. Also I don't know when it is safe to re-use this buffer. Since the fd and length is the only thing appended to the socket buffer, I assume that the kernel actually writes this data to the socket asynchronously. If it does what does the return from sendfile signify? How would I know when to re-use this buffer?
So my questions are:
What is the fastest way to write large buffers (images in my case) to a socket? The images are held in memory.
Is it a good idea to call sendfile on a mmapped file? If yes, what are the gotchas? Does this even lead to any wins?
It seems like my suspicions were correct. I got my information from this article. Quoting from it:
Also these network write system calls, including sendfile, might and
in many cases do return before the data sent over TCP by the method
call has been acknowledged. These methods return as soon as all data
is written into the socket buffers (sk buff) and is pushed to the TCP
write queue, the TCP engine can manage alone from that point on. In
other words at the time sendfile returns the last TCP send window is
not actually sent to the remote host but queued. In cases where
scatter-gather DMA is supported there is no seperate buffer which
holds these bytes, rather the buffers(sk buffs) just hold pointers to
the pages of OS buffer cache, where the contents of file is located.
This might lead to a race condition if we modify the content of the
file corresponding to the data in the last TCP send window as soon as
sendfile is returned. As a result TCP engine may send newly written
data to the remote host instead of what we originally intended to
send.
Provided the buffer from a mmapped file is even considered "DMA-able", seems like there is no way to know when it is safe to re-use it without an explicit acknowledgement (over the network) from the actual client. I might have to stick to simple write calls and incur the extra copy. There is a paper (also from the article) with more details.
Edit: This article on the splice call also shows the problems. Quoting it:
Be aware, when splicing data from a mmap'ed buffer to a network
socket, it is not possible to say when all data has been sent. Even if
splice() returns, the network stack may not have sent all data yet. So
reusing the buffer may overwrite unsent data.
For cases 1 and 2 - does the operation you marked as // Store image data in this buffer require any conversion? Is it just plain copy from the memory to buf?
If it's just plain copy, you can use write directly on the pointer obtained from jemalloc.
Assuming that img is a pointer obtained from jemalloc and size is a size of your image, just run following code:
int result;
int sent=0;
while(sent<size) {
result=write(socket,img+sent,size-sent);
if(result<0) {
/* error handling here */
break;
}
sent+=result;
}
It is working correctly for blocking I/O (the default behavior). If you need to write a data in a non-blocking manner, you should be able to rework the code on your own, but now you have the idea.
For case 3 - sendfile is for sending data from one descriptor to another. That means you can, for example, send data from file directly to tcp socket and you don't need to allocate any additional buffer. So, if the image you want to send to a client is in a file, just go for a sendfile. If you have it in memory (because you processed it somehow, or just generated), use the approach I mentioned earlier.
when the send(or write) buffer is going to be full, let me say, only theres is only 500 bytes space. if I have a NONBLOCKING fd, and do
n = send(fd, buf, 1000,0)
here I wll get n<0, and I can get EWOULDBLOCK or EAGAIN error. my questions are:
1 here, the send write 500 bytes into the send buffer or 0 bytes to the send buffer?
2 if 500 bytes are sent to the buffer and if the fd is a UDP socket, then the datagram is split into 2 parts?
3 I need to use the fd to send many datagrams, if this time the send buffer is full(if there is EWOULDBLOCK or EAGAIN error), I need to make a pending list of datagrams(a FIFO queue). And everytime I want to send some datagram, I will have to check the pending list to see whether it is empty or not. if it is not empty, send the datagram in the pending list first. It seems to me that this design is a bit troublesome. And the design is similar to extending the kernel(BTW, is it in the kernel?) send buffer by a userspace pending list. Is there better solution for this?
thanks!
The below only applies to UDP (and only in linux, though others are similar), but that seems to be what you're asking about.
Setting non-blocking mode on a UDP socket is completely irrelevant (for sending) as a send will never block -- it immediately sends the packet, without any buffering.
It IS possible (if the machine is very busy) for there to be a buffer space problem (run out of transient packet buffers for packet processing), but in that case call will return ENOBUFS, regardless of whether the fd is blocking or non-blocking. This should be extremely rare.
There's a potential problem if you're generating packets faster than the network can take them (fairly easy to do on a fast machine and a 10Mbit ethernet port), in which case the kernel will start dropping the outgoing packets. Unfortunately there's no easy way to detect when this happens (you can check the interface for TX dropped packets, but that won't tell you which packets were dropped).
Its also possible to have a problem if you use the UDP_CORK socket option, which buffers data written to the socket instead of sending a packet, and only sends a single packet when the CORK option is unset. In this case, if the buffer grows too big you'll get EMSGSIZE (and again, the NONBLOCKING setting is irrelevant).
If you are talking about UDP, you are completely off point here - for UDP the value of the SO_SNDBUF socket option puts a limit on the size of a datagram you can send. In other words, there's no real per-socket send buffer (though data is still queued in the kernel to be sent out by appropriate network controller). You would get EMSGSIZE if you try to send(2) more in one shot.
For TCP though, you would only get EWOULDBLOCK when there's no space in the send buffer at all, i.e. no data has been copied from user to the kernel. Otherwise sent(2)'s return value tells you exactly how many bytes have been copyed.
I am currently running an old system on Tru64 which involves lots of UDP sockets using the sendto() function. The sockets are used in our code to send messages to/from various processes and then eventually on to a thick client app that is connected remotely. Occasionally the socket to the thick client gets stuck, this can cause some of these messages to get built up. My question is how can I determine the current buffer size, and how do I determine the maximum message buffer. The code below gives a snippet of how I set up the port and use the sendto function.
/* need to adjust the maximum size we can send on this */
/* as it needs to be able to cope with the biggest */
/* messages we send */
lenlen = sizeof(len) ;
/* allow double for when the system is under load */
int lenlen, len ;
lenlen = sizeof(len) ;
len = 2 * 32000;
msg_socket = socket( AF_UNIX,SOCK_DGRAM, 0);
result = setsockopt(msg_socket, SOL_SOCKET, SO_SNDBUF, (char *)&len, lenlen) ;
result = sendto( msg_socket,
(char *)message,
(int)message_len,
flags,
dest_addr,
addrlen);
Note. We have ported this application to Linux and the problem does not seem to appear there.
Any help would be greatly appreciated.
Regards
UDP send buffer size is different from TCP - it just limits the size of the datagram. Quoting Stevens UNP Vol. 1:
...
A UDP socket has a send buffer size (which we can change with SO_SNDBUF socket option, Section 7.5), but this is simply an upper limit on the maximum-sized UDP datagram that can be written to the socket. If an application writes a datagram larger than the socket send buffer size, EMSGSIZE is returned. Since UDP is unreliable, it does not need to keep a copy of the application's data and does not need an actual send buffer. (The application data is normally copied into a kernel buffer of some form as it passes down the protocol stack, but this copy is discarded by the datalink layer after the data is transmitted.)
UDP simply prepends 8-byte header and passes the datagram to IP. IPv4 or IPv6 prepends its header, determines the outgoing interface by performing the routing function, and then either adds the datagram to the datalink output queue (if it fits within the MTU) or fragments the datagram and adds each fragment to the datalink output queue. If a UDO application sends large datagrams (say 2,000-byte datagrams), there's a much higher probability of fragmentation than with TCP. because TCP breaks the application data into MSS-sized chunks, something that has no counterpart in UDP.
The successful return from write to a UDP socket tells us that either the datagram or all fragments of the datagram have been added to the datalink output queue. If there is no room on the queue for the datagram or one of its fragments, ENOBUFS is often returned to the application.
Unfortunately, some implementations do not return this error, giving the application no indication that the datagram was discarded without even being transmitted.
The last footnote needs attention - but it looks like Tru64 has this error code listed in the manual page.
The proper way of doing it though is to queue your outstanding messages in the application itself and to carefully check return values and the errno after each system call. This still does not guarantee delivery (since UDP receivers might drop the packets without any notice to the senders). Check the UDP packet discard counters with netstat -s on both/all sides, see if they are growing. There is really no way around this besides switching to TCP or implementing your own timeout/ack and re-transmission logic.
You should probably be using some sort of congestion control to avoid overloading the network. By far the easiest way to do this is to use TCP instead of UDP.
It fails less often on Linux because UDP sockets wait for space in the local network interface queue on Linux (unless you set them non-blocking). However, with any operating system, if the overfull queue is not in the local system, the packet will be dropped silently.