Can the send() function block? (Linux)

I'm writing a chat program, and on the server side, when I send data, can the send() function take a long time to send it out?
Here is my problem:
I'm using Linux 2.6 with epoll, and the server runs in a single thread.
If send() blocks, all other activity on the server stops. For example, if there is a very slow client that does not ACK TCP packets for a long time, will send() just move on right away, or will it wait a long time for that client? What I don't want is for one or a few slow clients to cause delays in the chat server.
What I want is for send() to be non-blocking and return very quickly. If it can't send all the data, it should simply return the amount sent; I will remove that from my buffer and keep sending the rest the next time the socket is serviced, until all the data has gone out. Basically, I don't want send() to block for a long time on a slow or unresponsive client.

You can set a socket to non-blocking mode and a send will not block. The problem is that you'll have to manage the fact that a partial write occurred and send the rest of the data when the write file descriptor becomes active again.
In general I've found that doing both send and recv in non-blocking mode, while complicating the program, works pretty well.
Use something like this (with <fcntl.h> included, where fd is the socket descriptor):
int flags;

if ((flags = fcntl(fd, F_GETFL, 0)) == -1)
    flags = 0;
return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
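
Handling the partial-write case then typically means keeping a per-connection outgoing buffer. Here is a minimal sketch, assuming a hypothetical struct conn that your server already keeps per client (the type, field names, buffer size and send_or_queue() are made up for illustration):

#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

struct conn {
    int fd;                 /* non-blocking socket */
    char pending[4096];     /* bytes send() has not accepted yet */
    size_t pending_len;
};

/* Try to send; queue whatever the kernel buffer did not accept.
   Returns 0 on success (possibly with data queued), -1 on a real error. */
int send_or_queue(struct conn *c, const char *data, size_t len)
{
    ssize_t n = 0;

    if (c->pending_len == 0) {          /* keep ordering: send directly only if nothing is queued */
        n = send(c->fd, data, len, 0);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                n = 0;                  /* kernel buffer full: queue it all */
            else
                return -1;              /* real socket error */
        }
    }
    if ((size_t)n < len) {
        size_t left = len - (size_t)n;
        if (left > sizeof(c->pending) - c->pending_len)
            return -1;                  /* application buffer full: drop the client or grow the buffer */
        memcpy(c->pending + c->pending_len, data + (size_t)n, left);
        c->pending_len += left;
        /* now watch this fd for writability (EPOLLOUT) and flush c->pending
           from there, removing whatever the next send() accepts */
    }
    return 0;
}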

How to tell another thread that one thread is within recv() call *now*

There is an embedded Linux system with an externally connected (ethernet) device which sends lots of UDP data, once it's started by a command.
At startup, I have one thread that is to continuously receive UDP data for the remainder of the program's runtime.
As I see it, for reliability in principle (not just by accident), it must be ensured that the UDP reception loop makes its first call to recv() before the external data source is started, or the first packet or so might be lost, depending on scheduler whims. (This is all in a very local, purposefully simple network setup; packet loss is normally not an issue and not handled; speed is king.)
Currently, right before calling that UDP reception code, I start a temporary thread which delays for some time and then enables the data source to send UDP data.
This is currently the way to "ensure" that the first UDP packet arrives while the reception thread is "armed", i.e. sitting inside a recv() call with the OS waiting for data.
Even if I, say, set a condition variable right before the first call to recv() to tell the rest of the program "OK, you can enable the data source now, I'm ready", it could in theory happen that there is some scheduling-induced delay between that signalling and the actual call to recv() (and/or the internals of recv() actually being ready).
Is there a more elegant / proper way to solve this, than using some "empirical delay time"?
Pseudo code for illustration:
// ******** main thread ********
thread delayed( [&]{ sleepMs(500); enableUdpDataSource(); } );
thread udpRecv( [&]{ udpRecvUntilTimeout(); } );
delayed.join();
udpRecv.join();
return 0;

// ******** UDP thread ********
void udpRecvUntilTimeout()
{
    udpInit(); // set up socket, buffer sizes etc.
    while (shouldRun)
    {
        // recv() needs to be "armed" *before* the data source is enabled.
        // If I set a condition variable for another thread right here,
        // there may be a scheduling intervention between it and the actual
        // engaging of recv() - when the other thread happily enables the data source.
        int received = recv( sockFd, buf, maxlen, 0 );
        timeoutWatchdogReset();
        processReceivedData();
    }
}
In an earlier version, I suggested that the call to bind is optional, but of course it is not. You have to call it in order to tell the kernel which UDP port to open.
After bind, the kernel will buffer incoming UDP packets and you can call recv if you're not interested in the client network details (otherwise, call recvfrom).
Something along these lines:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

char buf[1500];
struct sockaddr_in addr;

int sd = socket(AF_INET, SOCK_DGRAM, 0);
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons((unsigned short) 1234);     // UDP port
bind(sd, (struct sockaddr *)&addr, sizeof(addr)); // check for -1 in real code

// start data sending thread
sleep(1);               // for testing
recv(sd, buf, 100, 0);  // the kernel has been buffering since bind()
But there are no guarantees with UDP; you might still lose packets (e.g. if the sender is overloading the receiver).
As you're using Linux, it might be possible to use FTRACE to determine whereabouts your receiving thread has got to. Its function tracing (normally used in post-mortem debugging / analysis) lets you see the function calls made by a process. I'm pretty sure this is exposed through the /sys or /proc file system, so it ought to be possible to monitor it live instead.
So if you had your temporary thread looking at the system calls of the receiving thread, it would be able to spot it entering the call to recv().
If FTRACE is not already built into your kernel, you'll need to recompile your kernel to include it. Could be handy - FTRACE + kernelshark is a nice way of debugging your application anyway.
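For what it's worth, here is a rough sketch of how the live monitoring could look, assuming debugfs is mounted at /sys/kernel/debug, the function tracer (CONFIG_FUNCTION_TRACER) is built in, and you run it as root; the exact kernel symbol names that correspond to recv vary by kernel version, so treat this as an outline rather than a recipe:

#include <stdio.h>

/* Write a short string into a tracefs control file. */
static void tracefs_write(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f) {
        fputs(value, f);
        fclose(f);
    }
}

int main(void)
{
    char pid[16], line[512];

    /* Trace only the receiving thread (PID 1234 is a placeholder). */
    snprintf(pid, sizeof(pid), "%d", 1234);
    tracefs_write("/sys/kernel/debug/tracing/set_ftrace_pid", pid);
    tracefs_write("/sys/kernel/debug/tracing/current_tracer", "function");
    tracefs_write("/sys/kernel/debug/tracing/tracing_on", "1");

    /* trace_pipe blocks until trace data arrives; scan its output for the
       recv-related kernel entry points to see when the thread is "armed". */
    FILE *pipe = fopen("/sys/kernel/debug/tracing/trace_pipe", "r");
    while (pipe && fgets(line, sizeof(line), pipe))
        fputs(line, stdout);

    return 0;
}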

Is read guaranteed to return as soon as data is available?

I want to implement a simple notification protocol using TCP sockets. The server will write a byte to a socket to notify the client, and the client reads from the socket, waiting until some data arrives, at which point it can return from the read call and perform some work.
while (1) {
    /* Wait for any notifications */
    char buf[32];

    if (read(fd, buf, sizeof(buf)) <= 0) {
        break;
    }

    /* Received notification */
    do_work();
}
My question is, is read guaranteed to return as soon as any data is available to read, or is the kernel allowed to keep waiting until some condition is met (e.g. some minimum number of bytes received, not necessarily the count which I pass into read) before it returns from the read call? If the latter is true, is there a flag that will disable that behavior?
I am aware that I could use the O_NONBLOCK flag and call read in a loop, but the purpose of this is to use as little CPU time as possible.
There are multiple implicit questions here:
Is read guaranteed to return immediately or shortly after a relevant event?
No. The kernel is technically allowed to make you wait as long as it wants.
In practice, it'll return immediately (modulo rescheduling delays).
This is true for poll and O_NONBLOCK as well. Linux is not a realtime OS, and offers no hard timing guarantees, just its best effort.
Is read allowed to wait indefinitely for multiple bytes to become available?
No, this would cause deadlocks. read needs to be able to return with a single byte, even if there are no guarantees about when it will do so.
In practice, Linux makes the same effort for 1 byte as it does for 1,048,576.
Is sending a single byte on a socket a practical way of waking up a remote process as soon as possible?
Yes, your example is perfectly fine.

Using thread to write and select to read

Has anyone tried creating a socket in non-blocking mode, using a dedicated thread to write to the socket, but using the select system call to identify whether there is data available to read?
If the socket is non-blocking, the write call will return immediately and the application will not know the status of the write (whether it succeeded or failed).
Is there a way of knowing the status of the write call without having to block on it?
Has anyone tried creating a socket in non-blocking mode, using a dedicated thread to write to the socket, but using the select system call to identify whether there is data available to read?
Yes, and it works fine. Sockets are bi-directional. They have separate buffers for reading and writing. It is perfectly acceptable to have one thread writing data to a socket while another thread is reading data from the same socket at the same time. Both threads can use select() at the same time.
If the socket is non-blocking, the write call will return immediately and the application will not know the status of the write (whether it succeeded or failed).
The same is true for blocking sockets, too. Outbound data is buffered in the kernel and transmitted in the background. The difference between the two types is that if the write buffer is full (such as if the peer is not reading and acking data fast enough), a non-blocking socket will fail to accept more data and report an error code (WSAEWOULDBLOCK on Windows, EAGAIN or EWOULDBLOCK on other platforms), whereas a blocking socket will wait for buffer space to clear up and then write the pending data into the buffer. Same thing with reading. If the inbound kernel buffer is empty, a non-blocking socket will fail with the same error code, whereas a blocking socket will wait for the buffer to receive data.
select() can be used with both blocking and non-blocking sockets. It is just more commonly used with non-blocking sockets than blocking sockets.
Is there a way of knowing the status of the write call without having to block on it?
On non-Windows platforms, about all you can do is use select() or equivalent to detect when the socket can accept new data before writing to it. On Windows, there are ways to receive a notification when a pending read/write operation completes if it does not finish right away.
But either way, outbound data is written into a kernel buffer and not transmitted right away. Writing functions, whether called on blocking or non-blocking sockets, merely report the status of writing data into that buffer, not the status of transmitting the data to the peer. The only way to know the status of the transmission is to have the peer explicitly send back a reply message once it has received the data. Some protocols do that, and others do not.
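As a minimal sketch of the select()-based approach mentioned above (sock is assumed to be an already connected socket; the caller still has to deal with a short write):

#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait until the socket can accept data, then write to it.
   Returns what write() returns; a short count is still possible. */
ssize_t write_when_ready(int sock, const void *buf, size_t len)
{
    fd_set wfds;

    FD_ZERO(&wfds);
    FD_SET(sock, &wfds);

    /* Blocks here (not inside write) until the kernel send buffer has room. */
    if (select(sock + 1, NULL, &wfds, NULL, NULL) == -1)
        return -1;

    return write(sock, buf, len);
}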
Is there a way of knowing the status of the write call without having to block on it?
If the result of the write call is -1, then check errno for EAGAIN or EWOULDBLOCK. If it's one of those errors, it's benign and you can go back to waiting on a select call. Sample code below.
int result = write(sock, buffer, size);

if ((result == -1) && ((errno == EAGAIN) || (errno == EWOULDBLOCK)))
{
    // write failed because the socket isn't ready to handle more data.
    // Try again later (or wait for select).
}
else if (result == -1)
{
    // fatal socket error
}
else
{
    // result == number of bytes sent.
    // TCP - may be less than the number of bytes passed in to the write/send call.
    // UDP - number of bytes sent (should be the entire thing).
}

Linux TCP socket in blocking mode

When I create a TCP socket in blocking mode and use the send (or sendto) functions, when will the call return?
Will it have to wait until the other side of the socket has received the data? In that case, if there is a traffic jam on the internet, could it block for a long time?
Both the sender and the receiver (and possibly intermediaries) will buffer the data.
Sending data successfully is no guarantee that the receiving end has received it.
Normally, writes to a blocking socket won't block as long as there is space in the sending side's buffer.
Once the sender's buffer is full, the write WILL block until there is space in it for the entire write.
If the write is only partially successful (the receiver closed the socket, shut it down, or an error occurred), the write might return fewer bytes than you had intended. A subsequent write should give an error or return 0; such conditions are irreversible on TCP sockets.
Note that if a subsequent send() or write() gives an error, then some previously written data could be lost forever. I don't think there is a real way of knowing how much data actually arrived (or was acknowledged, anyway).
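For completeness, here is a minimal sketch of a blocking-mode "send everything" helper; send_all is just an illustrative name, and the loop exists because even a blocking send() can return a short count (for instance when interrupted by a signal):

#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

ssize_t send_all(int fd, const char *buf, size_t len)
{
    size_t total = 0;

    while (total < len) {
        ssize_t n = send(fd, buf + total, len - total, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before anything was queued: retry */
            return -1;          /* real error; data already queued may still be lost */
        }
        total += (size_t)n;
    }
    return (ssize_t)total;      /* everything is in the kernel buffer, not necessarily received */
}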

Handling short reads using epoll()

Let's say the client sent 100 bytes of data but somehow the server only received 90 bytes. How do I handle this case? If the server calls the read function inside a while loop, checking the total received data, then the server will wait forever for the last 10 bytes.
Also, it could happen that the client gets disconnected in the middle of the data transfer. In that case, too, the server will wait forever for data which will never arrive.
I am using TCP, but in a real-world network environment this situation could happen. Thanks in advance...
You do not call the read() function in a loop until you receive the number of bytes you require. Instead, you set the socket to non-blocking and call the read() function in a loop until it returns 0 (indicating end of stream) or an error.
In the normal case the loop will terminate by read() returning -1, with errno set to EAGAIN. This indicates that the connection hasn't been closed, but no more data is available at the current time. At this point, if you do not have enough data from the client yet, you simply save the data that you do have for later, and return to the main epoll() loop.
If and when the remainder of the data arrives, the socket will be reported as readable by epoll(), you will read() the rest of the data, retrieve the saved data and process it all.
This means that you need space in your per-socket data structure to store the read-but-not-processed-yet data.
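A minimal sketch of that per-socket structure and the drain-until-EAGAIN read loop follows; struct peer, its buffer size and try_parse_messages() are hypothetical names, not anything mandated by epoll:

#include <errno.h>
#include <unistd.h>

struct peer {
    int fd;             /* non-blocking socket */
    char buf[4096];     /* read-but-not-processed-yet data */
    size_t used;
};

/* Called when epoll reports the fd readable.
   Returns 0 to keep the connection, -1 to close it. */
int on_readable(struct peer *p)
{
    for (;;) {
        ssize_t n;

        if (p->used == sizeof(p->buf))
            return -1;                  /* buffer full: protocol error, or grow the buffer */

        n = read(p->fd, p->buf + p->used, sizeof(p->buf) - p->used);
        if (n > 0) {
            p->used += (size_t)n;
            /* try_parse_messages(p): consume any complete messages,
               keep a trailing partial message in p->buf for next time */
        } else if (n == 0) {
            return -1;                  /* peer closed the connection gracefully */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return 0;                   /* no more data right now; wait for the next epoll event */
        } else if (errno == EINTR) {
            continue;
        } else {
            return -1;                  /* real error */
        }
    }
}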
You must carefully check the return value of read. It can return any of three things:
A positive number, indicating some bytes were read.
Zero, indicating the other end has gracefully closed the connection.
-1, meaning an error occurred. (If the socket is non-blocking, then the error EAGAIN or EWOULDBLOCK means the connection is still open but no data is ready for you right now, so you need to wait until epoll says there is more data for you.)
If your code is not checking for each of these three things and handling them differently, then it is almost certainly broken.
These cover all of the cases you are asking about, like a client sending 90 bytes then closing or rudely breaking the connection (because read() will return 0 or -1 for those cases).
If you are worried that a client might send 90 bytes and then never send any more, and never close the connection, then you have to implement your own timeouts. For that your best bet is non-blocking sockets and putting a timeout on select() / poll() / epoll(), ditching the connection if it is idle for too long.
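A minimal sketch of such a timeout, assuming you keep a last_activity timestamp per connection; the struct shows only the fields relevant here, and the names are made up for illustration:

#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

#define IDLE_LIMIT 30                   /* seconds of silence before we give up */

struct peer {
    int fd;                             /* -1 when the slot is unused */
    time_t last_activity;               /* updated on every successful read/write */
};

void one_event_loop_pass(int epfd, struct peer *peers, int npeers)
{
    struct epoll_event ev[64];
    int n = epoll_wait(epfd, ev, 64, 1000);     /* cap the wait so idle checks still run */
    time_t now = time(NULL);

    for (int i = 0; i < n; i++) {
        struct peer *p = ev[i].data.ptr;        /* assumes data.ptr was set at EPOLL_CTL_ADD */
        p->last_activity = now;                 /* any traffic resets the idle clock */
        /* ... do the actual non-blocking read/write handling here ... */
    }

    for (int i = 0; i < npeers; i++)            /* drop connections that stayed silent too long */
        if (peers[i].fd != -1 && now - peers[i].last_activity > IDLE_LIMIT) {
            close(peers[i].fd);
            peers[i].fd = -1;
        }
}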
TCP connection is a bi-directional stream layered on top of packet-based network. It's a common occurrence to read only part of what the other side sent. You have to read in a loop, appending until you have a complete message. For that you need an application level protocol - types, structure, and semantics of messages - that you use on top of TCP (FTP, HTTP, SMTP, etc. are such protocols).
To answer the specific second part of the question: add EPOLLRDHUP to the set of epoll(7) events to get notified when the connection drops.
In addition to what caf has said, I'd recommend subscribing to EPOLLRDHUP because it is the only safe way to figure out whether a connection was closed (read() == 0 is not reliable since, as caf mentioned too, it may also be true in case of an error). EPOLLERR is always subscribed to, even if you didn't specifically ask for it. The correct behaviour is to close the connection with close() in case of EPOLLRDHUP, and probably even when EPOLLERR is set.
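A minimal sketch of registering the socket with EPOLLRDHUP and reacting to it (epfd and client_fd are assumed to exist already; the helper names are just for illustration):

#include <sys/epoll.h>
#include <unistd.h>

/* Register a connected client socket; EPOLLERR/EPOLLHUP are reported anyway. */
int watch_client(int epfd, int client_fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN | EPOLLRDHUP;
    ev.data.fd = client_fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);
}

/* In the event loop: */
void handle_event(int epfd, struct epoll_event *ev)
{
    if (ev->events & (EPOLLRDHUP | EPOLLERR | EPOLLHUP)) {
        epoll_ctl(epfd, EPOLL_CTL_DEL, ev->data.fd, NULL);
        close(ev->data.fd);     /* peer went away or the socket errored out */
        return;
    }
    /* ... otherwise read until EAGAIN as described above ... */
}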
For more information, I've given a similar answer here: epoll_wait() receives socket closed twice (read()/recv() returns 0)
