Linux TCP socket in blocking mode

When I create a TCP socket in blocking mode and use the send (or sendto) functions, when will the function call return?
Will it have to wait until the other side of the socket has received the data? In that case, if the network is congested, could it block for a long time?

Both the sender and the receiver (and possibly intermediaries) will buffer the data.
Sending data successfully is no guarantee that the receiving end has received it.
Normally, writes to a blocking socket won't block as long as there is space in the sending-side buffer.
Once the sender's buffer is full, the write WILL block until enough space has been freed for the whole write to be copied into it.
If the write is only partially successful (the receiver closed the socket or shut it down, or an error occurred), it might return fewer bytes than you asked it to send. A subsequent write should then fail with an error (typically EPIPE or ECONNRESET); such conditions are irreversible on TCP sockets.
Note that if a subsequent send() or write() gives an error, some previously written data could be lost forever. I don't think there is a real way of knowing how much data actually arrived (or was acknowledged, anyway).
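Because a send can return a short count, callers usually wrap it in a loop. Here is a minimal sketch of that, assuming a blocking socket on Linux; send_all is a hypothetical helper name, not a standard function. Success only means the bytes were copied into the kernel's send buffer, not that the peer received them.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keep calling send() until all `len` bytes have been
 * handed to the kernel, or an error occurs. Returns 0 on success, -1 on
 * error (errno is left as set by send()). */
int send_all(int fd, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        /* MSG_NOSIGNAL: report a closed peer as EPIPE instead of SIGPIPE. */
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;              /* e.g. EPIPE or ECONNRESET */
        }
        p += n;                     /* short write: advance and send the rest */
        len -= (size_t)n;
    }
    return 0;
}
```

If this returns -1 part-way through, there is no portable way to tell how much of the earlier data reached the peer, which is the point made above.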

Related

Using thread to write and select to read

Has anyone tried creating a socket in non-blocking mode and using a dedicated thread to write to the socket, while using the select system call to detect when data is available to read?
If the socket is non-blocking, the write call will return immediately and the application will not know the status of the write (whether it passed or failed).
Is there a way of knowing the status of the write call without having to block on it?

Has any one tried to create a socket in non blocking mode and use a dedicated thread to write to the socket, but use the select system call to identify if data is available to read data.
Yes, and it works fine. Sockets are bi-directional. They have separate buffers for reading and writing. It is perfectly acceptable to have one thread writing data to a socket while another thread is reading data from the same socket at the same time. Both threads can use select() at the same time.
if the socket is non blocking, the write call will
return immediately and the application will not
know the status of the write (if it passed or failed).
The same is true for blocking sockets, too. Outbound data is buffered in the kernel and transmitted in the background. The difference between the two types is that if the write buffer is full (such as if the peer is not reading and acking data fast enough), a non-blocking socket will fail to accept more data and report an error code (WSAEWOULDBLOCK on Windows, EAGAIN or EWOULDBLOCK on other platforms), whereas a blocking socket will wait for buffer space to clear up and then write the pending data into the buffer. Same thing with reading. If the inbound kernel buffer is empty, a non-blocking socket will fail with the same error code, whereas a blocking socket will wait for the buffer to receive data.
select() can be used with both blocking and non-blocking sockets. It is just more commonly used with non-blocking sockets than blocking sockets.
is there a way of knowing the status of the write
call without having to block on it.
On non-Windows platforms, about all you can do is use select() or equivalent to detect when the socket can accept new data before writing to it. On Windows, there are ways to receive a notification when a pending read/write operation completes if it does not finish right away.
But either way, outbound data is written into a kernel buffer and not transmitted right away. Writing functions, whether called on blocking or non-blocking sockets, merely report the status of writing data into that buffer, not the status of transmitting the data to the peer. The only way to know the status of the transmission is to have the peer explicitly send back a reply message once it has received the data. Some protocols do that, and others do not.
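To illustrate the "have the peer send back a reply" idea, here is a rough sketch. The one-byte ACK value, the function name and the timeout handling are all assumptions made up for this example; a real protocol would define its own reply format.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#define ACK_BYTE 0x06   /* hypothetical one-byte reply invented for this sketch */

/* Sends `msg`, then waits up to `timeout_ms` for the peer's ACK byte.
 * Returns 1 if acknowledged, 0 on timeout, -1 on error or closed connection. */
int send_and_wait_for_ack(int fd, const void *msg, size_t len, int timeout_ms)
{
    /* Assumes the message fits in the send buffer; a short write is treated
     * as a failure to keep the sketch small. */
    ssize_t n = send(fd, msg, len, MSG_NOSIGNAL);
    if (n < 0 || (size_t)n != len)
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);   /* note: EINTR restarts the timeout */
    } while (ready < 0 && errno == EINTR);
    if (ready <= 0)
        return ready;                        /* 0 = timeout, -1 = error */

    unsigned char reply;
    if (recv(fd, &reply, 1, 0) != 1)
        return -1;                           /* error or connection closed */
    return reply == ACK_BYTE ? 1 : -1;
}
```

Only the return value 1 tells you the peer application actually saw the message; a successful send() on its own says nothing about that.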

is there a way of knowing the status of the write call without having
to block on it.
If the result of the write call is -1, then check errno for EAGAIN or EWOULDBLOCK. If it's one of those errors, then it's benign and you can go back to waiting on a select call. Sample code below.
ssize_t result = write(sock, buffer, size);
if ((result == -1) && ((errno == EAGAIN) || (errno == EWOULDBLOCK)))
{
    // Write failed because the socket isn't ready to handle more data. Try again later (or wait for select).
}
else if (result == -1)
{
    // Fatal socket error.
}
else
{
    // result == number of bytes sent.
    // TCP - may be less than the number of bytes passed in to the write/send call.
    // UDP - number of bytes sent (should be the entire thing).
}
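The three branches above can be folded into a loop that waits for writability whenever the buffer is full. This is only a sketch under some assumptions: the socket is already non-blocking, poll() is used in place of select(), and the helper name and timeout policy are invented for the example.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper for a non-blocking socket: send as much as possible
 * and wait with poll() whenever the kernel send buffer is full.
 * Returns 0 once everything was queued, -1 on a fatal error or timeout. */
int send_all_nonblocking(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, MSG_NOSIGNAL);
        if (n >= 0) {
            sent += (size_t)n;      /* possibly a partial write */
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* fatal socket error */

        /* Benign: buffer full. Wait until the socket is writable again. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (ready < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

In an event-driven program you would normally not wait inside the helper; you would remember the unsent remainder and return to the main select/poll loop instead.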

Will select() block if called while there is still data to be read?

If a socket has data to be read and the select() function is called, will select():
Return immediately, indicating the socket is ready for reading, or
Block until more data is received on the socket?

It can easily be tested, but I assure you select() will never block if there is data already available to read on one of the readfds. If it did block in that case, it wouldn't be very useful for programming with non-blocking I/O. Take the example where you are looping on select(), you see that there is data to be read, and you read it. Then while you are processing the data read, more data comes in. When you return to select() it blocks, waiting for more data. However your peer on the other side of the connection is waiting for a response to the data already sent. Your program ends up blocking forever. You could work around it with timeouts and such, but the whole point is to make non-blocking I/O efficient.
If an fd is at EOF, select() will never block even if called multiple times.

man 2 select seems to answer this question pretty directly:
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
So at least according to the manual, it would return immediately if there is any data available.
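As suggested, this is easy to test. The sketch below uses a pipe rather than a socket to keep it short (an assumption for the demo; select() treats both the same way for readability), and the function names are made up for the example.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns 1 if `fd` is readable right now, 0 if not, -1 on error.
 * The zero timeout makes select() report the current state without waiting. */
int readable_now(int fd)
{
    fd_set readfds;
    struct timeval no_wait = { 0, 0 };

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    if (select(fd + 1, &readfds, NULL, NULL, &no_wait) < 0)
        return -1;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/* Records what readable_now() reports at each step; -1 means the step
 * could not be run. */
void select_demo(int results[4])
{
    int p[2];
    char c = 'x';

    results[0] = results[1] = results[2] = results[3] = -1;
    if (pipe(p) != 0)
        return;
    if (write(p[1], &c, 1) == 1) {
        results[0] = readable_now(p[0]);       /* unread byte pending */
        results[1] = readable_now(p[0]);       /* asked again, still pending */
        if (read(p[0], &c, 1) == 1)
            results[2] = readable_now(p[0]);   /* drained, writer still open */
    }
    close(p[1]);
    results[3] = readable_now(p[0]);           /* writer closed: EOF */
    close(p[0]);
}
```

On Linux I would expect the results to be 1, 1, 0, 1: ready as long as unread data remains, not ready once drained, and ready again at EOF.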

Handling short reads using epoll()

Let's say the client sent 100 bytes of data but somehow the server only received 90 bytes. How do I handle this case? If the server calls the read function inside a while loop that checks the total amount of data received, then the server will wait forever for the last 10 bytes.
Also, it could happen that the client gets disconnected in the middle of a data transfer. In this case too, the server will wait forever for data that will never arrive.
I am using TCP, but in a real-world network environment this situation could happen. Thanks in advance...

You do not call the read() function in a loop until you receive the number of bytes you require. Instead, you set the socket to nonblocking and call the read() function in a loop until it returns 0 (indicating end of stream) or an error.
In the normal case the loop will terminate by read() returning -1, with errno set to EAGAIN. This indicates that the connection hasn't been closed, but no more data is available at the current time. At this point, if you do not have enough data from the client yet, you simply save the data that you do have for later, and return to the main epoll() loop.
If and when the remainder of the data arrives, the socket will be returned as readable by epoll(), you will read() the rest of the data, retrieve the saved data and process it all.
This means that you need space in your per-socket data structure to store the read-but-not-processed-yet data.
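A sketch of that per-socket structure and the read loop might look like the following. The struct layout, the buffer size and the result names are all assumptions for illustration, and recv() stands in for read() since the descriptor is a socket.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#define CONN_BUF_SIZE 4096      /* arbitrary size chosen for this sketch */

/* Hypothetical per-connection state: data read but not yet processed. */
struct conn {
    int    fd;                  /* non-blocking socket */
    size_t used;                /* bytes currently saved in buf */
    char   buf[CONN_BUF_SIZE];
};

enum drain_result { DRAIN_AGAIN, DRAIN_EOF, DRAIN_ERROR, DRAIN_FULL };

/* Reads until the socket has nothing more to give right now.
 * DRAIN_AGAIN: no more data for now, go back to the epoll loop.
 * DRAIN_EOF:   peer closed the connection.
 * DRAIN_FULL:  buffer is full, the caller must process some data first. */
enum drain_result drain_socket(struct conn *c)
{
    for (;;) {
        if (c->used == sizeof c->buf)
            return DRAIN_FULL;
        ssize_t n = recv(c->fd, c->buf + c->used, sizeof c->buf - c->used, 0);
        if (n > 0) {
            c->used += (size_t)n;   /* save it and keep reading */
            continue;
        }
        if (n == 0)
            return DRAIN_EOF;
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return DRAIN_AGAIN;
        return DRAIN_ERROR;
    }
}
```

After each DRAIN_AGAIN the caller checks whether c->used now covers a complete request; if not, the partial data simply stays in the buffer until epoll reports the socket readable again.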

You must carefully check the return value of read. It can return any of three things:
A positive number, indicating some bytes were read.
Zero, indicating the other end has gracefully closed the connection.
-1, meaning an error occurred. (If the socket is non-blocking, then the error EAGAIN or EWOULDBLOCK means the connection is still open but no data is ready for you right now, so you need to wait until epoll says there is more data for you.)
If your code is not checking for each of these three things and handling them differently, then it is almost certainly broken.
These cover all of the cases you are asking about, like a client sending 90 bytes then closing or rudely breaking the connection (because read() will return 0 or -1 for those cases).
If you are worried that a client might send 90 bytes and then never send any more, and never close the connection, then you have to implement your own timeouts. For that your best bet is non-blocking sockets and putting a timeout on select() / poll() / epoll(), ditching the connection if it is idle for too long.

A TCP connection is a bi-directional stream layered on top of a packet-based network. It's a common occurrence to read only part of what the other side sent. You have to read in a loop, appending until you have a complete message. For that you need an application-level protocol (types, structure, and semantics of messages) that you use on top of TCP (FTP, HTTP, SMTP, etc. are such protocols).
To answer the specific second part of the question: add EPOLLRDHUP to the set of epoll(7) events to get notified when the connection drops.
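As an illustration of "appending until you have a complete message", here is a sketch that assumes a made-up framing: a 4-byte big-endian length followed by the payload. The framing and the function name are assumptions for the example, not something the question specified.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing for this sketch: each message is a 4-byte big-endian
 * length followed by that many payload bytes.
 * If buf[0..len) starts with a complete message, stores where the payload is
 * in *payload and *payload_len and returns the total number of bytes the
 * message occupies. Returns 0 if more data is needed. */
size_t next_message(const unsigned char *buf, size_t len,
                    const unsigned char **payload, size_t *payload_len)
{
    if (len < 4)
        return 0;                           /* length prefix incomplete */

    uint32_t n = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    if (len - 4 < n)
        return 0;                           /* payload incomplete */

    *payload = buf + 4;
    *payload_len = n;
    return 4 + (size_t)n;
}
```

With the 100-byte example from the question, a call on the first 90 bytes returns 0 and the server keeps the bytes buffered; once the last 10 arrive the same call returns the full message. A real protocol should also reject absurdly large lengths.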

In addition to what caf has said, I'd recommend just subscribing to EPOLLRDHUP, because this is the only safe way to figure out whether a connection was closed (read() == 0 is not reliable because, as caf mentioned too, it may also be true in case of an error). EPOLLERR is always subscribed to, even if you didn't specifically ask for it. The correct behaviour is to close the connection using close() on EPOLLRDHUP, and probably also when EPOLLERR is set.
For more information, I've given a similar answer here: epoll_wait() receives socket closed twice (read()/recv() returns 0)
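Registering for EPOLLRDHUP could look roughly like this on Linux. The helper names are invented, and the demo uses a local socket pair instead of a real TCP connection so that it is self-contained.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Registers `fd` with the epoll instance `epfd` for input and for
 * peer-hangup notification. Returns 0 on success, -1 on error. */
int watch_connection(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN | EPOLLRDHUP;   /* EPOLLERR and EPOLLHUP are implicit */
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Nonzero if the reported event means the connection should be closed. */
int should_close(const struct epoll_event *ev)
{
    return (ev->events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) != 0;
}

/* Demo: returns the event mask reported after the peer shuts down its
 * sending side, or 0 if any setup step fails. */
uint32_t demo_peer_shutdown(void)
{
    int sv[2];
    uint32_t seen = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;
    int epfd = epoll_create1(0);
    if (epfd >= 0 && watch_connection(epfd, sv[0]) == 0) {
        struct epoll_event ev;
        shutdown(sv[1], SHUT_WR);               /* peer stops sending */
        if (epoll_wait(epfd, &ev, 1, 1000) == 1)
            seen = ev.events;
    }
    if (epfd >= 0)
        close(epfd);
    close(sv[0]);
    close(sv[1]);
    return seen;
}
```

Note that EPOLLRDHUP typically arrives together with EPOLLIN, so if unread data may still be buffered you would drain it before calling close().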

How to handle OpenSSL SSL_ERROR_WANT_READ / WANT_WRITE on non-blocking sockets

The OpenSSL library allows reading from an underlying socket with SSL_read and writing to it with SSL_write. These functions may return SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE depending on the needs of the SSL protocol (for example, when renegotiating a connection).
I don't really understand what the API wants me to do with these results.
Imagine a server app that accepts client connections, sets up a new SSL session, makes the underlying socket non-blocking and then adds the file descriptor to a select/poll/epoll loop.
If a client sends data, the main loop will dispatch this to an ssl_read. What has to be done here if SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE is returned? WANT_READ might be easy, because the next main loop iteration could just lead to another ssl_read. But if the ssl_read returns WANT_WRITE, with what parameters should it be called? And why doesn't the library issue the call itself?
If the server wants to send a client some data, it will use ssl_write. Again, what is to be done if WANT_READ or WANT_WRITE are returned? Can the WANT_WRITE be answered by repeating the very same call that just was invoked? And if WANT_READ is returned, should one return to the main loop and let the select/poll/epoll take care of this? But what about the message that should be written in the first place?
Or should the read be done right after the failed write? Then, what protects against reading bytes from the application protocol and then having to deal with it somewhere in the outskirts of the app, when the real parser sits in the mainloop?

With non-blocking sockets, SSL_ERROR_WANT_READ means "wait for the socket to be readable, then call this function again"; conversely, SSL_ERROR_WANT_WRITE means "wait for the socket to be writable, then call this function again". You can get either SSL_ERROR_WANT_WRITE or SSL_ERROR_WANT_READ from either an SSL_read() or an SSL_write() call.
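The bookkeeping this implies can be sketched without linking against OpenSSL at all. Everything below (the enums, the struct, the function names) is hypothetical and only models the rule "remember which call did not complete and which readiness it is waiting for"; in real code the `want` value would come from SSL_get_error() returning SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE.

```c
#include <poll.h>

/* Hypothetical names for this sketch; none of them come from OpenSSL. */
enum tls_op   { OP_NONE, OP_READ, OP_WRITE };   /* which call to repeat */
enum tls_want { WANT_NOTHING, WANT_READABLE, WANT_WRITABLE };

struct tls_conn {
    enum tls_op   pending;  /* e.g. OP_WRITE after SSL_write() did not complete */
    enum tls_want want;     /* e.g. WANT_READABLE for SSL_ERROR_WANT_READ */
};

/* Record that `op` must be repeated once the socket is in state `want`. */
void tls_defer(struct tls_conn *c, enum tls_op op, enum tls_want want)
{
    c->pending = op;
    c->want = want;
}

/* Event mask to hand to poll() for this connection. */
short tls_poll_events(const struct tls_conn *c)
{
    switch (c->want) {
    case WANT_READABLE: return POLLIN;
    case WANT_WRITABLE: return POLLOUT;
    default:            return POLLIN;  /* idle: just watch for new input */
    }
}

/* Given what poll() reported, which call should the main loop make now? */
enum tls_op tls_next_call(const struct tls_conn *c, short revents)
{
    if (c->pending != OP_NONE) {
        short needed = (c->want == WANT_WRITABLE) ? POLLOUT : POLLIN;
        return (revents & needed) ? c->pending : OP_NONE;
    }
    return (revents & POLLIN) ? OP_READ : OP_NONE;
}
```

The case that tends to confuse people is a write that wants a read: the loop waits for POLLIN and then repeats the write, not a read. As far as I know, OpenSSL also expects a repeated SSL_write() to be given the same arguments as the call that did not complete.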

Did you read the OpenSSL documentation for SSL_read() and SSL_get_error() yet?
SSL_read():
If the underlying BIO is blocking, SSL_read() will only return, once the read operation has been finished or an error occurred, except when a renegotiation take place, in which case a SSL_ERROR_WANT_READ may occur. This behaviour can be controlled with the SSL_MODE_AUTO_RETRY flag of the SSL_CTX_set_mode(3) call.
If the underlying BIO is non-blocking, SSL_read() will also return when the underlying BIO could not satisfy the needs of SSL_read() to continue the operation. In this case a call to SSL_get_error(3) with the return value of SSL_read() will yield SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE. As at any time a re-negotiation is possible, a call to SSL_read() can also cause write operations! The calling process then must repeat the call after taking appropriate action to satisfy the needs of SSL_read(). The action depends on the underlying BIO. When using a non-blocking socket, nothing is to be done, but select() can be used to check for the required condition.
SSL_get_error():
SSL_ERROR_WANT_READ, SSL_ERROR_WANT_WRITE
The operation did not complete; the same TLS/SSL I/O function should be called again later. If, by then, the underlying BIO has data available for reading (if the result code is SSL_ERROR_WANT_READ) or allows writing data (SSL_ERROR_WANT_WRITE), then some TLS/SSL protocol progress will take place, i.e. at least part of an TLS/SSL record will be read or written. Note that the retry may again lead to a SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE condition. There is no fixed upper limit for the number of iterations that may be necessary until progress becomes visible at application protocol level.
For socket BIOs (e.g. when SSL_set_fd() was used), select() or poll() on the underlying socket can be used to find out when the TLS/SSL I/O function should be retried.
Caveat: Any TLS/SSL I/O function can lead to either of SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE. In particular, SSL_read() or SSL_peek() may want to write data and SSL_write() may want to read data. This is mainly because TLS/SSL handshakes may occur at any time during the protocol (initiated by either the client or the server); SSL_read(), SSL_peek(), and SSL_write() will handle any pending handshakes.
OpenSSL is implemented as a state machine. SSL_ERROR_WANT_READ means that more inbound data, and SSL_ERROR_WANT_WRITE means that more outbound data, is needed in order to make forward progress on the connection.
If you get SSL_ERROR_WANT_WRITE, OpenSSL needs to send outbound data but can't because the socket is no longer writable (the peer's receive buffer can't hold any more data), so you need to wait for the socket to become writable (the peer has freed up buffer space) and then retry the operation.
If you get SSL_ERROR_WANT_READ, OpenSSL needs to read inbound data but can't because the socket is no longer readable (the socket's receive buffer is empty), so you need to wait for the socket to become readable (more data has arrived) and then retry the operation.
You should subscribe to the OpenSSL mailing lists. This question gets asked a lot.

SSL_ERROR_WANT_READ means that the SSL engine can't currently encrypt for you because it's waiting for more input data (either as part of the initial handshake or as part of a renegotiation). Once your next read has completed and you've pushed the data that arrived through the SSL engine, you can retry your write operation.
Likewise, SSL_ERROR_WANT_WRITE means that the SSL engine is waiting for you to extract some data from it and send it to the peer.
I wrote about using OpenSSL with non-blocking and async sockets back in 2002 for Windows Developer Journal (reprinted here), and although this article is ostensibly aimed at Windows code, the principles are the same for other platforms. The article comes with some code that integrates OpenSSL with async sockets on Windows and deals with the whole SSL_ERROR_WANT_READ/SSL_ERROR_WANT_WRITE issue.
Essentially, when you get an SSL_ERROR_WANT_READ you need to queue outbound data until a read has completed and you've passed the new inbound data into the SSL engine; once that has happened, you can retry sending your outbound data.

Linux TCP/IP Non-blocking send for socket stream..what happens to the TCP recv buffer?

This pertains to Linux kernel 2.6 TCP sockets.
I am sending a large amount of data, say 300 MB, with a non-blocking send to another client who receives 8 MB at a time.
After one 8 MB receive, the "receiver" stops receiving because it wants to perform other tasks, such as error handling. The sender would get an EWOULDBLOCK, but since it's asynchronous communication, the send would try and fill up the TCP recv buffer on the other end.
My question is: would there still be data in the TCP recv buffer even though the "sender" got an EWOULDBLOCK and the "receiver" stops receiving? The same socket is used for error handling, so would the "receiver" have to then clear the TCP recv buffer before trying to reuse the existing socket?

Yes. It is quite possible (and in fact likely) that when you get EWOULDBLOCK, some data you have already sent has not yet been read by the receiving application. This buffered data will be available to the next read on the socket.
This means that if your receiver then sends an "Ooops, don't send any more" message back to the sender, the sender can't act on that message and "un-send" the data. Once it's been passed to write()/send(), it's on its way and can't be recalled.
Your receiver will have to handle this eventuality by reading out the data it's no longer interested in and discarding it, which will mean you'll need some kind of transaction delimiters in your data stream.
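One way the "read and discard up to a delimiter" step might look, assuming a blocking socket and a single-byte delimiter (both assumptions for the sketch; the function name is made up too). It peeks first so that nothing after the delimiter is consumed.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Reads and throws away everything up to and including the first `delim`
 * byte. Returns 1 if the delimiter was found, 0 if the peer closed the
 * connection first, -1 on error. */
int discard_through_delimiter(int fd, char delim)
{
    char buf[4096];

    for (;;) {
        /* Look at the pending data without removing it from the socket. */
        ssize_t n = recv(fd, buf, sizeof buf, MSG_PEEK);
        if (n == 0)
            return 0;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        const char *hit = memchr(buf, delim, (size_t)n);
        int found = (hit != NULL);
        size_t consume = found ? (size_t)(hit - buf) + 1 : (size_t)n;

        /* Now actually remove exactly `consume` bytes. */
        size_t done = 0;
        while (done < consume) {
            ssize_t r = recv(fd, buf, consume - done, 0);
            if (r == 0)
                return 0;
            if (r < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += (size_t)r;
        }
        if (found)
            return 1;
    }
}
```

After it returns 1, the next read on the socket starts at the first byte of the following transaction.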

My question is: would there still be data in the TCP recv buffer even though the "sender" got an EWOULDBLOCK and the "receiver" stops receiving?
There is data in the TCP receive buffer because the sender got EWOULDBLOCK. That's the only condition it can happen under.
Your question doesn't make sense.
