How to handle all possible errors on async socket?

How to handle all possible errors on async socket? - linux

I have decided to use async io for my project and simply do a single threaded loop where I try to read some data each frame from each open socket. This worked quite well and overall I'm happy with it for now. The problem is weird problems I'm having with the async sockets.
I have code like this:
accept a connection...
fcntl(O_NONBLOCK) on the client socket...
int rc;
if((rc = recv(socket))>0)
process data
if rc == 0
close socket and cleanup
The problem is that I get rc == 0 sometimes even though I know that the connection is not closed. If I don't clean up then my app works as normal. But if I do cleanup then the client receives a disconnect before the connection is even established.
So my question is: Do I have to check somehow whether the socket is ready before doing a recv in order to get the correct return value from it?
Most of the information I have been able to find was inconclusive. I found a references to select() but it seems to block until there is a status change on the socket - but I need the socket to be nonblocking.
What I'm looking for is just the intuitive behavior that if there is data, it is read to the buffer and recv returns number of bytes read, if there is no data it returns -1 and if the socket is disconnected then it should return 0.
Do I have to do anything else to the socket before calling recv to make it work as expected?

First, taking on the heavy lifting of going "all asynchronous" with a socket server is a good start for a design and will enable scalability very easily.
As for your question.
recv() will return the following values:
A postive value returned by recv() indicates the number of bytes
copied to your buffer.(i.e you actually received these bytes)
recv() will return 0 when the socket was closed by the remote side.
For async sockets, recv() will return -1 and set errno to either
EAGAIN or EWOULDBLOCK if the connection is still valid, but there's
no new data to be consumed. Call select() or poll() on the socket to
wait for data.
Otherwise, any general connection failure will result in -1 being returned by recv(). (And the only thing you can do is close the socket).
So when you say, "rc == 0 sometimes even though I know that the connection is not closed", I suspect your pseudocode is not checking the return value, but instead checking the result of (rc > 0).
This is closer to the logic you want:
int rc;
rc = recv(s, buf, buffersize, 0);
if (rc == 0)
{
/* socket closed by remote end */
close(s); s=-1;
}
else if ((rc == -1) && ((errno == EAGAIN) || (errno == EWOULDBLOCK)) )
{
// need to wait. Call select() or poll()
}
else if (rc == -1)
{
close(s); s=-1;
}
else
{
ProcessNewData(s, buffer, rc);
}

Related

Get system error 115(operation in progress) on blocked socket connect()

In general, system error 115 occurs on a non blocking socket connect(), and the connection needs to be further checked through the select() interface.
But I didn't use fcntl() to set socket to O_NONBLOCK， just set a send timeout as shown below:
setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &timeout, len);
So, In my scenario, is the cause of the 115 error in connect() the same as that of a non block socket? And how do I know how long the connect interface is blocked when I get a 115 error?
Code example：
socklen_t len = sizeof(timeout);
ret = setsockopt(real_fd, SOL_SOCKET, SO_SNDTIMEO, &timeout, len);
if (-1 == ret)
{
LOG(" error setsockopt");
return SOCKET_FAIL;
}
ret = connect(fd, (struct sockaddr *) &dest, sizeof(dest));
if (0 != ret)
{
LOG("error connect = %s", strerror(errno));
return ret;
}
Result:
Sometimes get system error "Operation now in progress".
The error number is 115, which is also defined as EINPROGRESS.

Thank you for your ideas. I found the answer in the man socket(7). EINPROGRESS errors can also occur in blocked sockets.
SO_RCVTIMEO and SO_SNDTIMEO
Specify the receiving or sending timeouts until reporting an
error. The argument is a struct timeval. If an input or output
function blocks for this period of time, and data has been sent
or received, the return value of that function will be the
amount of data transferred; if no data has been transferred and
the timeout has been reached, then -1 is returned with errno set
to EAGAIN or EWOULDBLOCK, or EINPROGRESS (for connect(2)) just
as if the socket was specified to be nonblocking. If the time‐
out is set to zero (the default), then the operation will never
timeout. Timeouts only have effect for system calls that per‐
form socket I/O (e.g., read(2), recvmsg(2), send(2),
sendmsg(2)); timeouts have no effect for select(2), poll(2),
epoll_wait(2), and so on.

non-blocking socket vs. select() driven approach

I'm writing user-space application which among other functionality uses netlink sockets to talk to the kernel. I use simple API provided by open source library libmnl.
My application sets certain options over netlink as well as it subscribes to netlink events (notifications), parses it etc. So this second feature (event notifications) is asynchronous, currently I implemented a simple select() based loop:
...
fd_set rfd;
struct timeval tv;
int ret;
while (1) {
tv.tv_sec = 1;
tv.tv_usec = 0;
FD_ZERO(&rfd);
/* fd - is a netlink socket */
FD_SET(fd, &rfd);
ret = select(fd + 1, &rfd, NULL, NULL, &tv);
if (ret < 0) {
perror("select()");
continue;
} else if (ret == 0) {
printf("Timeout on fd %d", fd);
} else if (FD_ISSET(fd, &rfd)) {
/*
count = recv(fd, buf ...)
while (count > 0) {
parse 'buf' for netlink message, validate etc.
count = recv(fd, buf)
}
*/
}
}
So I'm observing now that code inside else if (FD_ISSET(fd, &rfd)) { branch blocks at the second recv() call.
Now I'm trying to understand if I need to set the netlink socket to non-blocking (SOCK_NOBLOCK for example), but then I probably don't need select() at all, I simply can have recv() -> message parse -> recv() loop and it won't block.

... if I need to set the netlink socket to non-blocking ..., but then I probably don't need select() at all ...
Exactly this is the purpose of a non-blocking socket: Instead of doing the if(FD_ISSET(...)) you call recv() and evaluate the return value.
If you use blocking sockets, you must not call recv() more than once after calling select(); then the program is "effectively" non-blocking.
HOWEVER,
... as user "kaylum" already suggested in his comment, you'll have another problem in any case:
It is not guaranteed that one complete "message" is available at the same time. The other end of the socket might send the first part of the message, wait some seconds and then send the second part of the message.
However, select() will tell you that there is at least one byte available; it will not tell you if the complete message is available.
If you want to wait for the complete message in the inner loop (while(count > 0)), you will always have to wait (which means that your program has "effectively" a blocking behavior even if the socket is non-blocking).
If you simply want to process all bytes already available in the inner loop, then the condition count > 0 is wrong. Instead, you should do something like this if you are working with blocking sockets:
else if(FD_ISSET(...))
{
while(FD_ISSET(...))
{
count = recv(...);
if(count > 0)
{
...
select(...);
}
else FD_ZERO(...);
}
}
However, in most cases this will not be necessary and you can simply process the "remaining" data bytes in the next "outer" loop.

Server crashes after closesocket

I have multithreading application, it's periodically polling a few hundred devices.
Each thread serves one device, its socket and other descriptors are encapsulated at individual object, so no shared descriptors.
Occasionally application crashes after closesocket(fSock), when I try set descriptor fSock to 0.
I assume, I should not set fSock = 0, if closesocket(fSock) returns SOCKET_ERROR.
Or is there any other reason?
My code:
bool _EthDev::Connect()
{
int sockErr, ret, i, j;
int szOut = sizeof(sockaddr_in);
// create socket
if ((fSock = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
{
sockErr = GetLastError();
Log("Invalid socket err %d", sockErr);
fSock = 0;
return false;
}
// set fast closing socket (by RST)
linger sLinger;
sLinger.l_onoff = 1;
sLinger.l_linger = 0;
if (sockErr = setsockopt(fSock, SOL_SOCKET, SO_LINGER, (const char FAR*)&sLinger, sizeof(linger)))
{
sockErr = WSAGetLastError();
Log("Setsockopt err %d", sockErr);
closesocket(fSock);
fSock = 0; // here crashes
return false;
}
// connect to device
fSockaddr.sin_port = htons((u_short)(baseport));
if (connect(fSock, (struct sockaddr*)&fSockaddr, szOut))
{
closesocket(fSock);
fSock = 0;
return false;
}
...
return true;
}

I have multithreading application, ... [it] occasionally crashes
A multithreading application that occasionally crashes is a classic symptom of a race condition. I think to prevent the crashes you need to figure out what the race condition is in your code, and fix that.
I assume, I should not set fSock = 0, if closesocket(fSock) returns
SOCKET_ERROR. Or is there any other reason?
I doubt the problem is actually related to closesocket() or to setting fSock to 0. Keep in mind that sockets are really just integers, and setting an integer to 0 isn't likely to cause a crash on its own. What could cause a crash is a write to invalid memory -- and fSock = 0 does write to the memory location where the member variable fSock is (or was) located at.
Therefore, a more likely hypothesis is that the _EthDev object got deleted by thread B while thread A was still in the middle of calling Connect() on it. This would be most likely happen while the connect() call was executing, because a blocking connect() call can take a relatively long time to return. So if there was another thread out there that rudely deleted the _EthDev object during the connect() call, then as soon as connect() returned, the next line of code that would write to the location where the (now deleted) _EthDev object used to be would be the "fSock = 0;" line, and that could cause a crash.
I suggest you review your code that deletes _EthDev objects, and if it isn't careful to first shut down any thread(s) using those objects (and also to wait for the threads to exit!) before deleting the _EthDev objects, you should rewrite it so that it does so reliably. Deleting an object while another thread might still be using it is asking for trouble.

React on client disconnect in linux

I am writing a simple socket daemon witch listens to a port and reads the incoming data. It works fine until i choose to disconnect a client from the server...then it enters in a infinte loop recv() returns the last packet never gets to -1. My question is how can i detect that the client had been disconnected and close the thread/ socket el
My thread is as follows :
void * SocketHandler(void* lp){
int * csock = (int*)lp;
int test = 0;
char buffer[1024];
int buffer_len = 1024;
int bytecount,ierr;
memset(buffer,0,buffer_len);
while (test == 0)
{
if ((bytecount = recv(*csock, buffer, buffer_len, 0))== -1){
close(csock);
free(csock);
test++;
return 0;
}
else
{
syslog(LOG_NOTICE,"%s",buffer);
}
}
return 0;
};

A cleanly closed socket will end up in a ZERO read, while a broken connection is an error state returning -1. You need to catch the 0 return of your recv.

What happens here is that your end may not detect the fact the socket is dead (especially, if you are just reading from it).
What you can do is set keepalive on the socket. This will turn on periodic checks of the socket liveness. But don't expect fast reactions, the default timeout is something like 20 minutes.
i = 1;
setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, (char *)&i, sizeof(i));
Another option is to do your own keep-alive communication.

recv() will indicate a proper shutdown of the socket by returning 0 (see the manpage for details). It will return -1 if and only if an error occurred. You should check errno for the exact error, since it may or may not indicate that the connection failed (EINTR, EAGAIN or EWOULDBLOCK [non-blocking sockets assumed] would both be recoverable errors).
Side note: there's no need to pass the fd of the socket as pointer and since you're returning a void * you may want to change return 0 to return NULL (just for readability).

Can read() function on a connected socket return zero bytes?

I know that read() is a blocking call unless I make the socket non-blocking. So I expect read() call which requests 4K of data should return a positive value ( no of bytes read) or -1 on error ( possible connection reset by client etc). My question is: Can read() return '0' on any occasion?
I am handling the read() this way:
if ((readval = read(acceptfd, buf, sizeof(buf) - 1)) < 0)
{
}
else
{
buf[readval] = 0;
//Do some thing with data
}
This code bombs if read() return zero and I know how to fix it. But is it possible for read() to return zero?

When a TCP connection is closed on one side read() on the other side returns 0 byte.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to handle all possible errors on async socket? - linux

Related

Get system error 115(operation in progress) on blocked socket connect()

non-blocking socket vs. select() driven approach

Server crashes after closesocket

React on client disconnect in linux

Can read() function on a connected socket return zero bytes?

Categories

Resources