Writing to closed TCP/IP connection hangs - linux

My TCP/IP client hangs while writing to a socket. This happens even if the server properly closes the accepted connection with close() (or shutdown()). I had always thought that write() should return an ECONNRESET error in this case.
How do I prevent hangs in synchronous output? Or rather, what am I doing wrong so that write() does not report an error?
Should I use send() instead of write(), or are they interchangeable?
I'm testing networking in a separate application with two threads.
The main thread starts a server thread, which accepts a connection and immediately closes it. The main thread then imitates client behavior by connecting to the listening port.
Main thread code:
sockaddr_in serv_addr;
bzero((char *) &serv_addr, sizeof(serv_addr));
serv_addr.sin_family = AF_INET;
serv_addr.sin_addr.s_addr = INADDR_ANY;
serv_addr.sin_port = htons(port);
int s = socket(AF_INET, SOCK_STREAM, 0);
if (bind(s, (sockaddr*)(&serv_addr), sizeof(serv_addr)) < 0) {
close(s);
throw runtime_error(strerror(errno));
}
listen(s,1);
start_thread(s); // posix thread accepting a connection on s is started here
//Code below imitates TCP/IP client
struct hostent *hp;
struct sockaddr_in addr;
if((hp = gethostbyname(host.c_str())) == NULL){
throw HErrno();
}
bcopy(hp->h_addr, &addr.sin_addr, hp->h_length);
addr.sin_port = htons(port);
addr.sin_family = AF_INET;
int _socket = ::socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
if (_socket < 0) {
throw Errno();
}
assert(ENOTCONN==107);
assert(_socket>=0);
if(::connect(_socket, (const struct sockaddr *)&addr, sizeof(struct sockaddr_in)) == -1){
throw Errno();
}
while(true) {
const char a[] = "pattern";
if (write( _socket, a, 7)<0) // Writes 30000 times, then hangs
break;
}
Server thread code:
int connection = accept(s, NULL, NULL);
close(connection);
EDIT:
The problem turned out to be my own programming error: it seems I failed to start the accepting thread properly.

Each TCP connection has a receive buffer that holds data that has been received but not yet delivered to the application. If your server never calls read(), the data accumulates in the receive buffer until it is full, at which point TCP advertises a zero window (Window=0) to the sender. The sender's own send buffer then fills up as well, and a blocking write() hangs.
Your server thread code should look like this:
char a[10];
int connection = accept(s, NULL, NULL);
while(true)
if (read( connection , a, 7)<=0)
break;
close(connection);
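The buffer-full effect described above can be reproduced without a network. The following is only a sketch, assuming Linux: it uses socketpair() as a stand-in for a TCP connection (an assumption, not the asker's setup), and the helper names fill_until_blocked and demo_full_buffer are hypothetical. The peer never reads, so a non-blocking send eventually reports EAGAIN at the point where a blocking write() would hang.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sends to fd until the kernel refuses more data.
 * Returns the number of bytes queued, or -1 on an unexpected error. */
long fill_until_blocked(int fd)
{
    const char chunk[] = "pattern";
    long total = 0;
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    for (;;) {
        ssize_t n = send(fd, chunk, sizeof(chunk) - 1, MSG_NOSIGNAL);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;   /* a blocking write() would hang here */
        return -1;
    }
}

/* Demo: the peer (sv[1]) never reads, so its receive buffer fills up.
 * Returns the number of bytes accepted before blocking, -1 on error. */
long demo_full_buffer(void)
{
    int sv[2];
    long queued;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    queued = fill_until_blocked(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return queued;
}
```

The exact number of bytes accepted depends on the kernel's buffer sizes, so the sketch only reports it rather than assuming a value.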

Did you perhaps set up a handler for SIGPIPE instead of setting it to SIG_IGN?
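To illustrate the SIGPIPE point, here is a minimal sketch, assuming Linux and again using a local socketpair() in place of a real TCP connection (an assumption). The function name errno_after_peer_close is hypothetical. With SIGPIPE ignored, writing to a connection whose peer has closed returns -1 with errno set to EPIPE instead of killing the process. On a real TCP connection the first write after the peer's close typically still succeeds, and the error only shows up on a later call.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno reported by a write to a connection whose peer
 * has already closed, or 0 if the write unexpectedly succeeded. */
int errno_after_peer_close(void)
{
    int sv[2];
    int err = 0;

    /* Without this, the default SIGPIPE action terminates the process
     * before write() gets a chance to return an error. */
    void (*prev)(int) = signal(SIGPIPE, SIG_IGN);
    assert(prev != SIG_ERR);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return errno;
    close(sv[1]);                       /* the "server" closes at once */

    if (write(sv[0], "pattern", 7) < 0)
        err = errno;                    /* EPIPE on a local socket pair */
    close(sv[0]);
    return err;
}
```

As for send() versus write(): send() with flags set to 0 behaves like write() on a socket, and passing MSG_NOSIGNAL to send() suppresses SIGPIPE for that one call without changing the process-wide signal disposition.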

Related

socket::accept continually returns EAGAIN

I use a nonblocking socket to accept new connections, but accept() repeatedly fails.
int sockfd = ::socket(family, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, IPPROTO_TCP);
::bind(sockfd, bind_addr, static_cast<socklen_t>(sizeof(struct sockaddr_in6)));
ret = ::listen(sockfd, SOMAXCONN);
while (true) {
    ::poll(&*pollfds_.begin(), pollfds_.size(), timeoutMs);
    struct sockaddr_in6 addr;
    bzero(&addr, sizeof addr);
    socklen_t addrlen = static_cast<socklen_t>(sizeof addr);
    int connfd = ::accept4(sockfd, sockaddr_cast(&addr),
                           &addrlen, SOCK_NONBLOCK | SOCK_CLOEXEC);
}
errno is EAGAIN.
From the manpage to accept(2):
EAGAIN or EWOULDBLOCK
The socket is marked nonblocking and no connections are present to be accepted. POSIX.1-2001 allows either error to be returned for this case, and does not require these constants to have the same value, so a portable application should check for both possibilities.
This means that the call to accept is made before the client has connected.
Before calling accept, you must call bind and listen.
But as your socket is non-blocking, you should wait for a client to connect before calling accept. You can do that with the select function:
int sockfd = ::socket(family, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, IPPROTO_TCP);
// addr is for accept call, sin for bind call
struct sockaddr_in6 addr, sin;
bzero(&addr, sizeof addr);
bzero(&sin, sizeof sin);
// prepare sin to tell bind to listen on any interface on the given port
sin.sin6_family = family;
sin.sin6_addr = in6addr_any;
sin.sin6_port = htons(port); // choose port on which client could connect
sin.sin6_scope_id = 0;
// bind socket to interface
if (::bind(sockfd, (struct sockaddr*) &sin, sizeof(sin)) < 0)
{
    perror("bind");
}
// listen for new connection
if (::listen(sockfd, SOMAXCONN) < 0)
{
    perror("listen");
}
while (1)
{
    // select modifies both the fd set and the timeout, so reset them each time
    fd_set conset;
    FD_ZERO(&conset);
    FD_SET(sockfd, &conset);
    struct timeval timeout = {10, 0};
    int maxfd = sockfd;
    // wait for new client
    select(maxfd + 1, &conset, NULL, NULL, &timeout);
    if (FD_ISSET(sockfd, &conset))
    {
        // a new client is waiting
        socklen_t addrlen = sizeof(addr);
        int connfd = ::accept(sockfd, (struct sockaddr*) &addr, &addrlen);
        if (connfd < 0)
        {
            perror("accept");
        }
        else
        {
            // do thing with new client
        }
    }
    else
    {
        printf("no new client in last 10 seconds\n");
    }
}
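The same idea can be written with poll() and with the portable check for both EAGAIN and EWOULDBLOCK that the man page recommends. This is a sketch under stated assumptions rather than a drop-in replacement: the helper accept_with_timeout and the demo function are hypothetical names, and the demo uses an IPv4 loopback listener on a kernel-chosen port purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: accept one connection on a non-blocking listener,
 * waiting up to timeout_ms for a client.
 * Returns the new descriptor, -1 on error, or -2 on timeout. */
int accept_with_timeout(int listen_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };

    assert(listen_fd >= 0);
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0)
            return fd;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                  /* a real error */
        /* Nothing pending yet: sleep until the listener is readable. */
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0)
            return -2;
        if (ready < 0)
            return -1;
    }
}

/* Demo on the loopback interface; returns 0 if everything behaved as
 * expected. Error paths leak descriptors for brevity. */
int demo_nonblocking_accept(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    int conn;

    if (lfd < 0 || cfd < 0)
        return 1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0 ||
        fcntl(lfd, F_SETFL, fcntl(lfd, F_GETFL, 0) | O_NONBLOCK) < 0)
        return 2;

    /* No client yet: a timeout is reported instead of a failure. */
    if (accept_with_timeout(lfd, 0) != -2)
        return 3;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return 4;
    conn = accept_with_timeout(lfd, 1000);
    if (conn < 0)
        return 5;
    close(conn);
    close(cfd);
    close(lfd);
    return 0;
}
```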

Understanding the reason for recv blocking forever

I run a Linux program written in C that would periodically receive data by parsing an HTTP response, crunch some numbers and then report the result by HTTP GET of another web page.
My problem is that sometimes, one of the instances would "freeze".
Looking at top I can see that it is in sk_wait_data state and attaching a debugger reveals that it is blocked by a recv call.
Here is a minimal version of the code that does the TCP connection (it was adapted from http://www.linuxhowtos.org/C_C++/socket.htm):
int connectTCP(const char* host, const char* page, int portno) {
int sockfd;
struct sockaddr_in serv_addr;
struct hostent *server;
// Create socket //
sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (sockfd < 0)
error("ERROR opening socket");
// Get ip from hostname //
server = gethostbyname(host);
if (server == NULL)
error("ERROR, can not find host\n");
memset((char *) &serv_addr, 0, sizeof(serv_addr));
serv_addr.sin_family = AF_INET;
memcpy((char *)&serv_addr.sin_addr.s_addr, // Destination
(char *)server->h_addr, // Source
server->h_length); // Size
serv_addr.sin_port = htons(portno);
// Connect to socket //
if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
error("ERROR connecting");
return sockfd;
}
char* httpGet(const char* host, const char* page, int portno) {
int sockfd, n;
sockfd = connectTCP(host, page, portno);
memset(buffer, 0, sizeof(buffer));
sprintf(buffer, "GET /%s HTTP/1.0\r\nHost: %s\r\n\r\n", page, host);
n = send(sockfd,buffer,strlen(buffer), 0);
if (n < 0)
error("ERROR writing to socket");
int count = 0;
do {
n = recv(sockfd, buffer + count, BUFFER_MAX_SIZE - count, 0);
if (n < 0) {
error("ERROR reading from socket");
}
count += n;
} while(n != 0);
close(sockfd);
return buffer;
}
Bugs in your code:
If recv() returns zero, you should close the socket and stop reading.
If recv() returns -1, you should report the error, close the socket, and stop reading, unless you had set a read timeout and errno was EAGAIN/EWOULDBLOCK, in which case you should handle the timeout in whatever way is appropriate for your application.
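A read loop that follows both rules and adds a read timeout might look like the sketch below. It assumes Linux and sets the timeout with SO_RCVTIMEO; the helper name recv_until_eof and its return-code convention are hypothetical, and the demo again uses socketpair() instead of a real HTTP connection.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: read until EOF, a full buffer, an error, or a timeout.
 * timeout_ms must be > 0 (a zero timeval would mean "block forever").
 * Returns bytes read (>= 0) on EOF or full buffer, -1 on error,
 * -2 if no data arrived within timeout_ms. The caller closes fd. */
long recv_until_eof(int fd, char *buf, size_t cap, int timeout_ms)
{
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    size_t count = 0;

    assert(buf != NULL && timeout_ms > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        return -1;

    while (count < cap) {
        ssize_t n = recv(fd, buf + count, cap - count, 0);
        if (n == 0)
            break;                       /* peer closed: stop reading */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted: just retry */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return -2;               /* read timeout expired */
            return -1;
        }
        count += (size_t)n;
    }
    return (long)count;
}

/* Demo: returns 0 when the timeout case and the EOF case both behave
 * as described. */
int demo_recv_timeout(void)
{
    char buf[64];
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 1;
    /* Peer is silent but keeps the connection open: expect a timeout. */
    if (recv_until_eof(sv[0], buf, sizeof(buf), 100) != -2)
        return 2;
    /* Peer sends 5 bytes and closes: expect 5 bytes, then EOF. */
    if (send(sv[1], "hello", 5, 0) != 5)
        return 3;
    close(sv[1]);
    if (recv_until_eof(sv[0], buf, sizeof(buf), 100) != 5)
        return 4;
    close(sv[0]);
    return 0;
}
```

With a timeout in place, a peer that stays silent without closing no longer leaves the process stuck in recv; what to do on timeout (retry, reconnect, give up) remains an application decision.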

Why does my TCP server code send a SYN/ACK only on the first packet, or only on the first connection?

SOCKET sock;
SOCKET fd;
uint16 port = 18001;
void CreateSocket()
{
struct sockaddr_in server, client; // creating a socket address structure: structure contains ip address and port number
WORD wVersionRequested;
WSADATA wsaData;
int len;
printf("Initializing Winsock\n");
wVersionRequested = MAKEWORD (2, 2);
iResult = WSAStartup (wVersionRequested, &wsaData);
if (iResult != NO_ERROR)
printf("Error at WSAStartup()\n");
// create socket
sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (sock == INVALID_SOCKET) { // SOCKET is unsigned on Windows, so "< 0" is never true
printf("Could not Create Socket\n");
//return 0;
}
printf("Socket Created\n");
// create socket address of the server
memset( &server, 0, sizeof(server));
// IPv4 - connection
server.sin_family = AF_INET;
// accept connections from any ip adress
server.sin_addr.s_addr = htonl(INADDR_ANY);
// set port
server.sin_port = htons(18001);
//Binding between the socket and ip address
if(bind (sock, (struct sockaddr *) &server, sizeof(server)) < 0)
{
printf("Bind failed with error code: %d", WSAGetLastError());
}
//Listen to incoming connections
if(listen(sock,3) == -1){
printf("Listen failed with error code: %d", WSAGetLastError());
}
printf("Server has been successfully set up - Waiting for incoming connections");
for(;;){
len = sizeof(client);
fd = accept(sock, (struct sockaddr*) &client, &len);
if (fd == INVALID_SOCKET){
printf("Accept failed");
closesocket(sock); // Winsock sockets are closed with closesocket(), not close()
break;
}
//echo(fd);
printf("\n Process incoming connection from (%s , %d)", inet_ntoa(client.sin_addr),ntohs(client.sin_port));
//closesocket(fd);
}
}
The server code accepts a connection from the client via the IP address and port number. It sends a SYN/ACK to the client only for the first connection; the second time it answers with RST/ACK (that is, it resets the connection).
Could anyone tell me what the error in the above code is?
Look at the question "Accept multiple subsequent connections to socket".
Here is a quote: "To service multiple clients, you need to avoid blocking I/O -- i.e., you can't just read from the socket and block until data comes in."

Linux C Socket: Blocked on recv call

In my application I have created a thread for a simple HTTP server. When I try to connect to that HTTP server from within the same application, control blocks (hangs) on the recv call.
But if I connect to my application's HTTP server using the Linux GET command, the connection succeeds.
From searching on Google, my understanding is that this is not the right approach.
But if I want to do this, how should I create the sockets so that I can connect to my HTTP server from within the application?
Below is how my HTTP server socket is created:
pthread_create(&pt_server, NULL, http_server, NULL);
//http server handler
void *http_server(void *arg)
{
int sockfd, new_fd;
struct sockaddr_in my_addr;
struct sockaddr_in their_addr;
socklen_t sin_size;
struct sigaction sa;
int yes=1;
if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
{
perror("socket");
exit(1);
}
if (setsockopt(sockfd,SOL_SOCKET,SO_REUSEADDR,&yes,sizeof(int)) == -1)
{
perror("setsockopt");
exit(1);
}
my_addr.sin_family = AF_INET; // host byte order
my_addr.sin_port = htons(HTTP_PORT); // short, network byte order
my_addr.sin_addr.s_addr = INADDR_ANY; // automatically fill with my IP
memset(&(my_addr.sin_zero), '\0', 8); // zero the rest of the struct
if (bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr))== -1)
{
perror("bind");
exit(1);
}
printf("Listening to sockets\n");
if (listen(sockfd, BACKLOG) == -1)
{
perror("listen");
exit(1);
}
sa.sa_handler = sigchld_handler; // reap all dead processes
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;
if (sigaction(SIGCHLD, &sa, NULL) == -1)
{
perror("sigaction");
exit(1);
}
printf("server: waiting for connections...\n");
while(1) { // main accept() loop
sin_size = sizeof(struct sockaddr_in);
if ((new_fd = accept(sockfd, (struct sockaddr *)&their_addr,&sin_size)) == -1)
{
perror("accept");
continue;
}
printf("server: got connection from %s\n",inet_ntoa(their_addr.sin_addr));
handle_connection(new_fd);
}
}
And the following is how I am doing the HTTP POST to my HTTP server:
/* create socket */
if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
return ERRSOCK;
setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, 0, 0);
/* connect to server */
if (connect(s, &server, sizeof(server)) < 0)
ret=ERRCONN;
else {
if (pfd) *pfd=s;
/* create header */
if (proxy) {
sprintf(header,
"%s http://%.128s:%d/%.256s HTTP/1.0\015\012User-Agent: %s\015\012%s\015\012",
command,
http_server1,
http_port,
url,
http_user_agent,
additional_header
);
} else {
sprintf(header,
"%s /%.256s HTTP/1.0\015\012User-Agent: %s\015\012%s\015\012",
command,
url,
http_user_agent,
additional_header
);
}
hlg=strlen(header);
/* send header */
if (send(s,header,hlg,0)!=hlg)
ret= ERRWRHD;
/* send data */
else if (length && data && (send(s,data,length,0)!=length) )
ret= ERRWRDT;
else {
/* read result & check */
ret=http_read_line(s,header,MAXBUF-1);
The following are the contents of http_read_line; the recv call blocks in this function:
static int http_read_line (fd,buffer,max)
int fd; /* file descriptor to read from */
char *buffer; /* placeholder for data */
int max; /* max number of bytes to read */
{ /* not efficient on long lines (multiple unbuffered 1 char reads) */
int n=0;
while (n<max) {
if (recv(fd,buffer,1,0)!=1) {
n= -n;
break;
}
n++;
if (*buffer=='\015') continue; /* ignore CR */
if (*buffer=='\012') break; /* LF is the separator */
buffer++;
}
*buffer=0;
return n;
}
You need to either send an HTTP 1.0 header, or else read about content-length in HTTP 1.1. You are reading the stream to EOS when the server is under no obligation to close the connection, so you block. The Content-Length header tells you how much data is in the body: you should only try to read that many bytes.
If you specify HTTP 1.0 (and no fancy headers) the server will close the connection after sending the response.
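To read only as many body bytes as the server announced, the client first has to extract the Content-Length value from the response headers. The following is a minimal sketch, not a full HTTP parser: the function name content_length is hypothetical, and it assumes the caller passes only the header block (up to the blank line) as a NUL-terminated string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: returns the Content-Length value found in a
 * NUL-terminated header block, or -1 if the header is absent or malformed.
 * Header names are matched case-insensitively. */
long content_length(const char *headers)
{
    static const char name[] = "Content-Length:";
    const char *line = headers;

    assert(headers != NULL);
    while (line != NULL && *line != '\0') {
        if (strncasecmp(line, name, sizeof(name) - 1) == 0) {
            const char *val = line + sizeof(name) - 1;
            char *end;
            long v = strtol(val, &end, 10);   /* skips leading spaces */
            if (end == val || v < 0)
                return -1;
            return v;
        }
        line = strchr(line, '\n');            /* move to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Once the value is known, the body can be read with a loop that stops after exactly that many bytes instead of waiting for the server to close the connection.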
You wrote: "In my application i have created a thread for a simple http server, then from within my application i tried to connect to http server but control is blocked/hanged on recv call."
That means recv never returns 0. When does recv return 0? When it receives a TCP FIN segment. It seems that your server never sends a TCP FIN segment to the client.
The most likely reason is that your client code needs modification. You are sending data from the client, but you never send a FIN, so I assume that your server function keeps running forever and has not sent its own FIN. This makes recv wait forever.
In the current code, the fix is perhaps to add a shutdown() call:
else {
/*Send the FIN segment, but we can still read the socket*/
shutdown(s, SHUT_WR);
/* read result & check */
ret=http_read_line(s,header,MAXBUF-1);
In this case the shutdown function sends the TCP FIN, so the server function can return and possibly then do a proper close.
On a proper close, the client receives the FIN from the server. This makes recv return 0 instead of blocking forever.
If you then want to transfer any further data from the client, you need to connect again, or perhaps use a different algorithm.
I hope my explanation helps fix the current problem.
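The half-close sequence described in this answer can be traced in a small single-threaded sketch. It assumes Linux and uses socketpair() to play both the client and the server end (an assumption made so the example needs no network); the names drain and demo_half_close are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Reads from fd until recv() returns 0 (end of stream) or the buffer is
 * full. Returns the number of bytes read, or -1 on error. */
static long drain(int fd, char *buf, size_t cap)
{
    size_t count = 0;

    assert(buf != NULL);
    while (count < cap) {
        ssize_t n = recv(fd, buf + count, cap - count, 0);
        if (n < 0)
            return -1;
        if (n == 0)
            break;
        count += (size_t)n;
    }
    return (long)count;
}

/* Returns 0 if the half-close exchange works as described. */
int demo_half_close(void)
{
    char req[32], resp[32];
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 1;
    int client = sv[0], server = sv[1];

    /* Client: send the request, then signal "no more data from me". */
    if (send(client, "ping", 4, 0) != 4)
        return 2;
    if (shutdown(client, SHUT_WR) < 0)
        return 3;

    /* Server: sees end of stream after the 4 request bytes, replies, closes. */
    if (drain(server, req, sizeof(req)) != 4)
        return 4;
    if (send(server, "pong!", 5, 0) != 5)
        return 5;
    close(server);

    /* Client: the read side is still open; recv() returns 0 once the
     * server has closed, so the loop terminates instead of blocking. */
    if (drain(client, resp, sizeof(resp)) != 5)
        return 6;
    if (memcmp(resp, "pong!", 5) != 0)
        return 7;
    close(client);
    return 0;
}
```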

How do I wake select() on a socket close?

I am currently using select loop to manage sockets in a proxy. One of the requirements of this proxy is that if the proxy sends a message to the outside server and does not get a response in a certain time, the proxy should close that socket and try to connect to a secondary server. The closing happens in a separate thread, while the select thread blocks waiting for activity.
I am having trouble figuring out how to detect that this socket closed specifically, so that I can handle the failure. If I call close() in the other thread, I get an EBADF, but I can't tell which socket closed. I tried to detect the socket through the exception fdset, thinking it would contain the closed socket, but I'm not getting anything returned there. I have also heard that calling shutdown() will send a FIN to the server and receive a FIN back, so that I can close it; but the whole point is that I am trying to close this as a result of not getting a response within the timeout period, so I can't do that, either.
If my assumptions here are wrong, let me know. Any ideas would be appreciated.
EDIT:
In response to the suggestions about using select time out: I need to do the closing asynchronously, because the client connecting to the proxy will time out and I can't wait around for the select to be polled. This would only work if I made the select time out very small, which would then constantly be polling and wasting resources which I don't want.
Generally I just mark the socket for closing in the other thread, and then when select() returns from activity or timeout, I run a cleanup pass and close out all dead connections and update the fd_set. Doing it any other way opens you up to race conditions where you gave up on the connection, just as select() finally recognized some data for it, then you close it, but the other thread tries to process the data that was detected and gets upset to find the connection closed.
Oh, and poll() is generally better than select() in terms of not having to copy as much data around.
You cannot free a resource in one thread while another thread is or might be using it. Calling close on a socket that might be in use in another thread will never work right. There will always be potentially disastrous race conditions.
There are two good solutions to your problem:
1. Have the thread that calls select always use a timeout no greater than the longest you're willing to wait to process a timeout. When a timeout occurs, record that fact somewhere the thread that calls select will notice when it returns from select. Have that thread do the actual close of the socket in between calls to select.
2. Have the thread that detects the timeout call shutdown on the socket. This will cause select to return; then have the select thread do the close.
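The effect of shutdown on a descriptor that select is watching can be checked with the sketch below. It is deliberately single-threaded and uses a local socketpair() (both assumptions made for brevity), so it only shows the descriptor state: after shutdown the descriptor stays valid, select reports it readable, and recv returns 0. On Linux a select already blocked in another thread should be woken in the same way, but that part is not exercised here. The function name is hypothetical.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if shutdown() makes an idle socket show up as readable. */
int demo_shutdown_wakes_select(void)
{
    struct timeval zero;
    fd_set rset;
    char c;
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 1;
    assert(sv[0] < FD_SETSIZE);

    /* Idle connection: nothing to read, select reports no activity. */
    FD_ZERO(&rset);
    FD_SET(sv[0], &rset);
    zero.tv_sec = 0;
    zero.tv_usec = 0;
    if (select(sv[0] + 1, &rset, NULL, NULL, &zero) != 0)
        return 2;

    /* The timeout logic gives up on the connection. */
    if (shutdown(sv[0], SHUT_RDWR) < 0)
        return 3;

    /* The descriptor is still valid, so there is no EBADF: select reports
     * it as readable and recv() returns 0. */
    FD_ZERO(&rset);
    FD_SET(sv[0], &rset);
    zero.tv_sec = 0;
    zero.tv_usec = 0;
    if (select(sv[0] + 1, &rset, NULL, NULL, &zero) != 1 ||
        !FD_ISSET(sv[0], &rset))
        return 4;
    if (recv(sv[0], &c, 1, 0) != 0)
        return 5;

    /* Only now, in the select thread, is it safe to close. */
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

Because the descriptor number is not released until close, this avoids the race where another thread opens a new descriptor that reuses the same number while select is still watching it.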
How to cope with EBADF on select():
int fopts = 0;
for (int i = 0; i < num_clients; ++i) {
if (fcntl(client[i].fd, F_GETFL, &fopts) < 0) {
// call close(), FD_CLR(), and remove i'th element from client list
}
}
This code assumes you have an array of client structures which have "fd" members for the socket descriptor. The fcntl() call checks whether the socket is still "alive", and if not, we do what we have to to remove the dead socket and its associated client info.
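The per-descriptor check can be isolated into a small function. This is a sketch with hypothetical names (fd_is_open, demo_fd_is_open); it uses F_GETFD rather than F_GETFL, which works equally well for this purpose. Note that it only tells you whether the number currently refers to an open descriptor: if another thread has opened a new file or socket in the meantime, the number may have been reused.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd currently refers to an open descriptor, 0 if not. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Demo: returns 0 if open and closed descriptors are told apart. */
int demo_fd_is_open(void)
{
    int p[2];
    int rc = pipe(p);

    assert(rc == 0);
    if (!fd_is_open(p[0]) || !fd_is_open(p[1]))
        return 1;
    close(p[1]);
    if (fd_is_open(p[1]))       /* closed: fcntl fails with EBADF */
        return 2;
    if (!fd_is_open(p[0]))      /* the other end is still open */
        return 3;
    close(p[0]);
    return 0;
}
```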
It's hard to comment when seeing only a small part of the elephant but maybe you are over complicating things?
Presumably you have some structure to keep track of each socket and its info (like time left to receive a reply). You can change the select() loop to use a timeout. Within it check whether it is time to close the socket. Do what you need to do for the close and don't add it to the fd sets the next time around.
If you use poll(2) as suggested in other answers, you can use the POLLNVAL status, which is essentially EBADF, but on a per-file-descriptor basis, not on the whole system call as it is for select(2).
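A sketch of the POLLNVAL approach, assuming Linux: a single poll() call with a zero timeout reports which descriptor in a set has been closed locally. The helper find_closed_fd, its limit of 16 descriptors, and the demo function are hypothetical choices made for the example.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns the index of the first descriptor in
 * fds[0..n-1] that is no longer open, or -1 if all of them are valid.
 * Handles at most 16 descriptors to keep the sketch simple. */
int find_closed_fd(const int *fds, int n)
{
    struct pollfd pfds[16];
    int i;

    assert(fds != NULL && n >= 0 && n <= 16);
    for (i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = 0;     /* POLLNVAL is reported regardless */
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, 0) < 0)
        return -1;
    for (i = 0; i < n; i++)
        if (pfds[i].revents & POLLNVAL)
            return i;
    return -1;
}

/* Demo: returns 0 if the closed descriptor is identified correctly. */
int demo_find_closed_fd(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 1;
    if (find_closed_fd(sv, 2) != -1)    /* both still open */
        return 2;
    close(sv[1]);
    /* sv[1] was closed locally (POLLNVAL); sv[0] only sees a hang-up
     * from its peer (POLLHUP), so it is not reported. */
    if (find_closed_fd(sv, 2) != 1)
        return 3;
    close(sv[0]);
    return 0;
}
```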
Use a timeout for the select, and if the read-ready/write-ready/had-error sequences are all empty (w.r.t that socket), check if it was closed.
Just run a "test select" on every single socket that might have closed with a zero timeout and check the select result and errno until you found the one that has closed.
The following piece of demo code starts two server sockets on separate threads and creates two client sockets to connect to either server socket. Then it starts another thread, that will randomly kill one of the client sockets after 10 seconds (it will just close it). Closing either client socket causes select to fail with error in the main thread and the code below will now test which of the two sockets has actually closed.
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdint.h>
#include <pthread.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
static void * serverThread ( void * threadArg )
{
int res;
int connSo;
int servSo;
socklen_t addrLen;
struct sockaddr_in soAddr;
uint16_t * port = threadArg;
servSo = socket(PF_INET, SOCK_STREAM, 0);
assert(servSo >= 0);
memset(&soAddr, 0, sizeof(soAddr));
soAddr.sin_family = AF_INET;
soAddr.sin_port = htons(*port);
// Uncomment line below if your system offers this field in the struct
// and also needs this field to be initialized correctly.
// soAddr.sin_len = sizeof(soAddr);
res = bind(servSo, (struct sockaddr *)&soAddr, sizeof(soAddr));
assert(res == 0);
res = listen(servSo, 10);
assert(res == 0);
addrLen = 0;
connSo = accept(servSo, NULL, &addrLen);
assert(connSo >= 0);
for (;;) {
char buffer[2048];
ssize_t bytesRead;
bytesRead = recv(connSo, buffer, sizeof(buffer), 0);
if (bytesRead <= 0) break;
printf("Received %zd bytes on port %d.\n", bytesRead, (int)*port);
}
free(port);
close(connSo);
close(servSo);
return NULL;
}
static void * killSocketIn10Seconds ( void * threadArg )
{
int * so = threadArg;
sleep(10);
printf("Killing socket %d.\n", *so);
close(*so);
free(so);
return NULL;
}
int main ( int argc, const char * const * argv )
{
int res;
int clientSo1;
int clientSo2;
int * socketArg;
uint16_t * portArg;
pthread_t killThread;
pthread_t serverThread1;
pthread_t serverThread2;
struct sockaddr_in soAddr;
// Create a server socket at port 19500
portArg = malloc(sizeof(*portArg));
assert(portArg != NULL);
*portArg = 19500;
res = pthread_create(&serverThread1, NULL, &serverThread, portArg);
assert(res == 0);
// Create another server socket at port 19501
portArg = malloc(sizeof(*portArg));
assert(portArg != NULL);
*portArg = 19501;
res = pthread_create(&serverThread2, NULL, &serverThread, portArg);
assert(res == 0);
// Create two client sockets, one for 19500 and one for 19501
// and connect both to the server sockets we created above.
clientSo1 = socket(PF_INET, SOCK_STREAM, 0);
assert(clientSo1 >= 0);
clientSo2 = socket(PF_INET, SOCK_STREAM, 0);
assert(clientSo2 >= 0);
memset(&soAddr, 0, sizeof(soAddr));
soAddr.sin_family = AF_INET;
soAddr.sin_port = htons(19500);
res = inet_pton(AF_INET, "127.0.0.1", &soAddr.sin_addr);
assert(res == 1);
// Uncomment line below if your system offers this field in the struct
// and also needs this field to be initialized correctly.
// soAddr.sin_len = sizeof(soAddr);
res = connect(clientSo1, (struct sockaddr *)&soAddr, sizeof(soAddr));
assert(res == 0);
soAddr.sin_port = htons(19501);
res = connect(clientSo2, (struct sockaddr *)&soAddr, sizeof(soAddr));
assert(res == 0);
// We want either client socket to be closed locally after 10 seconds.
// Which one is random, so try running test app multiple times.
socketArg = malloc(sizeof(*socketArg));
srandomdev(); // BSD/macOS only; glibc on Linux does not provide it, use srandom() there
*socketArg = (random() % 2 == 0 ? clientSo1 : clientSo2);
res = pthread_create(&killThread, NULL, &killSocketIn10Seconds, socketArg);
assert(res == 0);
for (;;) {
int ndfs;
int count;
fd_set readSet;
// ndfs must be the highest socket number + 1
ndfs = (clientSo2 > clientSo1 ? clientSo2 : clientSo1);
ndfs++;
FD_ZERO(&readSet);
FD_SET(clientSo1, &readSet);
FD_SET(clientSo2, &readSet);
// No timeout, that means select may block forever here.
count = select(ndfs, &readSet, NULL, NULL, NULL);
// Without a timeout count should never be zero.
// Zero is only returned if select ran into the timeout.
assert(count != 0);
if (count < 0) {
int error = errno;
printf("Select terminated with error: %s\n", strerror(error));
if (error == EBADF) {
fd_set closeSet;
struct timeval atonce;
FD_ZERO(&closeSet);
FD_SET(clientSo1, &closeSet);
memset(&atonce, 0, sizeof(atonce));
count = select(clientSo1 + 1, &closeSet, NULL, NULL, &atonce);
if (count == -1 && errno == EBADF) {
printf("Socket 1 (%d) closed.\n", clientSo1);
break; // Terminate test app
}
FD_ZERO(&closeSet);
FD_SET(clientSo2, &closeSet);
// Note: Standard requires you to re-init timeout for every
// select call, you must never rely that select has not changed
// its value in any way, not even if its all zero.
memset(&atonce, 0, sizeof(atonce));
count = select(clientSo2 + 1, &closeSet, NULL, NULL, &atonce);
if (count == -1 && errno == EBADF) {
printf("Socket 2 (%d) closed.\n", clientSo2);
break; // Terminate test app
}
}
}
}
// Be a good citizen, close all sockets, join all threads
close(clientSo1);
close(clientSo2);
pthread_join(killThread, NULL);
pthread_join(serverThread1, NULL);
pthread_join(serverThread2, NULL);
return EXIT_SUCCESS;
}
Sample output for running this test code twice:
$ ./sockclose
Killing socket 3.
Select terminated with error: Bad file descriptor
Socket 1 (3) closed.
$ ./sockclose
Killing socket 4.
Select terminated with error: Bad file descriptor
Socket 1 (4) closed.
However, if your system supports poll(), I would strongly advise you to consider using that API instead of select(). select() is a rather ugly legacy API, kept only for backward compatibility with existing code. poll() has a much better interface for this task, and it has an extra flag that signals directly that a socket has been closed locally: POLLNVAL is set in revents if the socket has been closed, regardless of which flags you requested in events. POLLNVAL is an output-only flag, which means it is ignored when set in events. If the socket was not closed locally but the remote server has just closed the connection, POLLHUP is set in revents (also an output-only flag). Another advantage of poll() is that the timeout is simply an int value (milliseconds, fine-grained enough for real network sockets) and that there are no limitations on the number of sockets that can be monitored or on their numeric value range.

Resources