How do I wake select() on a socket close? - linux

I am currently using a select loop to manage sockets in a proxy. One of the requirements of this proxy is that if the proxy sends a message to the outside server and does not get a response in a certain time, the proxy should close that socket and try to connect to a secondary server. The closing happens in a separate thread, while the select thread blocks waiting for activity.
I am having trouble figuring out how to detect that this socket closed specifically, so that I can handle the failure. If I call close() in the other thread, I get an EBADF, but I can't tell which socket closed. I tried to detect the socket through the exception fdset, thinking it would contain the closed socket, but I'm not getting anything returned there. I have also heard calling shutdown() will send a FIN to the server and receive a FIN back, so that I can close it; but the whole point is that I am closing this socket because I did not get a response within the timeout period, so I can't do that either.
If my assumptions here are wrong, let me know. Any ideas would be appreciated.
EDIT:
In response to the suggestions about using select time out: I need to do the closing asynchronously, because the client connecting to the proxy will time out and I can't wait around for the select to be polled. This would only work if I made the select time out very small, which would then constantly be polling and wasting resources which I don't want.

Generally I just mark the socket for closing in the other thread, and then when select() returns from activity or timeout, I run a cleanup pass and close out all dead connections and update the fd_set. Doing it any other way opens you up to race conditions where you gave up on the connection, just as select() finally recognized some data for it, then you close it, but the other thread tries to process the data that was detected and gets upset to find the connection closed.
Oh, and poll() is generally better than select() in terms of not having to copy as much data around.
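A minimal sketch of that mark-then-sweep pattern is below. The struct conn type and the conn_mark_dead()/conn_sweep() helpers are hypothetical names invented for illustration; the only assumption is that the timeout thread and the select thread share an array of connection records.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-connection record shared by both threads. */
struct conn {
    int fd;            /* -1 once the select thread has closed it */
    atomic_bool dead;  /* set by the timeout thread, read by the select thread */
};

/* Timeout thread: only flags the connection, never calls close(). */
static void conn_mark_dead(struct conn *c)
{
    atomic_store(&c->dead, true);
}

/* Select thread: run after select() returns and before the fd_set is
 * rebuilt. Closes every flagged descriptor and returns how many it closed. */
static size_t conn_sweep(struct conn *conns, size_t count)
{
    size_t closed = 0;
    for (size_t i = 0; i < count; ++i) {
        if (conns[i].fd >= 0 && atomic_load(&conns[i].dead)) {
            close(conns[i].fd);
            conns[i].fd = -1;  /* the loop skips negative fds when calling FD_SET */
            ++closed;
        }
    }
    return closed;
}
```

Because only the select thread ever calls close(), a descriptor cannot disappear while select() is still watching it.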

You cannot free a resource in one thread while another thread is or might be using it. Calling close on a socket that might be in use in another thread will never work right. There will always be potentially disastrous race conditions.
There are two good solutions to your problem:
Have the thread that calls select always use a timeout no greater than the longest you're willing to wait to process a timeout. When a timeout occurs, record it somewhere the select thread will notice when it returns from select. Have that thread do the actual close of the socket in between calls to select.
Have the thread that detects the timeout call shutdown on the socket. This will cause select to return, and the select thread can then do the close.
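The second option can be sketched roughly as follows. The helper names give_up_on() and readable_now() are made up for this example, and the behaviour shown (the socket becoming readable with a zero-byte read after shutdown()) is what I would expect on Linux; treat it as an assumption on other systems.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* What the timeout thread would call instead of close(). The descriptor
 * stays valid, so the select thread never sees EBADF. */
static int give_up_on(int fd)
{
    return shutdown(fd, SHUT_RDWR);
}

/* Zero-timeout select() on one descriptor:
 * 1 = reported readable, 0 = not readable, -1 = select() failed. */
static int readable_now(int fd)
{
    fd_set rd;
    struct timeval tv = { 0, 0 };
    FD_ZERO(&rd);
    FD_SET(fd, &rd);
    if (select(fd + 1, &rd, NULL, NULL, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &rd) ? 1 : 0;
}
```

After give_up_on(), the select thread sees the descriptor as readable, recv() returns 0, and that thread can then close() the socket itself and fail over to the secondary server.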

How to cope with EBADF on select():
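for (int i = 0; i < num_clients; ++i) {
    // F_GETFD takes no third argument; it fails with EBADF on a closed descriptor
    if (fcntl(client[i].fd, F_GETFD) == -1 && errno == EBADF) {
        // call FD_CLR() and remove the i'th element from the client list
        // (the descriptor is already closed, so do not close() it again)
    }
}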
This code assumes you have an array of client structures which have "fd" members for the socket descriptor. The fcntl() call checks whether the socket is still "alive", and if not, we do what we have to to remove the dead socket and its associated client info.
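As a self-contained version of that check, here is a small sketch. fd_is_open() is a hypothetical helper name, and F_GETFD is used because it needs no extra argument.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Returns false only when the kernel reports the descriptor as not open. */
static bool fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

One caveat: descriptor numbers are reused, so if any thread opens a new file or socket between the close() and this check, the number may already refer to something else and the check will report it as open.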

It's hard to comment when seeing only a small part of the elephant but maybe you are over complicating things?
Presumably you have some structure to keep track of each socket and its info (like time left to receive a reply). You can change the select() loop to use a timeout. Within it check whether it is time to close the socket. Do what you need to do for the close and don't add it to the fd sets the next time around.
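One way to avoid a tiny fixed timeout is to derive the select() timeout from the nearest reply deadline. The sketch below assumes a hypothetical struct pending that records, per socket, the absolute time by which a reply must arrive; the names are illustrative only.

```c
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical bookkeeping: one entry per socket awaiting a reply. */
struct pending {
    int fd;
    time_t deadline;  /* absolute time at which we give up on this socket */
};

/* Fills *tv with the time left until the earliest deadline (clamped to 0).
 * Returns 1 if tv was filled, or 0 if nothing is pending, in which case
 * the caller can pass NULL to select() and block indefinitely. */
static int timeout_until_next_deadline(const struct pending *p, size_t n,
                                       time_t now, struct timeval *tv)
{
    if (n == 0)
        return 0;
    time_t earliest = p[0].deadline;
    for (size_t i = 1; i < n; ++i)
        if (p[i].deadline < earliest)
            earliest = p[i].deadline;
    tv->tv_sec = earliest > now ? earliest - now : 0;
    tv->tv_usec = 0;
    return 1;
}
```

With this, select() wakes exactly when the next socket is due to time out, so the loop neither spins nor overshoots the deadline by more than the clock resolution.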

If you use poll(2) as suggested in other answers, you can use the POLLNVAL status, which is essentially EBADF, but on a per-file-descriptor basis, not on the whole system call as it is for select(2).
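A short sketch of how that per-descriptor reporting could be used is below. first_invalid() is a hypothetical helper; it polls the whole array once with a zero timeout and returns the index of the first entry flagged POLLNVAL.

```c
#include <poll.h>

/* Returns the index of the first descriptor that poll() marks as invalid
 * (POLLNVAL), or -1 if none is invalid or poll() itself failed. */
static int first_invalid(struct pollfd *pfds, nfds_t n)
{
    if (poll(pfds, n, 0) < 0)
        return -1;
    for (nfds_t i = 0; i < n; ++i)
        if (pfds[i].revents & POLLNVAL)
            return (int)i;
    return -1;
}
```

Note that poll() ignores entries whose fd is negative, so a slot that has already been cleaned up can simply be set to -1.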

Use a timeout for the select, and if the read-ready/write-ready/had-error sequences are all empty (w.r.t that socket), check if it was closed.

Just run a "test select" on every single socket that might have closed with a zero timeout and check the select result and errno until you found the one that has closed.
The following piece of demo code starts two server sockets on separate threads and creates two client sockets to connect to either server socket. Then it starts another thread, that will randomly kill one of the client sockets after 10 seconds (it will just close it). Closing either client socket causes select to fail with error in the main thread and the code below will now test which of the two sockets has actually closed.
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdint.h>
#include <pthread.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
static void * serverThread ( void * threadArg )
{
int res;
int connSo;
int servSo;
socklen_t addrLen;
struct sockaddr_in soAddr;
uint16_t * port = threadArg;
servSo = socket(PF_INET, SOCK_STREAM, 0);
assert(servSo >= 0);
memset(&soAddr, 0, sizeof(soAddr));
soAddr.sin_family = AF_INET;
soAddr.sin_port = htons(*port);
// Uncomment line below if your system offers this field in the struct
// and also needs this field to be initialized correctly.
// soAddr.sin_len = sizeof(soAddr);
res = bind(servSo, (struct sockaddr *)&soAddr, sizeof(soAddr));
assert(res == 0);
res = listen(servSo, 10);
assert(res == 0);
addrLen = 0;
connSo = accept(servSo, NULL, &addrLen);
assert(connSo >= 0);
for (;;) {
char buffer[2048];
ssize_t bytesRead;
bytesRead = recv(connSo, buffer, sizeof(buffer), 0);
if (bytesRead <= 0) break;
printf("Received %zd bytes on port %d.\n", bytesRead, (int)*port);
}
free(port);
close(connSo);
close(servSo);
return NULL;
}
static void * killSocketIn10Seconds ( void * threadArg )
{
int * so = threadArg;
sleep(10);
printf("Killing socket %d.\n", *so);
close(*so);
free(so);
return NULL;
}
int main ( int argc, const char * const * argv )
{
int res;
int clientSo1;
int clientSo2;
int * socketArg;
uint16_t * portArg;
pthread_t killThread;
pthread_t serverThread1;
pthread_t serverThread2;
struct sockaddr_in soAddr;
// Create a server socket at port 19500
portArg = malloc(sizeof(*portArg));
assert(portArg != NULL);
*portArg = 19500;
res = pthread_create(&serverThread1, NULL, &serverThread, portArg);
assert(res == 0);
// Create another server socket at port 19501
portArg = malloc(sizeof(*portArg));
assert(portArg != NULL);
*portArg = 19501;
res = pthread_create(&serverThread2, NULL, &serverThread, portArg);
assert(res == 0);
// Create two client sockets, one for 19500 and one for 19501
// and connect both to the server sockets we created above.
clientSo1 = socket(PF_INET, SOCK_STREAM, 0);
assert(clientSo1 >= 0);
clientSo2 = socket(PF_INET, SOCK_STREAM, 0);
assert(clientSo2 >= 0);
memset(&soAddr, 0, sizeof(soAddr));
soAddr.sin_family = AF_INET;
soAddr.sin_port = htons(19500);
res = inet_pton(AF_INET, "127.0.0.1", &soAddr.sin_addr);
assert(res == 1);
// Uncomment line below if your system offers this field in the struct
// and also needs this field to be initialized correctly.
// soAddr.sin_len = sizeof(soAddr);
res = connect(clientSo1, (struct sockaddr *)&soAddr, sizeof(soAddr));
assert(res == 0);
soAddr.sin_port = htons(19501);
res = connect(clientSo2, (struct sockaddr *)&soAddr, sizeof(soAddr));
assert(res == 0);
// We want either client socket to be closed locally after 10 seconds.
// Which one is random, so try running test app multiple times.
socketArg = malloc(sizeof(*socketArg));
srandomdev();
*socketArg = (random() % 2 == 0 ? clientSo1 : clientSo2);
res = pthread_create(&killThread, NULL, &killSocketIn10Seconds, socketArg);
assert(res == 0);
for (;;) {
int ndfs;
int count;
fd_set readSet;
// ndfs must be the highest socket number + 1
ndfs = (clientSo2 > clientSo1 ? clientSo2 : clientSo1);
ndfs++;
FD_ZERO(&readSet);
FD_SET(clientSo1, &readSet);
FD_SET(clientSo2, &readSet);
// No timeout, that means select may block forever here.
count = select(ndfs, &readSet, NULL, NULL, NULL);
// Without a timeout count should never be zero.
// Zero is only returned if select ran into the timeout.
assert(count != 0);
if (count < 0) {
int error = errno;
printf("Select terminated with error: %s\n", strerror(error));
if (error == EBADF) {
fd_set closeSet;
struct timeval atonce;
FD_ZERO(&closeSet);
FD_SET(clientSo1, &closeSet);
memset(&atonce, 0, sizeof(atonce));
count = select(clientSo1 + 1, &closeSet, NULL, NULL, &atonce);
if (count == -1 && errno == EBADF) {
printf("Socket 1 (%d) closed.\n", clientSo1);
break; // Terminate test app
}
FD_ZERO(&closeSet);
FD_SET(clientSo2, &closeSet);
// Note: Standard requires you to re-init timeout for every
// select call, you must never rely that select has not changed
// its value in any way, not even if its all zero.
memset(&atonce, 0, sizeof(atonce));
count = select(clientSo2 + 1, &closeSet, NULL, NULL, &atonce);
if (count == -1 && errno == EBADF) {
printf("Socket 2 (%d) closed.\n", clientSo2);
break; // Terminate test app
}
}
}
}
// Be a good citizen, close all sockets, join all threads
close(clientSo1);
close(clientSo2);
pthread_join(killThread, NULL);
pthread_join(serverThread1, NULL);
pthread_join(serverThread2, NULL);
return EXIT_SUCCESS;
}
Sample output for running this test code twice:
$ ./sockclose
Killing socket 3.
Select terminated with error: Bad file descriptor
Socket 1 (3) closed.
$ ./sockclose
Killing socket 4.
Select terminated with error: Bad file descriptor
Socket 1 (4) closed.
However, if your system supports poll(), I would strongly advise you to consider using this API instead of select(). Select is a rather ugly legacy API, kept only for backward compatibility with existing code. Poll has a much better interface for this task and it has an extra flag to directly signal that a socket has been closed locally: POLLNVAL will be set in revents if the socket has been closed, regardless of which flags you requested in events, since POLLNVAL is an output-only flag, meaning it is ignored when set in events. If the socket was not closed locally but the remote server has just closed the connection, the flag POLLHUP will be set in revents (also an output-only flag). Another advantage of poll is that the timeout is simply an int value (milliseconds, fine-grained enough for real network sockets) and that there are no limitations on the number of sockets that can be monitored or on their numeric value range.

Related

Linux multi-thread, pausing one thread while continue running the other threads within the same process

I cannot find a proper solution to my problem.
If I have more than one thread in one process and I want to make only one thread sleep while the other threads in the same process keep running, is there a predefined call for that, or do I have to write my own implementation (sleep)?
Ideally I want to send an indication from one thread to another when it is time for it to sleep.
Edited (2015-08-24)
I have two main threads: one sends data over a network, the other receives data from the network. Besides dealing with jitter, the receiving thread does validation, verification, and some file management, which over time can cause it to fall behind. What I would like to do is add something like a micro sleep to the sender so that the receiver can catch up. sched_yield() will not help in this case because the hardware has a multi-core CPU with more than 40 cores.
From your description in the comments, it looks like you're trying to synchronize 2 threads so that one of them doesn't fall behind too far from the other.
If that's the case, you're going about this the wrong way. It is seldom a good idea to do synchronization by sleeping, because the scheduler may incur unpredictable and long delays that cause the other (slow) thread to remain stopped in the run queue without being scheduled. Even if it works most of the time, it's still a race condition, and it's an ugly hack.
Given your use case and constraints, I think you'd be better off using barriers (see pthread_barrier_init(3)). Pthread barriers allow you to create a rendezvous point in the code where threads can catch up.
You call pthread_barrier_init(3) as part of the initialization code, specifying the number of threads that will be synchronized using that barrier. In this case, it's 2.
Then, threads synchronize with others by calling pthread_barrier_wait(3). The call blocks until the number of threads specified in pthread_barrier_init(3) call pthread_barrier_wait(3), at which point every thread that was blocked in pthread_barrier_wait(3) becomes runnable and the cycle begins again. Essentially, barriers create a synchronization point where no one can move forward until everyone arrives. I think this is exactly what you're looking for.
Here's an example that simulates a fast sender thread and a slow receiver thread. They both synchronize with barriers to ensure that the sender does not do any work while the receiver is still processing other requests. The threads synchronize at the end of their work unit, but of course, you can choose where each thread calls pthread_barrier_wait(3), thereby controlling exactly when (and where) threads synchronize.
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
pthread_barrier_t barrier;
void *sender_thr(void *arg) {
printf("Entered sender thread\n");
int i;
for (i = 0; i < 10; i++) {
/* Simulate some work (500 ms) */
if (usleep(500000) < 0) {
perror("usleep(3) error");
}
printf("Sender thread synchronizing.\n");
/* Wait for receiver to catch up */
int barrier_res = pthread_barrier_wait(&barrier);
if (barrier_res == PTHREAD_BARRIER_SERIAL_THREAD)
printf("Sender thread was last.\n");
else if (barrier_res == 0)
printf("Sender thread was first.\n");
else
fprintf(stderr, "pthread_barrier_wait(3) error on sender: %s\n", strerror(barrier_res));
}
return NULL;
}
void *receiver_thr(void *arg) {
printf("Entered receiver thread\n");
int i;
for (i = 0; i < 10; i++) {
/* Simulate a lot of work */
if (usleep(2000000) < 0) {
perror("usleep(3) error");
}
printf("Receiver thread synchronizing.\n");
/* Catch up with sender */
int barrier_res = pthread_barrier_wait(&barrier);
if (barrier_res == PTHREAD_BARRIER_SERIAL_THREAD)
printf("Receiver thread was last.\n");
else if (barrier_res == 0)
printf("Receiver thread was first.\n");
else
fprintf(stderr, "pthread_barrier_wait(3) error on receiver: %s\n", strerror(barrier_res));
}
return NULL;
}
int main(void) {
int barrier_res;
if ((barrier_res = pthread_barrier_init(&barrier, NULL, 2)) != 0) {
fprintf(stderr, "pthread_barrier_init(3) error: %s\n", strerror(barrier_res));
exit(EXIT_FAILURE);
}
pthread_t threads[2];
int thread_res;
if ((thread_res = pthread_create(&threads[0], NULL, sender_thr, NULL)) != 0) {
fprintf(stderr, "pthread_create(3) error on sender thread: %s\n", strerror(thread_res));
exit(EXIT_FAILURE);
}
if ((thread_res = pthread_create(&threads[1], NULL, receiver_thr, NULL)) != 0) {
fprintf(stderr, "pthread_create(3) error on receiver thread: %s\n", strerror(thread_res));
exit(EXIT_FAILURE);
}
/* Do some work... */
if ((thread_res = pthread_join(threads[0], NULL)) != 0) {
fprintf(stderr, "pthread_join(3) error on sender thread: %s\n", strerror(thread_res));
exit(EXIT_FAILURE);
}
if ((thread_res = pthread_join(threads[1], NULL)) != 0) {
fprintf(stderr, "pthread_join(3) error on receiver thread: %s\n", strerror(thread_res));
exit(EXIT_FAILURE);
}
if ((barrier_res = pthread_barrier_destroy(&barrier)) != 0) {
fprintf(stderr, "pthread_barrier_destroy(3) error: %s\n", strerror(barrier_res));
exit(EXIT_FAILURE);
}
return 0;
}
Note that, as specified in the manpage for pthread_barrier_wait(3), once the desired number of threads call pthread_barrier_wait(3), the barrier state is reset to the original state that was in use after the last call to pthread_barrier_init(3), which means that the barrier atomically unlocks and resets state, so it is always ready for the next synchronization point, which is wonderful.
Once you're done with the barrier, don't forget to free the associated resources with pthread_barrier_destroy(3).

Differences in poll() between Linux and OS X when pollfd is changed on another thread

I'm trying to get libwebsockets running in a multithreaded environment on OS X. I couldn't trigger sending Data from a different thread than the main service thread. On libwebsocket docs it was implied this should be possible (demo code, mailinglist). So I dug into the code and found the problem in the poll() function.
It seems that poll() behaves differently with respect to the struct pollfd that is passed as a parameter. libwebsockets relies on being able to change the fds.events fields while poll() is active. This works fine on Linux but does not work on OS X.
I wrote a small test program to demonstrate the behaviour:
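#include <unistd.h>
#include <netdb.h>
#include <poll.h>
#include <sys/socket.h>
#include <atomic>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <thread>
#define PORT "3490"
struct pollfd fds[1];
// atomic so the busy-wait in second_thread() reliably observes the store
std::atomic<bool> connected(false);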
void main_loop() {
int sockfd, new_fd;
struct addrinfo hints, *servinfo, *p;
socklen_t sin_size;
int yes=1;
char s[INET6_ADDRSTRLEN];
int rv;
memset(&hints, 0, sizeof hints);
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;
hints.ai_flags = AI_PASSIVE;
if ((rv = getaddrinfo(NULL, PORT, &hints, &servinfo)) != 0) {
fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
return;
}
for(p = servinfo; p != NULL; p = p->ai_next) {
if ((sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) == -1) {
perror("server: socket");
continue;
}
if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1) {
perror("setsockopt");
exit(1);
}
if (bind(sockfd, p->ai_addr, p->ai_addrlen) == -1) {
close(sockfd);
perror("server: bind");
continue;
}
break;
}
freeaddrinfo(servinfo);
if (p == NULL) {
fprintf(stderr, "server: failed to bind\n");
exit(1);
}
if (listen(sockfd, 10) == -1) {
perror("listen");
exit(1);
}
printf("server: waiting for connections...\n");
new_fd = accept(sockfd, NULL, NULL); // peer address not needed; sin_size was uninitialized
if (new_fd == -1) {
perror("accept");
return;
}
fds[0].fd = new_fd;
fds[0].events = POLLIN;
connected = true;
printf("event is %i\n", fds[0].events);
int ret = poll(fds, 1, 5000);
printf("event is %i\n", fds[0].events); //expecting 1 on Mac and 5 on Linux
if (send(new_fd, "Hello, world!\n", 14, 0) == -1)
perror("send");
close(new_fd);
close(sockfd);
}
void second_thread()
{
while(connected == false){}
sleep(1);
fds[0].events = POLLIN|POLLOUT;
printf("set event to %i\n", fds[0].events);
}
int main() {
std::thread t1(main_loop);
std::thread t2(second_thread);
t1.join();
t2.join();
return 0;
}
Compile on OS X using clang++ -std=c++11 -stdlib=libc++ -o poll poll.cpp
and on Linux using g++ -std=c++11 -pthread -o poll poll.cpp
The program starts listening on port 3490. If you connect to it (e.g. using netcat localhost 3490) it will poll for input on the main thread and try to change the event flags in the second thread. It will exit after 5 seconds.
The output on OS X:
server: waiting for connections...
event is 1
set event to 5
event is 1
The output on Linux:
server: waiting for connections...
event is 1
set event to 5
event is 5
So my question is: is there any documentation available that explains this behavior? Is it safe what libwebsockets is doing in expecting that it is legal to change fds.events while poll is active? I couldn't find any details about it in the manpages (OS X, Linux).
You seem to say, at first, that you found some documentation that claims that this is supported and defined behavior. I'd be curious to know where you read that, because I am unable to find anything in either the Linux man page for poll(2) or the POSIX man page for poll() documenting that a different thread can change the values in the event array that another thread passed to poll(), and have those changes take effect in the original thread's poll() call, irrespective of any issues relating to memory barriers and such.
Both man pages appear to be completely silent, to me, on this subject matter. They do not indicate whether this is expected, supported, or defined behavior; or whether this is not a supported or defined behavior.
The proposition that a different thread can modify the parameters to a system call issued by another thread, after -- AFTER -- the other thread has already entered the syscall, seems rather counterintuitive to me. If this is supported behavior, I would expect it to be explicitly documented, and I can't find any reference to it in the Linux or the POSIX man pages.
Having said that: even if I limit the scope of my software to Linux, even if I don't need to care about other platforms; given the absence of any documentation of this, and even if my testing showed the Linux kernel implementing poll(2) this way, I would not expect to have any guarantees that some future kernel version will continue to behave this way. I would not be able to rely on this behavior, except on the specific kernel build I tested this with.
So, to answer your question: the only documentation that's authoritative on this topic are the man pages in question. They do not explicitly document this as legal behavior; and although they do not explicitly say that this is illegal behavior either, for the reasons stated above, I would consider this to be unsupported, undefined behavior.

Update value after passing pointer

I am using a TCP server to send a char array. The function send() takes a char *, but, before that, it has to listen and accept a connection. Given that, I want to send the most recent data when an incoming connection is accepted. Previously, I used two threads. One updated the value in the buffer, the other simply waited for connections, then sent data.
I understand that there can be problems with not locking a mutex, but aside from that, would this same scheme work if I passed the char * to a send function, rather than updating it as a global variable?
Some code to demonstrate:
#include <pthread.h>
char buf[BUFLEN];
void *updateBuffer(void *arg) {
while(true) {
getNewData(buf);
}
}
void *sendData(void *arg) {
//Setup socket
while(true) {
newfd = accept(sockfd, (struct sockaddr *)&their_addr, &size);
send(newfd, buf, BUFLEN, 0);
close(newfd);
}
}
This would send the updated values whenever a new connection was established.
I want to try this:
#include <pthread.h>
char buf[BUFLEN];
void *updateBuffer(void *arg) {
while(true) {
getNewData(buf);
}
}
void *sendData(void *arg) {
TCPServer tcpServer;
while(true) {
tcpServer.send(buf);
}
}
Where the function tcpServer.send(char *) is basically the same as sendData() above.
The reason for doing this is so that I can make the TCP server into a class, since I'll need to use the same code elsewhere.
From my understanding, since I am passing the pointer, it's basically the same as when I just call send(), since I also pass a pointer there. The value will continue to update, but the address won't change, so it should work. Please let me know if that is correct. I'm also open to new ways of doing this (without mutex locks, preferably).
Yes, that is the way most of us do a send: pass a pointer to a buffer, either void * or char *.
I would code it like this:
int sendData(const char *buffer, const int length)
{
    int newfd;
    int NumOfConnects = 0;
    struct sockaddr_storage their_addr;
    socklen_t size = sizeof(their_addr);
    // sockfd is the listening socket, set up elsewhere
    while ((newfd = accept(sockfd, (struct sockaddr *)&their_addr, &size)) >= 0)
    {
        // It would be necessary here to lock the buffer with a Mutex
        send(newfd, buffer, length, 0);
        // Release the Mutex
        close(newfd);
        NumOfConnects++;
        size = sizeof(their_addr); // accept() overwrites size, so reset it
    }
    // there is an error in the accept
    // this could be OK,
    // if the main thread has closed the sockfd socket indicating us to quit.
    // returns the number of transfers we have done.
    return NumOfConnects;
}
One thing to consider when using a pointer to a buffer that is modified in another thread: the buffer could change in the middle of a send, so the data sent would be inconsistent.
But that situation you've already noticed as well. Using a Mutex is suggested as you indicated.
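A minimal sketch of the mutex approach is below, assuming a fixed-size buffer. BUFLEN's value and the helper names publish() and snapshot() are made up for illustration. The sender copies the shared buffer into a private one while holding the lock, then calls send() on the private copy, so the lock is never held across a blocking network call.

```c
#include <pthread.h>
#include <string.h>

#define BUFLEN 64  /* assumed size; the question does not give one */

static char buf[BUFLEN];
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;

/* Updater thread: replace the shared buffer contents atomically
 * with respect to snapshot(). data must hold BUFLEN bytes. */
static void publish(const char *data)
{
    pthread_mutex_lock(&buf_lock);
    memcpy(buf, data, BUFLEN);
    pthread_mutex_unlock(&buf_lock);
}

/* Sender thread: take a consistent private copy, then send() that copy
 * after the lock has been released. out must hold BUFLEN bytes. */
static void snapshot(char *out)
{
    pthread_mutex_lock(&buf_lock);
    memcpy(out, buf, BUFLEN);
    pthread_mutex_unlock(&buf_lock);
}
```

In the accept loop this would look like `char local[BUFLEN]; snapshot(local); send(newfd, local, BUFLEN, 0);`.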

Simultaneous socket read/write ("full-duplex") in Linux (aio specifically)

I'm porting an application built on top of the ACE Proactor framework. The application runs perfectly for both VxWorks and Windows, but fails to do so on Linux (CentOS 5.5, WindRiver Linux 1.4 & 3.0) with kernel 2.6.X.X - using librt.
I've narrowed the problem down to a very basic issue:
The application begins an asynchronous (via aio_read) read operation on a socket and subsequently begins an asynchronous (via aio_write) write on the very same socket. The read operation cannot be fulfilled yet since the protocol is initialized from the application's end.
- When the socket is in blocking-mode, the write is never reached and the protocol "hangs".
- When using a O_NONBLOCK socket, the write succeeds but the read returns indefinitely with a "EWOULDBLOCK/EAGAIN" error, never to recover (even if the AIO operation is restarted).
I went through multiple forums and could not find a definitive answer to whether this should work (and I'm doing something wrong) or impossible with Linux AIO. Is it possible if I drop the AIO and seek a different implementation (via epoll/poll/select etc.)?
Attached is a sample code to quickly re-produce the problem on a non-blocking socket:
#include <aio.h>
#include <stdio.h>
#include <stdlib.h>
#include <netdb.h>
#include <string.h>
#include <strings.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#define BUFSIZE (100)
// Global variables
struct aiocb *cblist[2];
int theSocket;
void InitializeAiocbData(struct aiocb* pAiocb, char* pBuffer)
{
bzero( (char *)pAiocb, sizeof(struct aiocb) );
pAiocb->aio_fildes = theSocket;
pAiocb->aio_nbytes = BUFSIZE;
pAiocb->aio_offset = 0;
pAiocb->aio_buf = pBuffer;
}
void IssueReadOperation(struct aiocb* pAiocb, char* pBuffer)
{
InitializeAiocbData(pAiocb, pBuffer);
int ret = aio_read( pAiocb );
assert (ret >= 0);
}
void IssueWriteOperation(struct aiocb* pAiocb, char* pBuffer)
{
InitializeAiocbData(pAiocb, pBuffer);
int ret = aio_write( pAiocb );
assert (ret >= 0);
}
int main()
{
int ret;
int nPort = 11111;
char* szServer = "10.10.9.123";
// Connect to the remote server
theSocket = socket(AF_INET, SOCK_STREAM, 0);
assert (theSocket >= 0);
struct hostent *pServer;
struct sockaddr_in serv_addr;
pServer = gethostbyname(szServer);
bzero((char *) &serv_addr, sizeof(serv_addr));
serv_addr.sin_family = AF_INET;
serv_addr.sin_port = htons(nPort);
bcopy((char *)pServer->h_addr, (char *)&serv_addr.sin_addr.s_addr, pServer->h_length);
assert (connect(theSocket, (const sockaddr*)(&serv_addr), sizeof(serv_addr)) >= 0);
// Set the socket to be non-blocking
int oldFlags = fcntl(theSocket, F_GETFL) ;
int newFlags = oldFlags | O_NONBLOCK;
fcntl(theSocket, F_SETFL, newFlags);
printf("Socket flags: before=%o, after=%o\n", oldFlags, newFlags);
// Construct the AIO callbacks array
struct aiocb my_aiocb1, my_aiocb2;
char* pBuffer = new char[BUFSIZE+1];
bzero( (char *)cblist, sizeof(cblist) );
cblist[0] = &my_aiocb1;
cblist[1] = &my_aiocb2;
// Start the read and write operations on the same socket
IssueReadOperation(&my_aiocb1, pBuffer);
IssueWriteOperation(&my_aiocb2, pBuffer);
// Wait for I/O completion on both operations
int nRound = 1;
printf("\naio_suspend round #%d:\n", nRound++);
ret = aio_suspend( cblist, 2, NULL );
assert (ret == 0);
// Check the error status for the read and write operations
ret = aio_error(&my_aiocb1);
assert (ret == EWOULDBLOCK);
// Get the return code for the read
{
ssize_t retcode = aio_return(&my_aiocb1);
printf("First read operation results: aio_error=%d, aio_return=%d - That's the first EWOULDBLOCK\n", ret, retcode);
}
ret = aio_error(&my_aiocb2);
assert (ret == EINPROGRESS);
printf("Write operation is still \"in progress\"\n");
// Re-issue the read operation
IssueReadOperation(&my_aiocb1, pBuffer);
// Wait for I/O completion on both operations
printf("\naio_suspend round #%d:\n", nRound++);
ret = aio_suspend( cblist, 2, NULL );
assert (ret == 0);
// Check the error status for the read and write operations for the second time
ret = aio_error(&my_aiocb1);
assert (ret == EINPROGRESS);
printf("Second read operation request is suddenly marked as \"in progress\"\n");
ret = aio_error(&my_aiocb2);
assert (ret == 0);
// Get the return code for the write
{
ssize_t retcode = aio_return(&my_aiocb2);
printf("Write operation has completed with results: aio_error=%d, aio_return=%d\n", ret, retcode);
}
// Now try waiting for the read operation to complete - it'll just busy-wait, receiving "EWOULDBLOCK" indefinitely
do
{
printf("\naio_suspend round #%d:\n", nRound++);
ret = aio_suspend( cblist, 1, NULL );
assert (ret == 0);
// Check the error of the read operation and re-issue if needed
ret = aio_error(&my_aiocb1);
if (ret == EWOULDBLOCK)
{
IssueReadOperation(&my_aiocb1, pBuffer);
printf("EWOULDBLOCK again on the read operation!\n");
}
}
while (ret == EWOULDBLOCK);
}
Thanks in advance,
Yotam.
Firstly, O_NONBLOCK and AIO don't mix. AIO will report the asynchronous operation complete when the corresponding read or write wouldn't have blocked - and with O_NONBLOCK, they would never block, so the aio request will always complete immediately (with aio_error() reporting EWOULDBLOCK).
Secondly, don't use the same buffer for two simultaneous outstanding aio requests. The buffer should be considered completely off-limits between the time when the aio request was issued and when aio_error() tells you that it has completed.
Thirdly, AIO requests to the same file descriptor are queued, in order to give sensible results. This means that your write won't happen until the read completes - if you need to write the data first, you need to issue the AIOs in the opposite order. The following will work fine, without setting O_NONBLOCK:
struct aiocb my_aiocb1, my_aiocb2;
char pBuffer1[BUFSIZE+1], pBuffer2[BUFSIZE+1] = "Some test message";
const struct aiocb *cblist[2] = { &my_aiocb1, &my_aiocb2 };
// Start the read and write operations on the same socket
IssueWriteOperation(&my_aiocb2, pBuffer2);
IssueReadOperation(&my_aiocb1, pBuffer1);
// Wait for I/O completion on both operations
int nRound = 1;
int aio_status1, aio_status2;
do {
printf("\naio_suspend round #%d:\n", nRound++);
ret = aio_suspend( cblist, 2, NULL );
assert (ret == 0);
// Check the error status for the read and write operations
aio_status1 = aio_error(&my_aiocb1);
if (aio_status1 == EINPROGRESS)
puts("aio1 still in progress.");
else
puts("aio1 completed.");
aio_status2 = aio_error(&my_aiocb2);
if (aio_status2 == EINPROGRESS)
puts("aio2 still in progress.");
else
puts("aio2 completed.");
} while (aio_status1 == EINPROGRESS || aio_status2 == EINPROGRESS);
// Get the return codes for the read (aiocb1) and the write (aiocb2)
ssize_t retcode;
retcode = aio_return(&my_aiocb1);
printf("Read operation results: aio_error=%d, aio_return=%zd\n", aio_status1, retcode);
retcode = aio_return(&my_aiocb2);
printf("Write operation results: aio_error=%d, aio_return=%zd\n", aio_status2, retcode);
Alternatively, if you don't care about reads and writes being ordered with respect to each other, you can use dup() to create two file descriptors for the socket, and use one for reading and the other for writing - each will have its AIO operations queued separately.

socket() returns 0 in C client server application

I'm working on an application that contains several server sockets that each run in a unique thread.
An external utility (script) is called by one of the threads. This script calls a utility (client) that sends a message to one of the server sockets.
Initially, I was using system() to execute this external script, but we couldn't use that because we had to make sure the server sockets were closed in the child that was forked to execute the external script.
I now call fork() and execvp() myself. I fork() and then in the child I close all the server sockets and then call execvp() to execute the script.
Now, all of that works fine. The problem is that at times the script reports errors to the server app. The script sends these errors by calling another application (client) which opens a TCP socket and sends the appropriate data. My issue is that the client app gets a value of 0 returned by the socket() system call.
NOTE: This ONLY occurs when the script/client app is called using my forkExec() function. If the script/client app is called manually the socket() call performs appropriately and things work fine.
Based on that information I suspect it's something in my fork() execvp() code below... Any ideas?
void forkExec()
{
int stat;
stat = fork();
if (stat < 0)
{
printf("Error forking child: %s", strerror(errno));
}
else if (stat == 0)
{
char *progArgs[3];
/*
* First, close the file descriptors that the child
* shouldn't keep open
*/
close(ServerFd);
close(XMLSocket);
close(ClientFd);
close(EventSocket);
close(monitorSocket);
/* build the arguments for script */
progArgs[0] = calloc(1, strlen("/path_to_script")+1);
strcpy(progArgs[0], "/path_to_script");
progArgs[1] = calloc(1, strlen(arg)+1);
strcpy(progArgs[1], arg);
progArgs[2] = NULL; /* Array of args must be NULL terminated for execvp() */
/* launch the script */
stat = execvp(progArgs[0], progArgs);
if (stat != 0)
{
printf("Error executing script: '%s' '%s' : %s", progArgs[0], progArgs[1], strerror(errno));
}
free(progArgs[0]);
free(progArgs[1]);
exit(0);
}
return;
}
Client app code:
static int connectToServer(void)
{
int socketFD = 0;
int status;
struct sockaddr_in address;
struct hostent* hostAddr = gethostbyname("localhost");
socketFD = socket(PF_INET, SOCK_STREAM, 0);
/* The above call returns 0. */
if (socketFD < 0)
{
fprintf(stderr, "%s-%d: Failed to create socket: %s",
__func__, __LINE__, strerror(errno));
return (-1);
}
memset(&address, 0, sizeof(struct sockaddr));
address.sin_family = AF_INET;
memcpy(&(address.sin_addr.s_addr), hostAddr->h_addr, hostAddr->h_length);
address.sin_port = htons(POLLING_SERVER_PORT);
status = connect(socketFD, (struct sockaddr *)&address, sizeof(address));
if (status < 0)
{
if (errno != ECONNREFUSED)
{
fprintf(stderr, "%s-%d: Failed to connect to server socket: %s",
__func__, __LINE__, strerror(errno));
}
else
{
fprintf(stderr, "%s-%d: Server not yet available...%s",
__func__, __LINE__, strerror(errno));
close(socketFD);
socketFD = 0;
}
}
return socketFD;
}
FYI
OS: Linux
Arch: ARM32
Kernel: 2.6.26
socket() returns -1 on error.
A return of 0 means socket() succeeded and gave you file descriptor 0. I suspect that one of the descriptors you close has the value 0; once it is closed, the next call that allocates a file descriptor returns 0, because the lowest free number is always handed out.
A socket with value 0 is fine; it means stdin was closed, which makes fd 0 available for reuse, for example by a socket.
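The effect is easy to reproduce in isolation. The sketch below is only an illustration, and socket_after_closing_stdin() is a made-up name: it closes fd 0 and then creates a socket, which POSIX requires to receive the lowest unused descriptor.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: closes stdin, then creates a stream socket in the
 * given domain (e.g. AF_INET or AF_UNIX) and returns its descriptor.
 * Because fd 0 is free at that point, a successful call returns 0. */
int socket_after_closing_stdin(int domain)
{
    close(STDIN_FILENO);   /* frees fd 0; harmless if it was already closed */
    return socket(domain, SOCK_STREAM, 0);   /* -1 on error */
}
```

So a return value of 0 from socket() only says that fd 0 was free when the call was made.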
Chances are one of the file descriptors you close in the forkExec() child path (XMLSocket, ServerFd, etc.) was fd 0. That starts the child with fd 0 closed, which won't happen when you run the app from a command line, because fd 0 is already open there as the stdin inherited from the shell.
If you want your socket not to be 0, 1 or 2 (stdin/stdout/stderr), call the following in your forkExec() function after all the close() calls:
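#include <fcntl.h>
#include <unistd.h>

/* Point fds 0, 1 and 2 at /dev/null so later open()/socket() calls get fd >= 3 */
void reserve_tty()
{
    int fd;
    for(fd=0; fd < 3; fd++) {
        int nfd;
        nfd = open("/dev/null", O_RDWR);
        if(nfd<0) /* We're screwed. */
            continue;
        if(nfd==fd)
            continue;
        dup2(nfd, fd);
        if(nfd > 2)
            close(nfd);
    }
}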
Check for socket() returning -1, which means an error occurred.
Don't forget a call to waitpid().
End of "obvious question mode". I'm assuming a bit here, but you're not doing anything with the pid returned by the fork() call. (-:
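For reference, a minimal sketch of what keeping the pid and waiting on it could look like. run_and_wait() is a hypothetical helper, not part of the question's code, and it blocks until the child finishes, which may or may not suit the server thread that launches the script.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] in a child process and returns its
 * exit status (127 if the exec itself failed), or -1 if fork() or
 * waitpid() failed or the child was killed by a signal. */
int run_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        execvp(argv[0], argv);     /* only returns on failure */
        _exit(127);                /* conventional "could not exec" status */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;             /* real waitpid() error */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Reaping the child this way also avoids leaving a zombie behind for every script run.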
As mentioned in another comment, you really should not close 0, 1 or 2 (stdin/stdout/stderr). You can add a check to make sure you do not close them, so that they are not handed out as new fds when you request a new socket.
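Such a check could be as small as the sketch below. close_unless_std() is a made-up name, and the idea of swapping it in for the plain close() calls in forkExec() is an assumption about how the code would be adapted.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: closes fd unless it is stdin, stdout or stderr.
 * Returns 0 when the descriptor was skipped, otherwise close()'s result. */
int close_unless_std(int fd)
{
    if (fd >= STDIN_FILENO && fd <= STDERR_FILENO)
        return 0;          /* leave 0, 1 and 2 alone */
    return close(fd);
}
```

This only hides the symptom, though. If a socket variable holds 0 at that point, it is still worth finding out why, for example whether it was ever assigned a real descriptor.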
