pyzmq Segmentation fault on large .recv()

edited - I think I wrongly pointed the blame at pyzmq. I was actually running in a QThread; upon a suggestion I worked up a full working example without QThread, and I do not see any issues. I'm leaving my working example here for anyone who may want to reference it.
I'm attempting to publish data on a pyzmq socket. I have experience here, but never with larger data. On occasion I get a segfault, and at other times a message about free(). This happens more often for larger data sets, up to the point where it never actually works (depending on size).
subscriber code:
import zmq
from time import sleep
from threading import Thread

IMG_BASE_PORT = 13600  # not defined in the original post; 13600 matches the publisher's PORT below

class ZmqListener(Thread):
    def __init__(self, addr, port):
        Thread.__init__(self)
        self.addr = addr
        self.port = port

    def run(self):
        socket = zmq.Context().socket(zmq.SUB)
        socket.connect("tcp://%s:%s" % (self.addr, self.port))
        socket.setsockopt(zmq.SUBSCRIBE, "test:".encode('utf-8'))
        poller = zmq.Poller()
        poller.register(socket, zmq.POLLIN)
        self.running = True
        while self.running:
            s = dict(poller.poll(1000))
            if s:
                if s.get(socket) == zmq.POLLIN:
                    msg = socket.recv()
                    print(msg)
        socket.close()

th = ZmqListener('localhost', IMG_BASE_PORT)
th.start()
while True:
    try:
        sleep(1)
    except KeyboardInterrupt:
        break
th.running = False
th.join()
QThread subscriber code that breaks at random:
class ZmqListener(QtCore.QThread):
    message = QtCore.Signal(object)

    def __init__(self, addr, port, parent=None):
        QtCore.QThread.__init__(self, parent)
        self.addr = addr
        self.port = port

    def run(self):
        socket = zmq.Context().socket(zmq.SUB)
        socket.connect("tcp://%s:%s" % (self.addr, self.port))
        socket.setsockopt(zmq.SUBSCRIBE, "{}:".format(IMG_SUBS_HEAD).encode('utf-8'))
        poller = zmq.Poller()
        poller.register(socket, zmq.POLLIN)
        self.running = True
        while self.running:
            s = dict(poller.poll(1000))
            if s:
                if s.get(socket) == zmq.POLLIN:
                    msg = socket.recv()
                    self.message.emit(msg[len(IMG_SUBS_HEAD)+1:])
        socket.close()
publisher code:
#include <zmq.hpp>
#include <string>
#include <sstream>
#include <cstdlib>
#include <cstring>   // for memcpy
#include <ios>
#include <thread>
#include <chrono>

#define PORT 13600
#define PUBS "test:"
#define NBYT 256 * 1024 * 1024 * 1024

int main()
{
    zmq::context_t context(1);
    zmq::socket_t socket(context, ZMQ_PUB);
    std::string rep_head;
    char bytes[NBYT];
    std::string zaddr("tcp://*:");
    zaddr += std::to_string(PORT);
    socket.bind(zaddr.c_str());
    rep_head = std::string(PUBS);
    int idx = 0;
    while (1) {
        for (int i = 0; i < NBYT; i++)
            bytes[i] = rand(); // sleazy
        std::stringstream ss;
        ss << rep_head << std::hex << (int)idx << ":";
        zmq::message_t msg(ss.str().length() + NBYT);
        memcpy((char*)msg.data(), ss.str().data(), ss.str().length());
        memcpy(((char*)msg.data()) + ss.str().length(), bytes, NBYT);
        socket.send(msg);
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    socket.close();
    return 0;
}
I've recently updated my OS (Ubuntu 18.04) and this was not a problem in the past.

Step 1: Never violate API rules. It is never worth trying to do that.
The PUB side ought to follow the ZeroMQ API documentation: never access message data "manually"; always use the mutually exclusive member functions { zmq_msg_init() | zmq_msg_init_data() | zmq_msg_init_size() }, without ever touching the fragile eggs (the content referenced by the pointers) directly.
void myFuncToFREE ( void *data2FREE, void *hint ) // UTILITY FUN FOR FREE-ING MEMORY
{                                                 // STRAIGHT ON .SEND()
    free ( data2FREE );
}
...
zmq_msg_t message;
int   rc;
int   message_len = ss.str().length() + NBYT;
char *payload     = (char*) malloc( message_len );                          // ONE CONTIGUOUS BLOCK
memcpy( payload,                     ss.str().data(), ss.str().length() ); // TOPIC HEADER FIRST,
memcpy( payload + ss.str().length(), bytes,           NBYT );              // THEN THE DATA
// int zmq_msg_init_data ( zmq_msg_t *msg, void *data, size_t size, zmq_free_fn *ffn, void *hint );
rc = zmq_msg_init_data ( &message,    // -------------- LOAD AT ONCE:
                         payload,     // the concatenated ss.str().data() + bytes
                         message_len,
                         myFuncToFREE,
                         NULL
                         );
assert ( rc == 0 && "INF: FAILED call to zmq_msg_init_data() ... " );
...
rc = zmq_msg_send( &message, socket, 0 );
assert ( rc == message_len && "INF: FAILED call to zmq_msg_send() ... " );
...
rc = zmq_msg_close( &message ); // Note that this is NOT necessary after a successful zmq_msg_send()
assert ( rc == 0 && "INF: FAILED call to zmq_msg_close() ... " );
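For the cppzmq wrapper (zmq.hpp) that the question's publisher uses, the same zero-copy pattern is available through the zmq::message_t constructor that takes a free function. A minimal sketch, reusing ss, bytes and NBYT from the question and assuming the classic cppzmq API of that era:
// a sketch, not the original poster's code: hand cppzmq a heap-allocated
// buffer and let libzmq free it once the send has actually completed
static void free_payload( void *data, void *hint ) // matches zmq::free_fn
{
    free( data );
}
...
size_t head_len = ss.str().length();
char  *payload  = (char*) malloc( head_len + NBYT );
memcpy( payload,            ss.str().data(), head_len );
memcpy( payload + head_len, bytes,           NBYT );
zmq::message_t msg( payload, head_len + NBYT, free_payload, NULL );
socket.send( msg ); // ownership moves to libzmq; free_payload runs when it is done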

Related

why does pthread_cond_timedwait not trigger after indicated time-limit?

This is supposed to work in a loop (a server) and delegate work/inquiry to a faulty library, here represented by the longrun() function call, on a thread with a timeout of tmax=3s. I placed synchronization variables and I am trying to wait for no more than this limit, but when longrun() hangs (run 4), it still waits the full time (7s) instead of the requested limit. Can anyone explain?
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <pthread.h>
#include <sys/time.h>
#include <iostream>
using namespace std;

string int2str(int i){
    char buf[10]; // no larger int passed we hope
    int end = sprintf(buf, "%d", i);
    buf[end] = '\0';
    return string(buf);
}

string longrun(int qi){
    if(qi % 4 == 0) {
        sleep(7);
        return string("'---- to: ") + int2str(qi) + string("' (hang case)");
    }
    else {
        sleep(1);
        return string("'okay to: ") + int2str(qi) + string("'");
    }
}

struct tpack_t { // thread pack
    pthread_t thread;
    pthread_mutex_t mutex;
    pthread_cond_t go;    // have a new value to run
    pthread_cond_t ready; // tell main thread we're done processing
    int newq;             // predicate on go+ready condition for wait
    int qi;               // place question as int to thread: question-int
    string res;           // where i place the response
    tpack_t();
};

tpack_t::tpack_t() {
    pthread_mutex_init (&mutex, NULL);
    pthread_cond_init (&go, NULL);
    pthread_cond_init (&ready, NULL);
    newq = 0;
}

void set_cond_time(timespec* ctp, int tmax){
    timeval now;
    gettimeofday(&now, NULL);
    ctp->tv_nsec = now.tv_usec * 1000UL;
    ctp->tv_sec = now.tv_sec + tmax; // now + max time!
    printf("[m] ... set to sleep for %d sec, i hope...\n", tmax);
}

void take_faulty_evasive_action(tpack_t* tpx){
    // basically kill thread, clean faulty library copy (that file) and restart it
    cout << "will work on it (restarting thread) soon!\n";
    tpx->newq = 0; // minimal action for now...
}

void* faulty_proc(void* arg){
    tpack_t* tpx = (tpack_t*) arg;
    while(true){
        pthread_mutex_lock(&tpx->mutex);
        while(tpx->newq == 0){
            pthread_cond_wait(&tpx->go, &tpx->mutex);
        }
        printf("[t] to process : %d\n", tpx->qi); fflush(stdout);
        // now i have a new value in qi, process it and place the answer in... res
        tpx->res = longrun(tpx->qi);
        tpx->newq = 0;
        pthread_mutex_unlock(&tpx->mutex);
        pthread_cond_signal(&tpx->ready);
    }
}

int main(int argc, char* argv[]){
    cout << "\n this presents the problem: idx = 4k -> hang case ...\n ( challenge is to eliminate them by killing thread and restarting it )\n\n";
    printf(" ETIMEDOUT = %d EINVAL = %d EPERM = %d\n\n", ETIMEDOUT, EINVAL, EPERM);
    tpack_t* tpx = new tpack_t();
    pthread_create(&tpx->thread, NULL, &faulty_proc, (void*) tpx);
    // max wait time; more than that is a hanging indication!
    int numproc = 5;
    ++numproc;
    int tmax = 3;
    timespec cond_time;
    cond_time.tv_nsec = 0;
    int status, expired; // for timed wait on done condition!
    time_t t0 = time(NULL);
    for(int i=1; i<numproc; ++i){
        expired = 0;
        pthread_mutex_lock(&tpx->mutex);
        tpx->qi = i;   // init the question
        tpx->newq = 1; // ... predicate
        //pthread_mutex_unlock(&tpx->mutex);
        pthread_cond_signal(&tpx->go); // let it know that...
        while(tpx->newq == 1){
            /// ---------------------- most amazing region, timedwait waits all the way! ----------------------
            set_cond_time(&cond_time, tmax); // time must be FROM NOW! (abs time, not interval)
            time_t wt0 = time(NULL);
            status = pthread_cond_timedwait(&tpx->ready, &tpx->mutex, &cond_time);
            printf("[m] ---- \t exited with status = %d (after %.2fs)\n", status, difftime(time(NULL), wt0));
            /// -----------------------------------------------------------------------------------------------
            if (status == ETIMEDOUT){
                printf("\t ['t was and newq == %d]\n", tpx->newq);
                if(tpx->newq == 1){ // check one more time, to elim race possibility
                    expired = 1;
                    break;
                }
            }
            else if(status != 0){
                fprintf(stderr, "cond timewait for faulty to reply errored out\n");
                return 1;
            }
        }
        if(expired){
            take_faulty_evasive_action(tpx); // kill thread, start new one, report failure below
            cout << "[m] :: interruption: default bad answer goes here for " << i << "\n\n";
        }
        else {
            cout << "[m] :: end with ans: " << tpx->res << endl << endl;
        }
        pthread_mutex_unlock(&tpx->mutex);
    }
    time_t t1 = time(NULL);
    printf("took %.2f sec to run\n", difftime(t1, t0));
}
I used 'g++ -pthread code.cc' to compile under Linux (Ubuntu 16.04). The output is:
this presents the problem: idx = 4k -> hang case ...
( challenge is to eliminate them by killing thread and restarting it )
ETIMEDOUT = 110 EINVAL = 22 EPERM = 1
[m] ... set to sleep for 3 sec, i hope...
[t] to process : 1
[m] ---- exited with status = 0 (after 1.00s)
[m] :: end with ans: 'okay to: 1'
[m] ... set to sleep for 3 sec, i hope...
[t] to process : 2
[m] ---- exited with status = 0 (after 1.00s)
[m] :: end with ans: 'okay to: 2'
[m] ... set to sleep for 3 sec, i hope...
[t] to process : 3
[m] ---- exited with status = 0 (after 1.00s)
[m] :: end with ans: 'okay to: 3'
[m] ... set to sleep for 3 sec, i hope...
[t] to process : 4
[m] ---- exited with status = 110 (after 7.00s)
['t was and newq == 0]
[m] :: end with ans: '---- to: 4' (hang case)
[m] ... set to sleep for 3 sec, i hope...
[t] to process : 5
[m] ---- exited with status = 0 (after 1.00s)
[m] :: end with ans: 'okay to: 5'
took 11.00 sec to run
The problem is that faulty_proc() keeps tpx->mutex locked while it calls longrun(), and the pthread_cond_timedwait() call in main() can't return until it can re-acquire the mutex, even if the timeout expires.
If longrun() doesn't need the mutex to be locked - and that seems to be the case - you can unlock the mutex around that call and re-lock it before setting the completion flag and signalling the condition variable.
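A minimal sketch of that change to faulty_proc(), assuming longrun() really doesn't touch the shared state:

// sketch: drop the mutex while the slow call runs, so the main thread's
// pthread_cond_timedwait() can re-acquire it as soon as the timeout expires
void* faulty_proc(void* arg){
    tpack_t* tpx = (tpack_t*) arg;
    while(true){
        pthread_mutex_lock(&tpx->mutex);
        while(tpx->newq == 0){
            pthread_cond_wait(&tpx->go, &tpx->mutex);
        }
        int qi = tpx->qi;                  // copy the question out...
        pthread_mutex_unlock(&tpx->mutex); // ...and release the lock
        string res = longrun(qi);          // the slow call runs unlocked
        pthread_mutex_lock(&tpx->mutex);   // re-lock to publish the result
        tpx->res = res;
        tpx->newq = 0;
        pthread_mutex_unlock(&tpx->mutex);
        pthread_cond_signal(&tpx->ready);
    }
}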

Why is my timer not periodic but expired only one time?

I have created a timer using the POSIX timerfd functions.
The intention is that the timer should be periodic, and the timer expiry is observed from a separate function called myFunc().
I am calling this function multiple times, so that the timer expiry can be observed periodically after a wait of 5 secs.
The problem is that as soon as it expires the first time after 5 seconds, it doesn't expire again; that is, no delay of 5 seconds is observed from the second iteration onwards.
Can someone tell me what I am missing?
#include <stdio.h>
#include <iostream>
#include <errno.h>
#include <dlfcn.h>
#include <assert.h>
#include <sys/mman.h>
#include <new>
#include <limits.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
using namespace std;

struct epoll_event event;
int timer_fd, efd, no_of_fd;

void myFunc( int i );

int main()
{
    struct itimerspec its;
    its.it_value.tv_sec = 5;
    its.it_value.tv_nsec = 0;
    its.it_interval.tv_sec = 3; // Every 3 seconds interval
    its.it_interval.tv_nsec = 0;
    efd = epoll_create(2);
    timer_fd = timerfd_create(CLOCK_REALTIME, TFD_NONBLOCK);
    if ( timer_fd == -1 )
    {
        fprintf(stderr, "timerfd_create error in start timer");
        return 1;
    }
    event.data.fd = timer_fd;
    event.events = EPOLLIN|EPOLLPRI;
    if ( epoll_ctl(efd, EPOLL_CTL_ADD, timer_fd, &event) == -1 )
    {
        fprintf(stderr, "epoll_ctl error in start timer");
        return 1;
    }
    if ( timerfd_settime(timer_fd, 0, &its, NULL) == -1 )
    {
        fprintf(stderr, "timerfd_settime error in start timer");
        return 1;
    }
    myFunc( 10 );
    myFunc( 20 );
    myFunc( 30 );
}

void myFunc( int i )
{
    printf("Inside myFunc %d\n", i);
    no_of_fd = 0;
    struct epoll_event revent;
    errno = 0;
    do {
        no_of_fd = epoll_wait(efd, &revent, 1, -1);
    } while ( no_of_fd < 0 && errno == EINTR );
    if ( no_of_fd < 0 )
    {
        fprintf(stderr, "epoll_wait error in start timer");
    }
    if ( revent.data.fd == timer_fd ) {
        printf("Timer expired \n");
    }
}
When using epoll with level-triggering, you should read 8 bytes on every EPOLLIN. This is a uint64_t that tells you the number of timer expirations. Reading it effectively "clears" that count, so that the next EPOLLIN is the result of a new expiration.
The manual tells you about reading:
If the timer has already expired one or more times since its
settings were last modified using timerfd_settime(), or since
the last successful read(2), then the buffer given to read(2)
returns an unsigned 8-byte integer (uint64_t) containing the
number of expirations that have occurred. (The returned value
is in host byte order—that is, the native byte order for
integers on the host machine.)
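A minimal sketch of that fix inside the question's myFunc(), assuming <unistd.h> and <stdint.h> are added for read() and uint64_t:

if ( revent.data.fd == timer_fd ) {
    uint64_t expirations = 0;
    // drain the expiration count; without this read, a level-triggered
    // epoll_wait() reports the timerfd as readable again immediately
    ssize_t n = read(timer_fd, &expirations, sizeof(expirations));
    if ( n == sizeof(expirations) )
        printf("Timer expired %llu time(s)\n", (unsigned long long)expirations);
}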

which signal should I use to come out of accept() API?

I have two threads: one is blocked on accept() waiting for a new connection, and another one talks to other processes. When my application is going to shut down, I need to wake the first thread out of accept(). I tried reading the man page of accept() but did not find anything useful. My question is: which signal should I send from the second thread to the first thread so that it comes out of accept() without getting killed?
Thanks.
You can use select() with a timeout, so that, for example, your thread executing accept() wakes up every 1 or 2 seconds if nothing occurs and checks for shutdown. You can check this page to get an idea.
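A minimal sketch of that idea (the names listen_fd and running are placeholders, not from the question):

// sketch: time-boxed select() so the accepting thread can poll a shutdown flag
while (running) {
    fd_set rfds;
    struct timeval tv = { 1, 0 };   /* wake up at least once a second */
    FD_ZERO(&rfds);
    FD_SET(listen_fd, &rfds);
    int rc = select(listen_fd + 1, &rfds, NULL, NULL, &tv);
    if (rc > 0 && FD_ISSET(listen_fd, &rfds)) {
        int client = accept(listen_fd, NULL, NULL); /* ready, won't block */
        /* ... hand client off to a worker ... */
    }
    /* rc == 0 means timeout: loop around and re-check the running flag */
}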
Without using "select"
The example code below worked very well on Windows. It displays "Exit" when SIGINT is raised. You can adapt the code for Linux: almost every socket function is identical, except that you should use close() instead of closesocket(), delete the first two lines of main() (they start Winsock), and add the necessary header files for Linux.
#include <stdio.h>
#include <winsock.h>
#include <signal.h>
#include <setjmp.h>  // for jmp_buf
#include <thread>
#pragma comment(lib,"wsock32.lib")

jmp_buf EXIT_POINT;
int sock, sockl = sizeof(struct sockaddr);
struct sockaddr_in xx, client;
int AcceptConnections = 1;

void _catchsignal(int signal)
{
    closesocket(sock);
}

void thread_accept()
{
    accept(sock, (struct sockaddr*)&client, &sockl);
}

void thread_sleep()
{
    Sleep(1000);
    raise(SIGINT);
}

int _tmain(int argc, _TCHAR* argv[])
{
    WSADATA wsaData;
    WSAStartup(MAKEWORD( 2, 2 ), &wsaData);
    signal(SIGINT, _catchsignal);
    xx.sin_addr.s_addr = INADDR_ANY;
    xx.sin_family = AF_INET;
    xx.sin_port = htons(9090);
    sock = socket(AF_INET, SOCK_STREAM, 0);
    bind(sock, (struct sockaddr*)&xx, sizeof(struct sockaddr));
    listen(sock, 20);
    std::thread th_accept(thread_accept);
    std::thread th_sleep(thread_sleep);
    th_accept.join();
    th_sleep.join();
    printf("Exit");
    return 0;
}
First, you can use the select() function to accept connections without blocking the thread. You can learn more about select() on MSDN and in Beej's guide; my recommendation is the latter. MSDN's socket-programming resources still apply, because Windows and most other operating systems build on BSD sockets, which are almost identical everywhere. After accepting connections without blocking, you can just define a global variable that stops the loop.
Sorry for my English, and here is an example code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>  // for strlen
#include <winsock.h>

#define DEFAULT_PORT 9090
#define QUEUE_LIMIT 20

int main()
{
    WSADATA wsaData;
    WSAStartup(MAKEWORD( 2, 2 ), &wsaData);
    int ServerStream, SocketQueueMax = 0, i, j, TMP_ClientStream;
    int ClientAddrSize = sizeof(struct sockaddr), RecvBufferLength;
    fd_set SocketQueue, SocketReadQueue, SocketWriteQueue;
    struct sockaddr_in ServerAddr, TMP_ClientAddr;
    struct timeval SocketTimeout;
    char RecvBuffer[255];
    const char *HelloMsg = "Connected.";
    SocketTimeout.tv_sec = 1;
    ServerAddr.sin_addr.s_addr = INADDR_ANY;
    ServerAddr.sin_family = AF_INET;
    ServerAddr.sin_port = htons(DEFAULT_PORT);
    ServerStream = socket(AF_INET, SOCK_STREAM, 0);
    bind(ServerStream, (struct sockaddr*)&ServerAddr, sizeof(struct sockaddr));
    listen(ServerStream, QUEUE_LIMIT);
    FD_ZERO(&SocketQueue);
    FD_ZERO(&SocketReadQueue);
    FD_ZERO(&SocketWriteQueue);
    FD_SET(ServerStream, &SocketQueue);
    SocketQueueMax = ServerStream;
    bool AcceptConnections = 1;
    while (AcceptConnections)
    {
        SocketReadQueue = SocketQueue;
        SocketWriteQueue = SocketQueue;
        select(SocketQueueMax + 1, &SocketReadQueue, &SocketWriteQueue, NULL, &SocketTimeout);
        for (i = 0; i < SocketQueueMax + 1; i++)
        {
            if (FD_ISSET(i, &SocketReadQueue))
            {
                if (i == ServerStream)
                {
                    TMP_ClientStream = accept(ServerStream, (struct sockaddr*)&TMP_ClientAddr, &ClientAddrSize);
                    send(TMP_ClientStream, HelloMsg, strlen(HelloMsg), 0);
                    FD_SET(TMP_ClientStream, &SocketQueue);
                    if (TMP_ClientStream > SocketQueueMax)
                    {
                        SocketQueueMax = TMP_ClientStream;
                    }
                    continue;
                }
                while ((RecvBufferLength = recv(i, RecvBuffer, 254, 0)) > 0)
                {
                    RecvBuffer[RecvBufferLength] = '\0';
                    for (j = 0; j < SocketQueueMax + 1; j++)
                    {
                        if (j == i || j == ServerStream || !FD_ISSET(j, &SocketQueue))
                        {
                            continue;
                        }
                        send(j, RecvBuffer, RecvBufferLength + 1, 0);
                    }
                    printf("%s", RecvBuffer);
                    if (RecvBufferLength < 254)
                    {
                        break;
                    }
                }
            }
        }
    }
    return EXIT_SUCCESS;
}

Linux select() and FIFO ordering of multiple sockets?

Is there any way to have the Linux select() call relay event ordering?
A description of what I'm seeing:
On one machine, I wrote a simple program which sends three multicast packets, one to each of three different multicast groups. These packets are sent back-to-back, with no delay in between. I.e. sendto(mcast_group1); sendto(mcast_group2); sendto(mcast_group3).
On the other machine, I have a receiving program. The program uses one socket per multicast group. Each socket does a bind() and IP_ADD_MEMBERSHIP (i.e. join/subscribe) to the address to which it listens. The program then does a select() on the three sockets.
When select returns, all three sockets are available for reading. But which one came first? The ready-for-reading list of sockets is a set, and therefore has no order. What I would like is if select() returned exactly once per received packet, in order (the increased overhead is acceptable here). Or, is there some other kind of mechanism I can use to determine packet receive order?
Additional information:
OS is CentOS 5 (effectively Redhat Enterprise Linux) on x86_64
NIC hardware is an Intel 82571EB
I've tried e1000e driver versions 1.3.10-k2 and 2.1.4-NAPI
I've tried pinning the NIC's interrupt to an unloaded and isolated CPU core
I've disabled hardware IRQ coalescing via setting the driver option InterruptThrottleRate=0, and setting rx-usecs=0 via ethtool
I also tried using epoll, and it has the same behavior
A final remark: packet ordering is preserved if I only use one socket. In this case, I bind to INADDR_ANY (0.0.0.0) and do the IP_ADD_MEMBERSHIP multiple times on the same socket. But this does not work for our application, because we need the filtering provided by binding to the actual multicast address. Ultimately, there will be multiple multicast receiving programs on the same machine, with subscription sets that may intersect with each other. So maybe an alternate solution is to find another way to achieve the filtering effect of bind(), but without bind().
You can use IP_PKTINFO to get the address of the multicast group the packet was sent to - even if the socket is subscribed to a bunch of multicast groups. Having this in place, you will get the packets in order and gain the ability to filter by group address. See the example below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/stat.h>
#include <ctype.h>
#include <errno.h>

#define PORT 1234
#define PPANIC(msg) perror(msg); exit(1);
#define STATS_PATCH 0

int main(int argc, char **argv)
{
    fd_set master;
    fd_set read_fds;
    struct sockaddr_in serveraddr;
    int sock;
    int opt = 1;
    size_t i;
    int rc;
    char *mcast_groups[] = {
        "226.0.0.1",
        "226.0.0.2",
        NULL
    };
#if STATS_PATCH
    struct stat stat_buf;
#endif
    struct ip_mreq imreq;

    FD_ZERO(&master);
    FD_ZERO(&read_fds);
    rc = sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (rc == -1)
    {
        PPANIC("socket() failed");
    }
    rc = setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
    if (rc == -1)
    {
        PPANIC("setsockopt(reuse) failed");
    }
    memset(&serveraddr, 0, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    serveraddr.sin_port = htons(PORT);
    serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
    rc = bind(sock, (struct sockaddr *)&serveraddr, sizeof(serveraddr));
    if (rc == -1)
    {
        PPANIC("bind() failed");
    }
    rc = setsockopt(sock, IPPROTO_IP, IP_PKTINFO, &opt, sizeof(opt));
    if (rc == -1)
    {
        PPANIC("setsockopt(IP_PKTINFO) failed");
    }
    for (i = 0; mcast_groups[i] != NULL; i++)
    {
        imreq.imr_multiaddr.s_addr = inet_addr(mcast_groups[i]);
        imreq.imr_interface.s_addr = INADDR_ANY;
        rc = setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, (const void *)&imreq, sizeof(struct ip_mreq));
        if (rc != 0)
        {
            PPANIC("joining mcast group failed");
        }
    }
    FD_SET(sock, &master);
    while (1)
    {
        read_fds = master;
        rc = select(sock + 1, &read_fds, NULL, NULL, NULL);
        if (rc == 0)
        {
            continue;
        }
        if (rc == -1)
        {
            PPANIC("select() failed");
        }
        if (FD_ISSET(sock, &read_fds))
        {
            char buf[1024];
            int inb;
            char ctrl_msg_buf[1024];
            struct iovec iov[1];
            iov[0].iov_base = buf;
            iov[0].iov_len = 1024;
            struct msghdr msg_hdr = {
                .msg_iov = iov,
                .msg_iovlen = 1,
                .msg_name = NULL,
                .msg_namelen = 0,
                .msg_control = ctrl_msg_buf,
                .msg_controllen = sizeof(ctrl_msg_buf),
            };
            struct cmsghdr *ctrl_msg_hdr;
            inb = recvmsg(sock, &msg_hdr, 0);
            if (inb < 0)
            {
                PPANIC("recvmsg() failed");
            }
            for (ctrl_msg_hdr = CMSG_FIRSTHDR(&msg_hdr); ctrl_msg_hdr != NULL; ctrl_msg_hdr = CMSG_NXTHDR(&msg_hdr, ctrl_msg_hdr))
            {
                if (ctrl_msg_hdr->cmsg_level == IPPROTO_IP && ctrl_msg_hdr->cmsg_type == IP_PKTINFO)
                {
                    struct in_pktinfo *pckt_info = (struct in_pktinfo *)CMSG_DATA(ctrl_msg_hdr);
                    printf("got data for mcast group: %s\n", inet_ntoa(pckt_info->ipi_addr));
                    break;
                }
            }
            printf("|");
            for (i = 0; i < inb; i++)
                printf("%c", isprint(buf[i]) ? buf[i] : '?');
            printf("|\n");
#if STATS_PATCH
            rc = fstat(sock, &stat_buf);
            if (rc == -1)
            {
                perror("fstat() failed");
            } else {
                printf("st_atime: %d\n", stat_buf.st_atime);
                printf("st_mtime: %d\n", stat_buf.st_mtime);
                printf("st_ctime: %d\n", stat_buf.st_ctime);
            }
#endif
        }
    }
    return 0;
}
The code below won't solve the OP's problem, but it may guide people dealing with similar requirements.
(EDIT) One should not do such things late at night... even with this solution you will only get the order in which the fds were handled by select - it gives you no indication of the time the frame actually arrived.
As stated here, it is currently not possible to retrieve the order of the sockets or the timestamps at which they changed, because the required callback is not set for socket inodes. But if you are able to patch your kernel, you may work around the problem by setting the time within the select system call.
The following patch may give you an idea:
diff --git a/fs/select.c b/fs/select.c
index 467bb1c..3f2927e 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -435,6 +435,9 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 	for (i = 0; i < n; ++rinp, ++routp, ++rexp) {
 		unsigned long in, out, ex, all_bits, bit = 1, mask, j;
 		unsigned long res_in = 0, res_out = 0, res_ex = 0;
+		struct timeval tv;
+
+		do_gettimeofday(&tv);
 
 		in = *inp++; out = *outp++; ex = *exp++;
 		all_bits = in | out | ex;
@@ -452,6 +455,16 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 			f = fdget(i);
 			if (f.file) {
 				const struct file_operations *f_op;
+				struct kstat stat;
+
+				int ret;
+				u8 is_sock = 0;
+
+				ret = vfs_getattr(&f.file->f_path, &stat);
+				if (ret == 0 && S_ISSOCK(stat.mode)) {
+					is_sock = 1;
+				}
+
 				f_op = f.file->f_op;
 				mask = DEFAULT_POLLMASK;
 				if (f_op->poll) {
@@ -464,16 +477,22 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 					res_in |= bit;
 					retval++;
 					wait->_qproc = NULL;
+					if (is_sock && f.file->f_inode)
+						f.file->f_inode->i_ctime.tv_sec = tv.tv_sec;
 				}
 				if ((mask & POLLOUT_SET) && (out & bit)) {
 					res_out |= bit;
 					retval++;
 					wait->_qproc = NULL;
+					if (is_sock && f.file->f_inode)
+						f.file->f_inode->i_ctime.tv_sec = tv.tv_sec;
 				}
 				if ((mask & POLLEX_SET) && (ex & bit)) {
 					res_ex |= bit;
 					retval++;
 					wait->_qproc = NULL;
+					if (is_sock && f.file->f_inode)
+						f.file->f_inode->i_ctime.tv_sec = tv.tv_sec;
 				}
 				/* got something, stop busy polling */
 				if (retval) {
Notes:
- this is... just for you :) - don't expect it in the mainline
- do_gettimeofday() is called before each relevant fd is tested; to get higher granularity this should be done in each iteration (and only if needed)
- since the stat interface only offers a granularity of one second, you may (!UGLY!) use the remaining time attributes to map the fractions of a second to those fields
- this was done using kernel 3.16.0 and is not well tested - don't use it in a space ship or medical equipment. If you would like to try it, get a filesystem image (e.g. https://people.debian.org/~aurel32/qemu/amd64/debian_wheezy_amd64_standard.qcow2) and use qemu to test it:
sudo qemu-system-x86_64 -kernel arch/x86/boot/bzImage -hda debian_wheezy_amd64_standard.qcow2 -append "root=/dev/sda1"
If select() returns > 1 the events must have been so close together as to make the question of ordering meaningless.
You can obtain the timestamp at which a file descriptor became ready using fstat.
For more info read http://pubs.opengroup.org/onlinepubs/009695399/functions/fstat.html

Simultaneous socket read/write ("full-duplex") in Linux (aio specifically)

I'm porting an application built on top of the ACE Proactor framework. The application runs perfectly for both VxWorks and Windows, but fails to do so on Linux (CentOS 5.5, WindRiver Linux 1.4 & 3.0) with kernel 2.6.X.X - using librt.
I've narrowed the problem down to a very basic issue:
The application begins an asynchronous (via aio_read) read operation on a socket and subsequently begins an asynchronous (via aio_write) write on the very same socket. The read operation cannot be fulfilled yet since the protocol is initialized from the application's end.
- When the socket is in blocking-mode, the write is never reached and the protocol "hangs".
- When using an O_NONBLOCK socket, the write succeeds but the read keeps returning with an EWOULDBLOCK/EAGAIN error, never to recover (even if the AIO operation is restarted).
I went through multiple forums and could not find a definitive answer as to whether this should work (and I'm doing something wrong) or is impossible with Linux AIO. Would it work if I dropped AIO in favour of a different implementation (via epoll/poll/select etc.)?
Attached is a sample code to quickly re-produce the problem on a non-blocking socket:
#include <aio.h>
#include <stdio.h>
#include <stdlib.h>
#include <netdb.h>
#include <string.h>
#include <strings.h>  // for bzero/bcopy
#include <fcntl.h>    // for fcntl, F_GETFL, O_NONBLOCK
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <assert.h>
#include <errno.h>

#define BUFSIZE (100)

// Global variables
struct aiocb *cblist[2];
int theSocket;

void InitializeAiocbData(struct aiocb* pAiocb, char* pBuffer)
{
    bzero( (char *)pAiocb, sizeof(struct aiocb) );
    pAiocb->aio_fildes = theSocket;
    pAiocb->aio_nbytes = BUFSIZE;
    pAiocb->aio_offset = 0;
    pAiocb->aio_buf = pBuffer;
}

void IssueReadOperation(struct aiocb* pAiocb, char* pBuffer)
{
    InitializeAiocbData(pAiocb, pBuffer);
    int ret = aio_read( pAiocb );
    assert (ret >= 0);
}

void IssueWriteOperation(struct aiocb* pAiocb, char* pBuffer)
{
    InitializeAiocbData(pAiocb, pBuffer);
    int ret = aio_write( pAiocb );
    assert (ret >= 0);
}

int main()
{
    int ret;
    int nPort = 11111;
    char* szServer = "10.10.9.123";

    // Connect to the remote server
    theSocket = socket(AF_INET, SOCK_STREAM, 0);
    assert (theSocket >= 0);
    struct hostent *pServer;
    struct sockaddr_in serv_addr;
    pServer = gethostbyname(szServer);
    bzero((char *) &serv_addr, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(nPort);
    bcopy((char *)pServer->h_addr, (char *)&serv_addr.sin_addr.s_addr, pServer->h_length);
    assert (connect(theSocket, (const sockaddr*)(&serv_addr), sizeof(serv_addr)) >= 0);

    // Set the socket to be non-blocking
    int oldFlags = fcntl(theSocket, F_GETFL);
    int newFlags = oldFlags | O_NONBLOCK;
    fcntl(theSocket, F_SETFL, newFlags);
    printf("Socket flags: before=%o, after=%o\n", oldFlags, newFlags);

    // Construct the AIO callbacks array
    struct aiocb my_aiocb1, my_aiocb2;
    char* pBuffer = new char[BUFSIZE+1];
    bzero( (char *)cblist, sizeof(cblist) );
    cblist[0] = &my_aiocb1;
    cblist[1] = &my_aiocb2;

    // Start the read and write operations on the same socket
    IssueReadOperation(&my_aiocb1, pBuffer);
    IssueWriteOperation(&my_aiocb2, pBuffer);

    // Wait for I/O completion on both operations
    int nRound = 1;
    printf("\naio_suspend round #%d:\n", nRound++);
    ret = aio_suspend( cblist, 2, NULL );
    assert (ret == 0);

    // Check the error status for the read and write operations
    ret = aio_error(&my_aiocb1);
    assert (ret == EWOULDBLOCK);

    // Get the return code for the read
    {
        ssize_t retcode = aio_return(&my_aiocb1);
        printf("First read operation results: aio_error=%d, aio_return=%zd - That's the first EWOULDBLOCK\n", ret, retcode);
    }

    ret = aio_error(&my_aiocb2);
    assert (ret == EINPROGRESS);
    printf("Write operation is still \"in progress\"\n");

    // Re-issue the read operation
    IssueReadOperation(&my_aiocb1, pBuffer);

    // Wait for I/O completion on both operations
    printf("\naio_suspend round #%d:\n", nRound++);
    ret = aio_suspend( cblist, 2, NULL );
    assert (ret == 0);

    // Check the error status for the read and write operations for the second time
    ret = aio_error(&my_aiocb1);
    assert (ret == EINPROGRESS);
    printf("Second read operation request is suddenly marked as \"in progress\"\n");

    ret = aio_error(&my_aiocb2);
    assert (ret == 0);

    // Get the return code for the write
    {
        ssize_t retcode = aio_return(&my_aiocb2);
        printf("Write operation has completed with results: aio_error=%d, aio_return=%zd\n", ret, retcode);
    }

    // Now try waiting for the read operation to complete - it'll just busy-wait, receiving "EWOULDBLOCK" indefinitely
    do
    {
        printf("\naio_suspend round #%d:\n", nRound++);
        ret = aio_suspend( cblist, 1, NULL );
        assert (ret == 0);

        // Check the error of the read operation and re-issue if needed
        ret = aio_error(&my_aiocb1);
        if (ret == EWOULDBLOCK)
        {
            IssueReadOperation(&my_aiocb1, pBuffer);
            printf("EWOULDBLOCK again on the read operation!\n");
        }
    }
    while (ret == EWOULDBLOCK);
}
Thanks in advance,
Yotam.
Firstly, O_NONBLOCK and AIO don't mix. AIO will report the asynchronous operation complete when the corresponding read or write wouldn't have blocked - and with O_NONBLOCK, they would never block, so the aio request will always complete immediately (with aio_return() giving EWOULDBLOCK).
Secondly, don't use the same buffer for two simultaneous outstanding aio requests. The buffer should be considered completely offlimits between the time when the aio request was issued and when aio_error() tells you that it has completed.
Thirdly, AIO requests to the same file descriptor are queued, in order to give sensible results. This means that your write won't happen until the read completes - if you need to write the data first, you need to issue the AIOs in the opposite order. The following will work fine, without setting O_NONBLOCK:
struct aiocb my_aiocb1, my_aiocb2;
char pBuffer1[BUFSIZE+1], pBuffer2[BUFSIZE+1] = "Some test message";
const struct aiocb *cblist[2] = { &my_aiocb1, &my_aiocb2 };

// Start the write and then the read operation on the same socket
IssueWriteOperation(&my_aiocb2, pBuffer2);
IssueReadOperation(&my_aiocb1, pBuffer1);

// Wait for I/O completion on both operations
int nRound = 1;
int aio_status1, aio_status2;
do {
    printf("\naio_suspend round #%d:\n", nRound++);
    ret = aio_suspend( cblist, 2, NULL );
    assert (ret == 0);

    // Check the error status for the read and write operations
    aio_status1 = aio_error(&my_aiocb1);
    if (aio_status1 == EINPROGRESS)
        puts("aio1 still in progress.");
    else
        puts("aio1 completed.");

    aio_status2 = aio_error(&my_aiocb2);
    if (aio_status2 == EINPROGRESS)
        puts("aio2 still in progress.");
    else
        puts("aio2 completed.");
} while (aio_status1 == EINPROGRESS || aio_status2 == EINPROGRESS);

// Get the return codes for the read and the write
ssize_t retcode;
retcode = aio_return(&my_aiocb1);
printf("First operation results: aio_error=%d, aio_return=%zd\n", aio_status1, retcode);
retcode = aio_return(&my_aiocb2);
printf("Second operation results: aio_error=%d, aio_return=%zd\n", aio_status2, retcode);
Alternatively, if you don't care about reads and writes being ordered with respect to each other, you can use dup() to create two file descriptors for the socket, and use one for reading and the other for writing - each will have its AIO operations queued separately.
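A minimal sketch of that alternative, assuming theSocket, the buffers and the helpers from the question:

// sketch: give reads and writes independent descriptors so their AIO
// requests queue separately instead of serializing against each other
int readFd  = theSocket;
int writeFd = dup(theSocket);   // same underlying socket, separate fd
assert(writeFd >= 0);

struct aiocb readCb, writeCb;
InitializeAiocbData(&readCb, pBuffer1);   // helper sets aio_fildes = theSocket,
readCb.aio_fildes = readFd;               // so override it per descriptor
assert(aio_read(&readCb) == 0);

InitializeAiocbData(&writeCb, pBuffer2);
writeCb.aio_fildes = writeFd;
assert(aio_write(&writeCb) == 0);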
