Embedded Linux poll() returns constantly - linux

I have a particular problem: poll() keeps returning when I know there is nothing to read.
The setup is as follows. I have two file descriptors in the fd set that poll() watches. One is for a high-to-low pin change (GPIO); the other is for a proxy input. The problem occurs with the proxy input.
The order of processing is: start the main functions; poll; write data to the proxy; poll() returns; accept the data; send the data over SPI; the receiving slave device signals that it wants to send an ACK by pulling the GPIO low; poll() senses this drop and reacts.
Infinite POLLINs :(
If I have no timeout on poll(), the program works perfectly. The moment I include a timeout, poll() returns continuously. I am not sure what I am doing wrong here.
while(1)
{
memset((void*)fdset, 0, sizeof(fdset));
fdset[0].fd = gpio_fd;
fdset[0].events = POLLPRI; // POLLPRI - There is urgent data to read
fdset[1].fd = proxy_rx;
fdset[1].events = POLLIN; // POLLIN - There is data to read
rc = poll(fdset, nfds, 1000);//POLL_TIMEOUT);
if (rc < 0) // Error
{
printf("\npoll() failed/Interrupted!\n");
}
else if (rc == 0) // Timeout occurred
{
printf(" poll() timeout\n");
}
else
{
if (fdset[1].revents & POLLIN)
{
printf("fdset[1].revents & POLLIN\n");
if ((resultR =read(fdset[1].fd,command_buf,10))<0)
{
printf("Failed to read Data\n");
}
if (fdset[0].revents & POLLPRI)
//if( (gpio_fd != -1) && (FD_ISSET(gpio_fd, &err)))
{
lseek(fdset[0].fd, 0, SEEK_SET); // Read from the start of the file
len = read(fdset[0].fd, reader, 64);
}
}
}
}
So that is the gist of my code.
I have also used GDB and while debugging, I found that the GPIO descriptor was set with revents = 0x10, which means that an error occurred and that POLLPRI also occurred.
In this question, something similar was addressed. But I do read all the time whenever I get POLLIN. It is a bit amazing, that this problem only occurs when I include the timeout, if I replace the poll timeout with -1, it works perfectly.

When poll() fails (returning -1) you should do something with errno, perhaps through perror(). Also, your nfds (the second argument to poll()) is not set in the code shown; it should be 2, the number of entries in fdset.
GCC would probably have warned about nfds being uninitialized, at least with all warnings enabled (-Wall).
(I'm guessing that an uninitialized nfds might hold some "random" large value, so the kernel might be polling other "random" file descriptors: whatever lies in memory after the two valid entries of your fdset.)
BTW, you could strace your program. And using the fdset name is a bit confusing (it could refer to select(2)).
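To make the two points above concrete, here is a minimal sketch, not your actual code: the helper name wait_for_events and its output parameters are made up for illustration. It derives nfds from the array itself, so it cannot be left uninitialized, and it reports failures through perror().

```c
#include <poll.h>
#include <stdio.h>

/* Hypothetical helper (name and parameters are assumptions for this sketch).
 * Waits on a GPIO descriptor (POLLPRI) and a proxy descriptor (POLLIN).
 * Returns poll()'s result: <0 on error, 0 on timeout, >0 if something fired. */
int wait_for_events(int gpio_fd, int proxy_fd, int timeout_ms,
                    int *pri_ready, int *in_ready)
{
    struct pollfd fds[2] = {
        { .fd = gpio_fd,  .events = POLLPRI },
        { .fd = proxy_fd, .events = POLLIN  },
    };
    /* Derive the count from the array so it always matches: here, 2. */
    nfds_t nfds = sizeof fds / sizeof fds[0];
    int rc;

    *pri_ready = 0;
    *in_ready = 0;

    rc = poll(fds, nfds, timeout_ms);
    if (rc < 0) {
        perror("poll");   /* prints the reason stored in errno */
        return rc;
    }
    if (rc > 0) {
        *pri_ready = (fds[0].revents & POLLPRI) != 0;
        *in_ready  = (fds[1].revents & POLLIN) != 0;
    }
    return rc;
}
```

The caller would still have to read from whichever descriptor is reported ready; otherwise the next poll() returns immediately again.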

Assuming I fixed your formatting properly in your question, it looks like you have a missing } after the POLLIN block and the next if() that checks the POLLPRI. It would possibly work better this way:
if (fdset[1].revents & POLLIN)
{
printf("fdset[1].revents & POLLIN\n");
if ((resultR =read(fdset[1].fd,command_buf,10))<0)
{
printf("Failed to read Data\n");
}
}
if (fdset[0].revents & POLLPRI)
//if( (gpio_fd != -1) && (FD_ISSET(gpio_fd, &err)))
{
lseek(fdset[0].fd, 0, SEEK_SET); // Read from the start of the file
len = read(fdset[0].fd, reader, 64);
}
Although you can do whatever you want with indentation in C/C++/Java/JavaScript, not doing it right can bite you really hard. Hopefully, I'm wrong and your original code was correct.
Another one I often see: people not using the { ... } at all and ending up with code like:
if(expr) do_a; do_b;
and of course, do_b; will be executed all the time, whether expr is true or false... and although you could fix the above with a comma like so:
if(expr) do_a, do_b;
the only safe way to do it right is to use braces:
if(expr)
{
do_a;
do_b;
}
Always make sure your indentation is perfect and write small functions so you can see that it is indeed perfect.

Related

non-blocking socket vs. select() driven approach

I'm writing user-space application which among other functionality uses netlink sockets to talk to the kernel. I use simple API provided by open source library libmnl.
My application sets certain options over netlink as well as it subscribes to netlink events (notifications), parses it etc. So this second feature (event notifications) is asynchronous, currently I implemented a simple select() based loop:
...
fd_set rfd;
struct timeval tv;
int ret;
while (1) {
tv.tv_sec = 1;
tv.tv_usec = 0;
FD_ZERO(&rfd);
/* fd - is a netlink socket */
FD_SET(fd, &rfd);
ret = select(fd + 1, &rfd, NULL, NULL, &tv);
if (ret < 0) {
perror("select()");
continue;
} else if (ret == 0) {
printf("Timeout on fd %d", fd);
} else if (FD_ISSET(fd, &rfd)) {
/*
count = recv(fd, buf ...)
while (count > 0) {
parse 'buf' for netlink message, validate etc.
count = recv(fd, buf)
}
*/
}
}
So I'm observing now that code inside else if (FD_ISSET(fd, &rfd)) { branch blocks at the second recv() call.
Now I'm trying to understand whether I need to set the netlink socket to non-blocking (SOCK_NONBLOCK, for example). But then I probably don't need select() at all: I can simply have a recv() -> message parse -> recv() loop and it won't block.
... if I need to set the netlink socket to non-blocking ..., but then I probably don't need select() at all ...
Exactly this is the purpose of a non-blocking socket: Instead of doing the if(FD_ISSET(...)) you call recv() and evaluate the return value.
If you use blocking sockets, you must not call recv() more than once after each call to select(); as long as you follow that rule, the program is "effectively" non-blocking.
HOWEVER,
... as user "kaylum" already suggested in his comment, you'll have another problem in any case:
It is not guaranteed that one complete "message" is available at the same time. The other end of the socket might send the first part of the message, wait some seconds and then send the second part of the message.
However, select() will tell you that there is at least one byte available; it will not tell you if the complete message is available.
If you want to wait for the complete message in the inner loop (while(count > 0)), you will always have to wait (which means that your program has "effectively" a blocking behavior even if the socket is non-blocking).
If you simply want to process all bytes already available in the inner loop, then the condition count > 0 is wrong. Instead, you should do something like this if you are working with blocking sockets:
else if(FD_ISSET(...))
{
while(FD_ISSET(...))
{
count = recv(...);
if(count > 0)
{
...
select(...);
}
else FD_ZERO(...);
}
}
However, in most cases this will not be necessary and you can simply process the "remaining" data bytes in the next "outer" loop.
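If you do want to consume everything that is already queued, one alternative to the select()-inside-the-loop pattern is a non-blocking recv() per call. The following is only a sketch under assumptions: drain_socket, chunk_handler and count_bytes are names invented here, and MSG_DONTWAIT is the Linux/BSD per-call flag (setting O_NONBLOCK on the socket would have a similar effect). Each received chunk is handed to a callback, so datagram-style sockets such as netlink keep their message boundaries.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical callback type: called once per chunk returned by recv(). */
typedef void (*chunk_handler)(const char *data, size_t len, void *ctx);

/* Example handler for illustration: adds each chunk's length to a size_t. */
void count_bytes(const char *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}

/* Reads whatever is queued on fd right now without ever blocking.
 * Returns the number of chunks handled, or -1 on a real error. */
int drain_socket(int fd, chunk_handler handle, void *ctx)
{
    char buf[4096];
    int chunks = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, MSG_DONTWAIT);
        if (n > 0) {
            handle(buf, (size_t)n, ctx);
            chunks++;
            continue;
        }
        if (n == 0)
            return chunks;      /* stream peer closed, or empty datagram */
        if (errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return chunks;      /* nothing more queued: back to select() */
        return -1;              /* genuine error, errno says why */
    }
}
```

You would call drain_socket() from the FD_ISSET() branch and then go back to select(). A partially received message still has to be buffered by the handler until the rest arrives.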

Server crashes after closesocket

I have multithreading application, it's periodically polling a few hundred devices.
Each thread serves one device, its socket and other descriptors are encapsulated at individual object, so no shared descriptors.
Occasionally the application crashes after closesocket(fSock), when I try to set the descriptor fSock to 0.
I assume, I should not set fSock = 0, if closesocket(fSock) returns SOCKET_ERROR.
Or is there any other reason?
My code:
bool _EthDev::Connect()
{
int sockErr, ret, i, j;
int szOut = sizeof(sockaddr_in);
// create socket
if ((fSock = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
{
sockErr = GetLastError();
Log("Invalid socket err %d", sockErr);
fSock = 0;
return false;
}
// set fast closing socket (by RST)
linger sLinger;
sLinger.l_onoff = 1;
sLinger.l_linger = 0;
if (sockErr = setsockopt(fSock, SOL_SOCKET, SO_LINGER, (const char FAR*)&sLinger, sizeof(linger)))
{
sockErr = WSAGetLastError();
Log("Setsockopt err %d", sockErr);
closesocket(fSock);
fSock = 0; // here crashes
return false;
}
// connect to device
fSockaddr.sin_port = htons((u_short)(baseport));
if (connect(fSock, (struct sockaddr*)&fSockaddr, szOut))
{
closesocket(fSock);
fSock = 0;
return false;
}
...
return true;
}
I have multithreading application, ... [it] occasionally crashes
A multithreading application that occasionally crashes is a classic symptom of a race condition. I think to prevent the crashes you need to figure out what the race condition is in your code, and fix that.
I assume, I should not set fSock = 0, if closesocket(fSock) returns
SOCKET_ERROR. Or is there any other reason?
I doubt the problem is actually related to closesocket() or to setting fSock to 0. Keep in mind that sockets are really just integers, and setting an integer to 0 isn't likely to cause a crash on its own. What could cause a crash is a write to invalid memory -- and fSock = 0 does write to the memory location where the member variable fSock is (or was) located at.
Therefore, a more likely hypothesis is that the _EthDev object got deleted by thread B while thread A was still in the middle of calling Connect() on it. This would most likely happen while the connect() call was executing, because a blocking connect() call can take a relatively long time to return. So if there was another thread out there that rudely deleted the _EthDev object during the connect() call, then as soon as connect() returned, the next line of code that would write to the location where the (now deleted) _EthDev object used to be would be the "fSock = 0;" line, and that could cause a crash.
I suggest you review your code that deletes _EthDev objects, and if it isn't careful to first shut down any thread(s) using those objects (and also to wait for the threads to exit!) before deleting the _EthDev objects, you should rewrite it so that it does so reliably. Deleting an object while another thread might still be using it is asking for trouble.
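As a rough illustration of "stop the thread, wait for it, then delete", here is a portable sketch in C with POSIX threads rather than your Win32/C++ code. Everything in it is an assumption made for the example: struct device stands in for _EthDev, and device_start / device_destroy are invented names. On Windows the waiting step would be a wait on the thread handle instead of pthread_join().

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object like _EthDev used by one worker. */
struct device {
    pthread_t   thread;
    atomic_bool stop;      /* set by the owner to ask the worker to exit */
    int         sock;      /* written only by the worker thread */
    int         finished;  /* written by the worker just before it exits */
};

static void *device_worker(void *arg)
{
    struct device *dev = arg;

    while (!atomic_load(&dev->stop)) {
        /* ... connect, poll the device, and so on ... */
        sched_yield();
    }
    dev->sock = 0;         /* safe: the owner has not freed dev yet */
    dev->finished = 1;
    return NULL;
}

struct device *device_start(void)
{
    struct device *dev = calloc(1, sizeof *dev);

    if (dev == NULL)
        return NULL;
    atomic_init(&dev->stop, false);
    if (pthread_create(&dev->thread, NULL, device_worker, dev) != 0) {
        free(dev);
        return NULL;
    }
    return dev;
}

/* Ask the worker to stop, WAIT for it to exit, and only then free the
 * object. Returns the worker's 'finished' flag as read after the join. */
int device_destroy(struct device *dev)
{
    int finished;

    atomic_store(&dev->stop, true);
    pthread_join(dev->thread, NULL);   /* worker can no longer touch dev */
    finished = dev->finished;
    free(dev);
    return finished;
}
```

The important part is the order inside device_destroy(): freeing the object before the join would recreate exactly the kind of write-to-freed-memory described above.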

C++ usrsctp callback parameters null

I'm currently creating a network application that uses the usrsctp library on windows and I'm having an odd problem with parameters appearing as null when they shouldn't be on a callback function. I'm not sure if this is a specific usrsctp issue or something I'm doing wrong so I wanted to check here first.
When creating a new sctp socket you pass a function as one of the parameters that you want to be called when data is received as shown in the code below
static int receive_cb(struct socket *sock, union sctp_sockstore addr, void *data,
size_t datalen, struct sctp_rcvinfo rcv, int flags, void *ulp_info)
{
if (data == NULL) {
printf("receive_cb - Data NULL, closing socket...\n");
done = 1;
usrsctp_close(sock);
}
else {
_write(_fileno(stdout), data, datalen);
free(data);
}
return (1);
}
...
//Create SCTP socket
if ((sctpsock = usrsctp_socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP, receive_cb, NULL, 0, NULL)) == NULL) {
perror("usrsctp_socket");
return -1;
}
Tracing through the library, I can see that all the parameters are correct just before the callback is called.
As soon as I step into it, they become null.
I've no idea what would cause this, the callback function was taken straight from the official examples so nothing should be wrong there.
Ok, worked out the issue, it seems that the parameter before 'union sctp_sockstore addr' was causing the stack to be pushed by 0x1c and moving the rest of the parameters away from where they should be. I've never come across this issue before but changing the parameter to a pointer fixed it.
I had the same issue. In my case the reason was a missing define for INET, since the size of 'union sctp_sockstore' depends on this define.
So you have to ensure that you use the same defines as were used when compiling the library.
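To see why a missing define shifts the arguments, consider this small sketch. The types are invented stand-ins, not the real usrsctp definitions: the two unions model what the library and the application each end up with when a preprocessor define differs between the two builds.

```c
#include <stddef.h>

/* Hypothetical address structs; the sizes are chosen only for illustration. */
struct addr_small { unsigned char bytes[16]; };
struct addr_large { unsigned char bytes[28]; };

/* The union as the library saw it, built with the extra member enabled. */
union sockstore_lib {
    struct addr_small small;
    struct addr_large large;
};

/* The same union as the application saw it, with the define missing. */
union sockstore_app {
    struct addr_small small;
};

/* A union passed by value occupies sizeof(union) bytes in the argument
 * list, so every later parameter is off by this many bytes when caller
 * and callee disagree about the definition. */
size_t sockstore_size_mismatch(void)
{
    return sizeof(union sockstore_lib) - sizeof(union sockstore_app);
}
```

With these stand-in sizes the mismatch is 12 bytes. Passing a pointer instead, as in the fix above, sidesteps the problem because a pointer has the same size on both sides.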

How to handle all possible errors on async socket?

I have decided to use async io for my project and simply do a single threaded loop where I try to read some data each frame from each open socket. This worked quite well and overall I'm happy with it for now. The problem is that I'm seeing some odd behavior with the async sockets.
I have code like this:
accept a connection...
fcntl(O_NONBLOCK) on the client socket...
int rc;
if((rc = recv(socket))>0)
process data
if rc == 0
close socket and cleanup
The problem is that I get rc == 0 sometimes even though I know that the connection is not closed. If I don't clean up then my app works as normal. But if I do cleanup then the client receives a disconnect before the connection is even established.
So my question is: Do I have to check somehow whether the socket is ready before doing a recv in order to get the correct return value from it?
Most of the information I have been able to find was inconclusive. I found references to select() but it seems to block until there is a status change on the socket - but I need the socket to be nonblocking.
What I'm looking for is just the intuitive behavior that if there is data, it is read to the buffer and recv returns number of bytes read, if there is no data it returns -1 and if the socket is disconnected then it should return 0.
Do I have to do anything else to the socket before calling recv to make it work as expected?
First, taking on the heavy lifting of going "all asynchronous" with a socket server is a good start for a design and will enable scalability very easily.
As for your question.
recv() will return the following values:
A positive value returned by recv() indicates the number of bytes copied to your buffer (i.e. you actually received these bytes).
recv() will return 0 when the socket was closed by the remote side.
For async sockets, recv() will return -1 and set errno to either EAGAIN or EWOULDBLOCK if the connection is still valid but there's no new data to be consumed. Call select() or poll() on the socket to wait for data.
Otherwise, any general connection failure will result in -1 being returned by recv(). (And the only thing you can do is close the socket).
So when you say, "rc == 0 sometimes even though I know that the connection is not closed", I suspect your pseudocode is not checking the return value, but instead checking the result of (rc > 0).
This is closer to the logic you want:
int rc;
rc = recv(s, buf, buffersize, 0);
if (rc == 0)
{
/* socket closed by remote end */
close(s); s=-1;
}
else if ((rc == -1) && ((errno == EAGAIN) || (errno == EWOULDBLOCK)) )
{
// need to wait. Call select() or poll()
}
else if (rc == -1)
{
close(s); s=-1;
}
else
{
ProcessNewData(s, buf, rc);
}

"window procedure" of a newly created thread without window

I want to create a thread for some db writes that should not block the ui in case the db is not there. For synchronizing with the main thread, I'd like to use windows messages. The main thread sends the data to be written to the writer thread.
Sending is no problem, since CreateThread returns the handle of the newly created thread. I thought about creating a standard windows event loop for processing the messages. But how do I get a window procedure as a target for DispatchMessage without a window?
Standard windows event loop (from MSDN):
while( (bRet = GetMessage( &msg, NULL, 0, 0 )) != 0)
{
if (bRet == -1)
{
// handle the error and possibly exit
}
else
{
TranslateMessage(&msg);
DispatchMessage(&msg);
}
}
Why Windows messages? Because they are fast (Windows relies on them) and thread-safe. This case is also special as there is no need for the second thread to read any data. It just has to receive data, write it to the DB and then wait for the next data to arrive. But that's just what the standard event loop does. GetMessage waits for the data, then the data is processed and everything starts again. There's even a defined signal for terminating the thread that is well understood - WM_QUIT.
Other synchronizing constructs block one of the threads every now and then (critical section, semaphore, mutex). As for the events mentioned in the comment - I don't know them.
It might seem contrary to common sense, but for messages that don't have windows, it's actually better to create a hidden window with your window proc than to manually filter the results of GetMessage() in a message pump.
The fact that you have an HWND means that as long as the right thread has a message pump going, the message is going to get routed somewhere. Consider that many functions, even internal Win32 ones, have their own message pumps (for example MessageBox()). And the code for MessageBox() isn't going to know to invoke your custom code after its GetMessage(), unless there's a window handle and window proc that DispatchMessage() will know about.
By creating a hidden window, you're covered by any message pump running in your thread, even if it isn't written by you.
EDIT: but don't just take my word for it, check these articles from Microsoft's Raymond Chen.
Thread messages are eaten by modal loops
Why do messages posted by PostThreadMessage disappear?
Why isn't there a SendThreadMessage function?
NOTE: Use this code only when you don't need any UI-related or COM-related code. Outside those corner cases it works correctly, and it is especially well suited to a purely compute-bound worker thread.
DispatchMessage and TranslateMessage are not necessary if the thread does not have a window, so you can simply omit them. An HWND has nothing to do with your scenario; you don't need to create any window at all. Those two *Message functions are only needed to handle Windows UI messages such as WM_KEYDOWN and WM_PAINT.
I also prefer Windows Messages to synchronize and communicate between threads by using PostThreadMessage and GetMessage, or PeekMessage. I wanted to cut and paste from my code, but I'll just briefly sketch the idea.
#define WM_MY_THREAD_MESSAGE_X (WM_USER + 100)
#define WM_MY_THREAD_MESSAGE_Y (WM_USER + 101)
// Worker Thread: No Window in this thread
unsigned int CALLBACK WorkerThread(void* data)
{
// Get the master thread's ID
DWORD master_tid = ...;
MSG msg;
BOOL bRet;
while( (bRet = GetMessage( &msg, NULL, 0, 0 )) != 0)
{
if (bRet == -1)
{
// handle the error and possibly exit
}
else
{
if (msg.message == WM_MY_THREAD_MESSAGE_X)
{
// Do your task
// If you want to response,
PostThreadMessage(master_tid, WM_MY_THREAD_MESSAGE_X, ... ...);
}
//...
if (msg.message == WM_QUIT)
break;
}
}
return 0;
}
// In the Master Thread
//
// Spawn the worker thread
CreateThread( ... WorkerThread ... &worker_tid);
// Send message to worker thread
PostThreadMessage(worker_tid, WM_MY_THREAD_MESSAGE_X, ... ...);
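// If you want the worker thread to quit, post WM_QUIT to its queue
// (PostQuitMessage() only posts to the calling thread's own queue)
PostThreadMessage(worker_tid, WM_QUIT, 0, 0);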
// If you want to receive message from the worker thread, it's simple
// You just need to write a message handler for WM_MY_THREAD_MESSAGE_X
LRESULT OnMyThreadMessage(WPARAM, LPARAM)
{
...
}
I'm not sure whether this is what you wanted, but I think the code is easy to understand. In general, a thread is created without a message queue; the queue is initialized once the thread calls a window-message-related function. Note again that no window is necessary to post or receive window messages.
You don't need a window procedure in your thread unless the thread has actual windows to manage. Once the thread has called Peek/GetMessage(), it already has the same message that a window procedure would receive, and thus can act on it immediately. Dispatching the message is only necessary when actual windows are involved. It is a good idea to dispatch any messages that you do not care about, in case other objects used by your thread have their own windows internally (ActiveX/COM does, for instance). For example:
while( (bRet = GetMessage(&msg, NULL, 0, 0)) != 0 )
{
if (bRet == -1)
{
// handle the error and possibly exit
}
else
{
switch( msg.message )
{
case ...: // process a message
...
break;
case ...: // process a message
...
break;
default: // everything else
TranslateMessage(&msg);
DispatchMessage(&msg);
break;
}
}
}

Resources