Gracefully killing a Visual C++ serial thread - multithreading

I have a Visual Studio C++ 2013 MFC application that writes/reads data from a serial rotary encoder, and sends the read data to a stepper motor. The reading happens in a while (flagTrue) {} loop that runs in a separate thread.
It does the job, but when I try to exit the application gracefully, I keep getting this:
'System.ObjectDisposedException' mscorlib.dll
I tried setting timers for 1-2 seconds to let the serial listening finish, but it seems like the listening keeps going even when I seemingly have exited the thread. Here are snippets of the code:
//this is inside of the main window CDlg class
pSerialThread = (CSerialThread*)AfxBeginThread(RUNTIME_CLASS(CSerialThread));
//this is inside the CSerialThread::InitInstance() function
init_serial_port();
//this is the serial listening while loop
void init_serial_port() {
    SerialPort^ serialPort = gcnew SerialPort();
    while (bSerialListen) {
        //do read/write using the serialPort object
    }
    serialPort->Close();
}
//this is in the OnOK() function
bSerialListen = false;
pSerialThread->ExitInstance();
An incomplete workaround, inspired by Hans's answer below, was to have the thread reset a flag after the port closes:
SerialPort^ serialPort = gcnew SerialPort();
serialIsOpen = true;
while (bSerialListen) {
    //do read/write using the serialPort object
}
serialPort->Close();
serialIsOpen = false;
}
Then inside OnOK() (which does result in a clean exit):
bSerialListen=false;
//do other stuff which normally takes longer than the port closing.
if (serialIsOpen) {
    //ask user to press Exit again;
    return;
}
OnOK();
}
However, the user always has to press Exit twice, because the following never works:
while (serialIsOpen) {
    Sleep(100);
    //safety counter, do not proceed to OnOK();
}
OnOK();
The while loop expires before the port resets the flag, even if one waits for 10 seconds, which is much longer than it takes the user to press the button twice.

while (bSerialListen) {
Very troublesome. First of all, a bool is not a proper thread synchronization primitive by a very long shot. Second, and surely the most likely problem, your code isn't checking it, because the thread is actually stuck in the SerialPort::Read() call. That call isn't completing because the device isn't sending anything at the moment you want to terminate your program.
What happens next is very rarely graceful. A good way to trigger an uncatchable ObjectDisposedException is to jerk the USB connector, the only other thing you can do when you see it not working and have no idea what to do next. Very Bad Idea. That makes many a USB driver throw up its hands in disgust: it knows a userland app has the port opened, but the device isn't there anymore, so it starts failing any requests. Sometimes it even fails the close request, very unpleasant. There is no way to do this in a graceful way; serial ports are not plug & play devices. This trips an ObjectDisposedException in a worker thread that SerialPort starts to raise events, and it is uncatchable.
Never, never, never jerk the USB connector; using the "Safely Remove Hardware" tray icon is a rock-hard requirement for legacy devices like serial ports. Don't force it.
So what to do next? The only graceful way to get the SerialPort::Read() call to complete is to jerk the floor mat. You have to call SerialPort::Close(). You still get an ObjectDisposedException but now it is one you can actually catch. Immediately get out of the loop, don't do anything else and let the thread terminate.
But of course you have to do so from another thread since this one is stuck. Plenty of trouble doing that no doubt when you use MFC, the thread that wants it to exit is not a managed thread in your program. Sounds like you already discovered that.
The better way is the one you might find acceptable after you read this post. Just don't.
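For comparison, here is a minimal sketch of a flag-and-wait shutdown in standard C++ rather than MFC or C++/CLI. It assumes the read returns after a timeout instead of blocking forever (with SerialPort that would mean setting ReadTimeout, which the question's code does not show). Listener and read_with_timeout are hypothetical names, and the read is simulated with a short sleep.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a serial read with a timeout: it gives up after
// 20 ms and reports that no data arrived, so the loop can re-check the flag.
inline bool read_with_timeout() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return false;
}

class Listener {
public:
    ~Listener() { stop(); }

    void start() {
        if (worker_.joinable()) return;  // already running
        keep_running_ = true;
        worker_ = std::thread([this] {
            while (keep_running_.load()) {
                read_with_timeout();     // never blocks longer than the timeout
                ++iterations_;
            }
            // The port would be closed here, on the thread that owns it.
        });
    }

    // Ask the loop to stop, then block until the thread has really exited.
    void stop() {
        keep_running_ = false;
        if (worker_.joinable()) worker_.join();
    }

    bool running() const { return worker_.joinable(); }
    int iterations() const { return iterations_.load(); }

private:
    std::atomic<bool> keep_running_{false};
    std::atomic<int> iterations_{0};
    std::thread worker_;
};
```

Because stop() joins the thread, the caller only continues once the loop has exited, so no polling with Sleep() is needed. This only works if the read cannot block indefinitely, which is exactly the problem described above.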

Related

FTDI D2XX Cancelling overlapped IO (OIO) after a USB cable disconnect and reconnect

My application uses a USB based FTDI chip and the D2XX driver. It uses OIO (Overlapped IO) to read and write to the USB. My requirements include a 30 second timeout, something I cannot reduce. The code appears quite robust and stable.
A new requirement is overcoming an inadvertent disconnect and reconnect of the USB cable (a nurse kicked the cable out).
After receiving the device-removed message from Windows and determining that it is our FTDI device, I found that on reconnect I cannot receive new data from the OIO until the previous OIO calls time out (the 30 second timeout from the requirements).
Once I discover the disconnect, I loop on the following call until all queued up OIO are reaped:
bool CancelOIO()
{
    if (!FtdiRemaining)
        return false;
    FT_SetTimeouts(FtdiHandle, 1, 1);
    FT_W32_PurgeComm(FtdiHandle, PURGE_TXABORT | PURGE_RXABORT | PURGE_TXCLEAR | PURGE_RXCLEAR);
    while (FtdiRemaining)
    {
        DWORD nBytes = 0;
        if (!FT_W32_GetOverlappedResult(FtdiHandle, &FtdiOverLap[FtdiQindex], &nBytes, FALSE))
        {
            if (FT_W32_GetLastError(FtdiHandle) == ERROR_IO_INCOMPLETE)
                return true;
            if (FT_W32_GetLastError(FtdiHandle) != ERROR_OPERATION_ABORTED)
            {
                CString str;
                str.Format("FT_W32_GetOverlappedResult failed with %d\r\n", FT_W32_GetLastError(FtdiHandle));
                SM_WriteLog(str, RGB_LOG_NORMAL);
            }
        }
        FtdiRemaining--;
        FtdiTodo++;
        FtdiQindex++;
        if (FtdiQindex >= FtdiQueueSize)
            FtdiQindex = 0;
    }
    return !!FtdiRemaining;
}
I set the timeout period to 1ms. This does not appear to change the timeout for previously scheduled OIO.
I called FT_W32_PurgeComm to cancel everything. This also does not appear to cancel the OIO.
I tried calling CancelIo and this returned an error: the handle is not valid. My understanding is that it is up to the driver code to respond to it. It might be due to the disconnect.
Anyway, I call the above code in a loop until all scheduled OIO are reaped. Nothing happens for 30 seconds. Then all of the OIO seem to end in less than 1ms.
As a test of this code, I called with the USB cable connected and it returns in 1ms.
So the problem appears to be when the cable is unplugged.
Questions:
What am I missing?
Is there another call I can make?
Is this a bug?
Other things I tried:
Closing the handle before calling FT_W32_GetOverlappedResult. This results in a fast return, but my app can no longer receive data from the new handle. Weird. Anyone know why?
Not calling this code. The app is able to receive new data from the new handle, but only after the old OIO calls time out. Why?
CyclePort. This does not cause these OIO calls to return more quickly, and the OIO still cannot receive data until they time out.
I found after repeated tests two distinct behaviors. The fastest one returns reasonably quickly, canceling OIO. The other used the full 30 second timeout.
Try reducing the timeout to reduce the delay in cancelling OIO.

Why can a sub-classed QThread simply fail to start?

This is using a sub-classed QThread based on the ideas expressed in the whitepaper "QThreads: You were not doing so wrong". It does not have an event loop, nor does it have slots. It just emits signals and stops. In fact its primary signal is the QThread finished one.
Basically I have a Qt application using a background thread to monitor stuff. Upon finding what it is looking for, it records its data, and terminates.
The termination sends a signal to the main event loop part of the application, which processes it, and when done, starts the background anew. I can usually get this working for tens of seconds, but then it just seems to quit.
It seems that when the main application tries to start the thread, it doesn't really run. I base this on telemetry code that increments counters as procedures get executed.
Basically:
//in main application. Setup not shown.
//background points to the QThread sub-class object
void MainWindow::StartBackground()
{
    background->startcount++;
    background->start();
    if ( background->isRunning() )
    {
        background->startedcount++;
    }
}
//in sub-classed QThread
void Background::run()
{
    runcount++;
    //Do stuff until done
}
So when I notice that my background thread doesn't seem to be running (by watching Process Explorer), I make the debugger break in and check the counts. What I see is that startcount and startedcount are equal, and both are one greater than runcount.
So I can only conclude that the thread didn't really run, but I have been unable to find any evidence of why.
I have not been able to find documentation on QThreads not starting due to some error condition, or on what evidence there would be of such an error.
I suppose I could set up a slot to catch started from the thread. The starting code could loop on a timed-out semaphore, trying again and again until the started slot actually resets the semaphore. But it feels ugly.
EDIT - further information
So using the semaphore method, I have a way to breakpoint on failure to start.
I sampled isFinished() right before I wanted to do start(), and it was false. After my 100ms semaphore timeout it became true.
So the question seems to be evolving into 'Why does QThread sometimes emit a finished() signal before isFinished() becomes true?'
Heck of a race condition. I'd hate to spin on isFinished() before starting the next background thread.
So this may be a duplicate of
QThread emits finished() signal but isRunning() returns true and isFinished() returns false
But not exactly, because I do override run() and I have no event loop.
In particular the events 8 and 9 in that answer are not in the same order. My slot is getting a finished() before isFinished() goes true.
I'm not sure an explicit quit() is any different than letting run() return.
It sounds as if you have a race condition wherein you may end up trying to restart your thread before the previous iteration has actually finished. If that's the case then, from what I've seen, the next call to QThread::start will be silently ignored. You need to update your code so that it checks the status of the thread before restarting -- either by calling QThread::isFinished or handling the QThread::finished signal.
On the other hand... why have the thread repeatedly started/stopped? Would it not be easier to simply start the thread once? Whatever code is run within the context of QThread::run can monitor whatever it monitors and signal the main app when it finds anything of note.
Better still, separate the monitor logic from the thread entirely...
class monitor: public QObject {
.
.
.
};
QThread monitor_thread;
monitor monitor;
/*
* Fix up any signals to/from monitor.
*/
monitor.moveToThread(&monitor_thread);
monitor_thread.start();
The monitor class can do whatever it wants, and when it's time to quit, the app can just call monitor_thread.quit().
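The "start it once" idea does not depend on Qt. As a rough illustration in standard C++ (this is not the Qt API; Monitor and wait_for_finding are made-up names, and the monitoring itself is replaced by a short sleep), one long-lived thread can report each finding through a queue instead of terminating and being restarted:

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical monitor: a single thread is started once in the constructor
// and keeps reporting findings until the object is destroyed.
class Monitor {
public:
    Monitor() : worker_([this] { run(); }) {}
    ~Monitor() {
        quit_ = true;
        worker_.join();
    }

    // Block until the next finding arrives or the timeout expires.
    // Returns true and stores the finding in `out` on success.
    bool wait_for_finding(int& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!ready_.wait_for(lock, timeout, [this] { return !findings_.empty(); }))
            return false;
        out = findings_.front();
        findings_.pop();
        return true;
    }

private:
    void run() {
        int next = 0;
        while (!quit_) {
            // Placeholder for "monitor until something is found".
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
            {
                std::lock_guard<std::mutex> lock(mutex_);
                findings_.push(next++);
            }
            ready_.notify_one();
        }
    }

    std::atomic<bool> quit_{false};
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> findings_;
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

Since the thread is never restarted, there is no window in which a start request can race with the end of the previous run.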
There is a race condition in the version of Qt I am using. I don't know if it was reported or not before, but I do not have the latest, so it's probably moot unless I can demonstrate it in the current version.
Similar bugs were reported here long ago:
QThread.isFinished returns False in slot connected to finished() signal
(the version I use is much more recent than Qt 4.8.5)
What's more important is that I can work around it with the following code:
while ( isRunning() )
{
    msleep(1);
}
start();
I've run a few tests, and it never seems to take more than 1ms for the race condition to settle. Probably just needs a context switch to clean up.

What is the difference between blocking and non-blocking sockets? (for realz edition)

Before everybody marks this as a dup let me state that I know my fair share of network programming and this question is my attempt to solve something that puzzles me even after finding the "solution".
The setup
I've spent the last few weeks writing some glue code to incorporate a big industrial system into our current setup. The system is controlled by a Windows XP computer (PC A) which is controlled from a Ubuntu 14.04 system (PC B) by sending a steady stream of UDP packets at 2000 Hz. It responds with UDP packets containing the current state of the system.
Care was taken to ensure that the 2000 Hz rate was held because there is a 3ms timeout after which the system faults and returns to a safe state. This involves measuring and accounting for inaccuracies in std::this_thread::sleep_for. Measurements show that there is only a 0.1% deviation from the target rate.
The observation
Problems started when I started to receive the state response from the system. The controlling side on PC B looks roughly like this:
forever at 2000Hz {
    send current command;
    if ( socket.available() >= 0 ) {
        receive response;
    }
}
edit 2: Or in real code:
auto cmd_buf = ...
auto rsp_buf = ...
while (true) {
    // prepare and send command buffer
    cmd_buf = ...
    socket.send(cmd_buf, endpoint);
    if (socket.available() >= 0) {
        socket.receive(rsp_buf);
        // the results are then parsed and stored, nothing fancy
    }
    // time keeping
}
The problem is that, whenever the receiving portion of the code was present on PC B, PC A started to run out of memory within seconds when trying to allocate receive buffers. Additionally it raised errors stating that the timeout was missed, which was probably due to packets not reaching the control software.
Just to highlight the strangeness: PC A is the PC sending UDP packets in this case.
Edit in response to EJP: this is the (now) working setup. It started out as:
forever at 2000Hz {
    send current command;
    receive response;
}
But by the time the response was received (blocking) the deadline was missed. Therefore the availability check.
Another thing that was tried was to receive in a separate thread:
// thread A
forever at 2000Hz {
    send current command;
}
// thread B
forever {
    receive response;
}
Which displays the same behavior as the first version.
The solution
The solution was to set the socket on PC B to non-blocking mode. One line and all problems were gone.
I am pretty sure that even in blocking mode the deadline was met. There should be no performance difference between blocking and non-blocking mode when there is just one socket involved. Even if checking the socket for available data takes some microseconds more than in non-blocking mode it shouldn't make a difference when the overall deadline is met accurately.
Now ... what is happening here?
If I read your code correctly and referring to this code:
forever at 2000Hz {
    send current command;
    receive response;
}
Examine the difference between the blocking and non-blocking socket. With a blocking socket you send the current command and then you are stuck waiting for the response. By this time I would guess you have already missed the 2kHz goal.
Now with a non-blocking socket you send the current command and try to receive whatever is in the receive buffers, but if there is nothing there you return immediately and continue your tight 2kHz loop of sending. This explains to me why your industrial control system works fine with non-blocking code.

What is the reason for QProcess error status 5?

I have multiple threads running the following QProcess. Randomly they fail with error state 5. The Qt docs do not give any more details. Does anyone have a clue where that error could come from? Thank you very much.
extCmd = new QProcess(this);
QString cmd = "/usr/bin/php";
QStringList argStr;
argStr << "/bin/sleep" << "10"; // changed to a command that always works
extCmd->start(cmd, argStr);
bool suc = extCmd->waitForFinished(-1);
if (!suc) {
    qDebug() << "finishing failed error="
             << extCmd->error()
             << extCmd->errorString();
}
Gives me the output:
finishing failed error= 5 "Unknown error"
Tangential to your problem is the fact that you should not be starting one thread per process. A QProcess emits a finished(int code, QProcess::ExitStatus status) signal when it's done. It will also emit started() and error() upon successful and unsuccessful startup, respectively. Connect all three signals to slots in a QObject, then start the process, and deal with the results in the slots. You won't need any extra threads.
If you get a started() signal, then you can be sure that the process's file name was correct, and the process was started. Whatever exit code you get from finished(int) is then indicative of what the process did, perhaps in response to potentially invalid arguments you might have passed to it. If you get an error() signal, the process has failed to start because you gave a wrong filename to QProcess::start(), or you don't have correct permissions.
You should not be writing synchronous code where things happen asynchronously. Synchronous code is code that blocks until a particular thing happens, like calling waitForFinished. I wish that there was a Qt configuration flag that disables all those leftover synchronous blocking APIs, just like there's a flag to disable/enable Qt 3 support APIs. The mere availability of those blocking APIs promotes horrible hacks like the code above. Those APIs should be disabled by default IMHO. Just as there should be a test for moving QThread and derived classes to another thread. It's also a sign of bad design in every example of publicly available code I could find, and I did a rather thorough search to convince myself I wasn't crazy or something.
The only reasonable use I recall for a waitxxx method in Qt is the wait for a QThread to finish. Even then, this should only be called from within ~QThread, so as to prevent the QThread from being destroyed with the thread still running.

boost::asio::read() never returns, even after a write() has been executed successfully on the other end

I'm trying to learn boost::asio for socket/networking programming. I'm trying to send some simple data from the client to the server.
Let me say first of all that I am intentionally using synchronous, blocking code, as opposed to asynchronous, non-blocking code, because I'm using multithreading (with the pthreads library) in addition to this.
The client is successfully calling boost::asio::write(). I've gone so far as to not only try and catch any exceptions thrown by boost::asio::write(), but to also check the boost::system::error_code value, which gives a message "The operation has been completed successfully" or something to that effect.
My read() code looks like this:
#define MAX_MESSAGE_SIZE 10000 // in bytes
void* receivedData = malloc(MAX_MESSAGE_SIZE);
try
{
    boost::asio::read(*sock, boost::asio::buffer(receivedData, MAX_MESSAGE_SIZE));
}
catch (std::exception &e)
{
    std::cout << "Exception thrown by boost::read() in receiveMessage(): " << e.what() << "\n";
    free(receivedData); // memory from malloc() must be released with free(), not delete
    receivedData = NULL;
    return false;
}
Despite write() being successfully executed on the other end, and both client and server agreeing that a connection has been established at this point, read() never returns. The only case in which it returns for me is if I manually close the client application, at which point (as one would expect), the read() call throws an exception stating that the client has forcibly closed the connection.
Does it have anything to do with io_service.run()? In my googling to debug this, I ran across some mentions of run(), and what I've implicitly gathered from those posts is that run() processes the "work", which I take to mean that it does the actual sending and receiving, and that write() and read() are just means of queuing up a send and checking for what packets have already been sent in and "approved", so to speak, by io_service.run(). Please correct me if I'm wrong, as the Boost documentation says little more than "Run the io_service's event processing loop."
I'm following the boost::asio tutorial ( http://www.boost.org/doc/libs/1_47_0/doc/html/boost_asio/tutorial/tutdaytime1.html ) which makes absolutely no mention of run(), so maybe I'm completely on the wrong track here and run() isn't necessary at all?
Either way, just now I made another change to my code to see if anything would change (it didn't): in both my client and server, I set up a thread for the following function (to call io_service.run() repeatedly throughout the application's duration to see if not doing so is what was causing the problem):
void* workerThread(void* nothing)
{
    while(1)
    {
        io_service.run();
        Sleep(10); // just keeping my CPU from overheating in this infinite loop
    }
}
But as stated above, that didn't affect the performance at all.
What am I missing here? Why is read() never returning, even after the other end's write() has been executed successfully?
Thanks in advance.
Note that the read you are using will block until the buffer is full or an error occurs - so it will only return when it has received 10000 bytes.
Consider using read_some or using a completion condition with read instead.
