I have a multithreaded application running on WIN32 which uses a semaphore to protect a linked-list. Very occasionally this locks-up. When it locks-up I can stop the application in cppvsdbg under Visual Studio Code and I can see that two of the threads are waiting on the semaphore, i.e. they are blocked at:
WaitForSingleObject(handle, INFINITE);
However, the third thread is blocked here:
ReleaseSemaphore(handle, 1, NULL);
...i.e. it seems to have blocked on ReleaseSemaphore(), the very function which of course would allow one of the other two threads to run. If I single step in the debugger, or I set a break-point just beyond the ReleaseSemaphore() call and continue running, nothing budges, the application remains locked up. The thread that is blocked at ReleaseSemaphore() is running at priority 0, the other two threads at priorities 0 and -1, so I can't see how thread priority could cause an issue.
More than that, I don't understand why ReleaseSemaphore() would block under any circumstances. The value of handle is 0x000000ec, which is what it was at the start of the day, so the value of handle hasn't been corrupted, though I guess it is possible that the contents of handle might have been messed up somehow...? Not sure how I would debug that.
Does anyone have any suggestions as to why ReleaseSemaphore() might lock, or what additional things I might poke at in the debugger when the problem occurs to determine what's up?
EDIT: the code is compiled with /Od to avoid any misalignment between the visual debug and the code, this is a screen-shot of what the cppvsdbg window shows for the thread which appears to be blocked on ReleaseSemaphore():
And the correct answer is: no, the Win32 API function ReleaseSemaphore() can never block.
The reason it appeared to be blocking in this case was because, separately, we needed to simulate critical sections on Windows (recalling that this code usually runs in an embedded system on an RTOS, Windows is only for rapid development). To simulate a critical section on Windows we call SuspendThread() (and later ResumeThread()) on all threads except the current thread. A failure was occurring elsewhere in the code while such a simulated critical section was in place, and it so happened that this coincided with the ReleaseSemaphore() call most of the time, making it look as though ReleaseSemaphore() had blocked; it hadn't, the thread just happened to get suspended there.
So we just had to fix the other bug and this apparent problem went away.
Is this the correct way to free TCriticalSection object created inside the initialization section in Delphi?
initialization
FPoolingCS := TCriticalSection.Create;
finalization
FPoolingCS.Acquire;
FreeAndNil(FPoolingCS);
Should I call the Release method before the Free?
Could the Acquire method throw some exceptions that I need to handle?
This is not the correct way to release critical section for several reasons.
According to the documentation for the EnterCriticalSection function:
This function can raise EXCEPTION_POSSIBLE_DEADLOCK if a wait
operation on the critical section times out. The timeout interval is
specified by the following registry value:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session
Manager\CriticalSectionTimeout. Do not handle a possible deadlock
exception; instead, debug the application.
If a critical section is deleted while it is still owned, the state of
the threads waiting for ownership of the deleted critical section is
undefined.
While a process is exiting, if a call to EnterCriticalSection would
block, it will instead terminate the process immediately. This may
cause global destructors to not be called.
Calling FPoolingCS.Acquire on the Windows platform calls the EnterCriticalSection function. So the answer to the question of whether acquiring the critical section can raise an exception is yes.
Also according to the documentation, you should not try to handle such exceptions; instead you have to debug the application, because the origin of the problem is in some other code.
But the most notable reason why you should not call Acquire before freeing the critical section instance, on any platform, is this: if at that point there are still other threads doing work and relying on that critical section, your shutdown and cleanup process is completely broken. In other words, if Acquire solves your problem, the real issue is in another castle and you haven't really solved anything; you have just slightly changed the dynamics, which may or may not work, depending on all the other code involved.
Calling Release before Free would be meaningless, for the same reason. If there are other involved threads still running, they may acquire the lock before Free is executed.
Just call Free on the critical section or, if you like, use FreeAndNil, which will ultimately crash if your shutdown process is broken. Just remember that threading issues are not consistently reproducible, so the absence of a crash still does not mean your code is fully bug-free.
I have read that SendMessage() should not be used to access UI controls from other threads, but I'm not sure I know why. The only reason I can think of is that, since SendMessage() is a blocking call, it could cause a deadlock in certain situations.
But is this the only reason not to use it?
Edit: This article talks about the reasons not to use SendMessage() but I don't find it to be very clear (it is intended for .NET).
It is best to keep in mind that the odds that you will write correct code are not very good. And the generic advice is: don't do it! It is never necessary; the UI thread of a GUI program in Windows was entirely structured to make it simple for code that runs on another thread, or in another process, to affect the UI of the program. That is the point of the message loop, the universal solution to the producer-consumer problem. PostMessage() is your weapon to take advantage of it.
Before you forge ahead anyway, start by thinking about a simple problem that's very hard to solve when you use SendMessage. How do you close a window safely and correctly?
The exact moment at which you need to close the window is entirely unpredictable and completely out of sync with the execution of your worker thread. It is the user who closes it, or asks the UI thread to terminate, and you need to make sure that the thread has exited and has stopped calling SendMessage before you can actually close the window.
The intuitive way to do this is to signal an event in your WM_CLOSE message handler, asking the thread to stop. And wait for it to complete, then the window can close. Intuitive, but it does not work, it will deadlock your program. Sometimes, not always, very hard to debug. Goes wrong when the thread cannot check the event because it is stuck in the SendMessage call. Which cannot complete since the UI thread is waiting for the thread to exit. The worker thread cannot continue and the UI thread cannot continue. A "deadly embrace", your program will hang and needs to be killed forcibly. Deadlock is a standard threading bug.
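The way out of this embrace can be sketched in portable C++ (standard library only, not Win32). The worker only ever enqueues updates, which is the property PostMessage() has and SendMessage() lacks, so the "UI" thread can ask it to stop and wait for it without either side getting stuck. The names `MessageQueue` and `run_and_close` are hypothetical, and the queue is only a stand-in for a real window message queue.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a window's message queue. post() never waits
// for the consumer to handle the message.
class MessageQueue {
public:
    void post(int msg) {
        std::lock_guard<std::mutex> guard(mutex_);
        queue_.push_back(msg);
    }
    // Moves all pending messages out; called by the "UI" thread.
    std::vector<int> drain() {
        std::lock_guard<std::mutex> guard(mutex_);
        std::vector<int> out(queue_.begin(), queue_.end());
        queue_.clear();
        return out;
    }
private:
    std::mutex mutex_;
    std::deque<int> queue_;
};

// Simulates closing the window while the worker is still producing.
// Returns the total number of messages the "UI" thread received.
std::size_t run_and_close(int messages_before_close) {
    MessageQueue queue;
    std::atomic<bool> stop{false};
    std::atomic<int> posted{0};

    std::thread worker([&] {
        while (!stop.load()) {
            queue.post(posted.fetch_add(1));  // fire and forget
            std::this_thread::yield();
        }
    });

    std::size_t received = 0;
    while (posted.load() < messages_before_close) {
        received += queue.drain().size();
        std::this_thread::yield();
    }

    // The "WM_CLOSE" handler: signal the worker, then wait for it. This is
    // safe only because the worker is never blocked waiting on this thread.
    stop.store(true);
    worker.join();
    received += queue.drain().size();
    assert(received == static_cast<std::size_t>(posted.load()));
    return received;
}
```

Calling `run_and_close(100)` returns once at least 100 messages have been posted and the worker has been joined. Replacing `post()` with a call that waits for the consumer would reintroduce the hang described above.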
You'll shout, "I'll use SendMessageTimeout!" But what do you pass for the uTimeout argument and how do you interpret an ERROR_TIMEOUT error? It is pretty common for a UI thread to go catatonic for a while, surely you've seen the "ghost window" before, the one that shows "Not Responding" in the title bar. So an ERROR_TIMEOUT does not reliably indicate that the UI thread is trying to shut down unless you make uTimeout very large. At least 10 seconds. That kinda works, getting the occasional 10 second hang at exit is however not very pretty.
Solve this kind of problem for all the messages, not just WM_CLOSE. WM_PAINT ought to be next, another one that's very, very hard to solve cleanly. Your worker thread asks to update the display a millisecond before the UI thread calls EndPaint(). And thus never displays the update, it simply gets lost. A threading race, another standard threading bug.
The third classic threading bug is a fire-hose problem. Happens when your worker thread produces results faster than the UI thread can handle them. Very common, UI updates are pretty expensive. Easy to detect, very hard to solve and unpredictable when it occurs. Easy to detect because your UI will freeze, the UI thread burns 100% core trying to keep up with the message rate. It doesn't get around to its low-priority tasks anymore. Like painting. Goes wrong both when you use SendMessage or PostMessage. In the latter case you'll fill the message queue up to capacity. It starts failing after it contains 10000 unprocessed messages.
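One common way to tame a fire-hose, sketched here in portable C++ as an illustration rather than as anything the answer prescribes, is to coalesce updates: the worker overwrites a single pending value instead of queueing every result, and notifies the UI only when nothing was already pending. `LatestValue` is a hypothetical name.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical "latest value" mailbox. However fast the worker publishes,
// at most one update is ever waiting for the UI thread.
template <typename T>
class LatestValue {
public:
    // Called by the worker. Returns true only when no update was already
    // pending, i.e. when the consumer actually needs a new notification.
    bool publish(T value) {
        std::lock_guard<std::mutex> guard(mutex_);
        const bool was_empty = !has_value_;
        value_ = std::move(value);
        has_value_ = true;
        return was_empty;
    }
    // Called by the UI thread. Copies the most recent value into `out` and
    // returns true, or returns false if nothing is pending.
    bool take(T& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (!has_value_) return false;
        out = std::move(value_);
        has_value_ = false;
        return true;
    }
private:
    std::mutex mutex_;
    T value_{};
    bool has_value_ = false;
};
```

In a Win32 program the worker could call PostMessage() only when `publish()` returns true, so the queue holds at most one notification per mailbox. The cost is that intermediate values are dropped, which is usually acceptable for progress displays but not for data that must all be shown.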
Long story short, yes, SendMessage() is thread-safe. But thread-safety is not a transitive property, it doesn't automatically make your own code thread-safe. You still suffer from all the things that can go wrong when you use threads. Deadlocks, races, fire-hosing. Fear the threading beast.
My understanding of the debugging process and debuggers is that when a breakpoint gets hit, all other threads get frozen. However, one of my colleagues said that this is configurable, meaning that somewhere in the Visual Studio options you can configure the other threads (the ones with no breakpoint) to continue to run as normal while the thread with the breakpoint is frozen. I couldn't find any such setting in Visual Studio, and my colleague does not remember where he saw it, although he seems pretty confident that the option exists.
Can someone confirm whether it's even possible to have other threads running while one thread is frozen at a breakpoint? Also, if there is such a setting, please let me know where to find it.
The debugger always freezes all threads when a breakpoint hits. You do, however, have control over what happens to threads when you press F5 to continue execution. You can use the Freeze toolbar button available in the Debug + Windows + Threads debugger window to prevent a thread from continuing when you press F5. Use the Thaw button to re-enable it.
I'm not familiar with VS, but I know gdb has supported non-stop mode since version 7.0, so I think something similar may be possible with VS.
Here is the summary: "For some multi-threaded targets, GDB supports an optional mode of operation in which you can examine stopped program threads in the debugger while other threads continue to execute freely. This minimizes intrusion when debugging live systems, such as programs where some threads have real-time constraints or must continue to respond to external events. This is referred to as non-stop mode."
You can search 'non-stop gdb' for more details.
I don't know if this is possible, but frankly, if it is, it shouldn't be. Yes, it is theoretically possible to break one thread while the others keep running, but keep in mind that one of the running threads may then try to interact with the frozen thread, which causes all kinds of problems for the frozen thread. I suspect the debugger was designed with this in mind, so there isn't a setting that allows it. If someone else knows differently, please let me know, because I find myself curious as well.
I'm looking for a way to debug a rare Delphi 7 critical section (TCriticalSection) hang/deadlock. In this case, if a thread is waiting on a critical section for more than say 10 seconds, I'd like to produce a report with the stack trace of both the thread currently locking the critical section and also the thread that failed to be able to lock the critical section after waiting 10 seconds. It is OK then if an exception is raised or the Application terminates.
I would prefer to continue using critical sections, rather than using other synchronization primitives, if possible, but can switch if necessary (such as to get a timeout feature).
If the tool/method works at runtime outside of the IDE, that is a bonus, since this is hard to reproduce on demand. In the rare case I can duplicate the deadlock inside the IDE, if I try to Pause to start debugging, the IDE just sits there doing nothing, and never gets to a state where I can view threads or call stacks. I can Reset the running program, though.
Update: In this case, I'm only dealing with one critical section and 2 threads, so this likely isn't a lock ordering problem. I believe there is an improper nested attempt to enter the lock across two different threads, which results in deadlock.
You should create and use your own lock object class. It can be implemented using critical sections or mutexes, depending on whether you want to debug this or not.
Creating your own class has an added benefit: You can implement a locking hierarchy and raise an exception when it is violated. Deadlocks happen when locks are not taken in exactly the same order, every time. Assigning a lock level to each lock makes it possible to check that the locks are taken in the correct order. You could store the current lock level in a threadvar, and allow only locks to be taken that have a lower lock level, otherwise you raise an exception. This will catch all violations, even when no deadlock happens, so it should speed up your debugging a lot.
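The lock-level idea can be sketched as follows. This is a portable C++ illustration, not Delphi: `thread_local` plays the role of the threadvar, `std::mutex` stands in for the critical section, and `HierarchicalLock` is a hypothetical name. It assumes locks are released in the reverse order of acquisition.

```cpp
#include <cassert>
#include <climits>
#include <mutex>
#include <stdexcept>

// Hypothetical lock wrapper. Every lock has a fixed level, and a thread may
// only take a lock whose level is strictly lower than the level of the lock
// it took most recently.
class HierarchicalLock {
public:
    explicit HierarchicalLock(unsigned long level)
        : level_(level), previous_level_(0) {}

    void lock() {
        // Check before blocking, so a violation is reported even on runs
        // where no deadlock would actually have happened.
        if (current_level() <= level_) {
            throw std::logic_error("lock hierarchy violated");
        }
        mutex_.lock();
        previous_level_ = current_level();
        current_level() = level_;
    }

    void unlock() {
        // Assumes LIFO release order; checked in debug builds only.
        assert(current_level() == level_);
        current_level() = previous_level_;
        mutex_.unlock();
    }

private:
    // Level of the innermost lock held by the calling thread.
    static unsigned long& current_level() {
        thread_local unsigned long level = ULONG_MAX;
        return level;
    }

    std::mutex mutex_;
    const unsigned long level_;
    unsigned long previous_level_;
};
```

Because the class provides `lock()` and `unlock()`, it works with `std::lock_guard<HierarchicalLock>`. Taking a level-200 lock and then a level-100 lock succeeds; taking them in the opposite order throws `std::logic_error` at the second acquisition.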
As for getting the stack trace of the threads, there are many questions here on Stack Overflow dealing with this.
Update
You write:
In this case, I'm only dealing with one critical section and 2 threads, so this likely isn't a lock ordering problem. I believe there is an improper nested attempt to enter the lock across two different threads, which results in deadlock.
That can't be the whole story. There's no way to deadlock with two threads and a single critical section alone on Windows, because critical sections can be acquired there recursively by a thread. There has to be another blocking mechanism involved, like for example the SendMessage() call.
But if you really are dealing with two threads only, then one of them has to be the main / VCL / GUI thread. In that case you should be able to use the MadExcept "Main thread freeze checking" feature. It will try to send a message to the main thread, and fail after a customizable time has elapsed without the message being handled. If your main thread is blocking on the critical section, and the other thread is blocking on a message handling call then MadExcept should be able to catch this and give you a stack trace for both threads.
This is not a direct answer to your question, but something I ran into recently that had me (and a couple of colleagues) stumped for a while.
It was an intermittent thread hang, involving a critical section and once we knew the cause, it was very obvious and gave all of us a "d'oh" moment. However, it did take some serious hunting to find (adding more and more trace logging to pinpoint the offending statement) and that is why I thought I'd mention it.
It was also on a critical section enter. Another thread had indeed acquired that critical section. A deadlock as such did not seem to be the cause, as there was only one critical section involved, so there could be no problems with acquiring locks in a different order. The thread holding the critical section should simply have continued and then released the lock, allowing the other thread to acquire it.
In the end it turned out that the thread holding the lock was ultimately accessing the ItemIndex of a (IIRC) combobox, fairly innocuous it would seem. Unfortunately, getting that ItemIndex is reliant on message processing. And the thread waiting for the lock was the main application thread... (just in case anybody wonders: the main thread does all the message processing...)
We might have thought of this a lot earlier if it had been a little more obvious from the start that the VCL was involved. However, it started in non-UI code, and the VCL involvement only became apparent after adding instrumentation (enter/exit tracing) along the call tree and back through all triggered events and their handlers up to the UI code.
Just hope this story will be of help to somebody faced with a mysterious hang.
Use a mutex instead of a critical section. There is little difference between mutexes and critical sections: critical sections are more efficient, while mutexes are more flexible. You can easily switch between mutexes and critical sections, for example by using mutexes in the debug version.
For a critical section we use:
var
FLock: TRTLCriticalSection;
InitializeCriticalSection(FLock); // create lock
DeleteCriticalSection(FLock); // free lock
EnterCriticalSection(FLock); // acquire lock
LeaveCriticalSection(FLock); // release lock
The same with a mutex:
var FLock: THandle;
FLock:= CreateMutex(nil, False, nil); // create lock
CloseHandle(FLock); // free lock
WaitForSingleObject(FLock, Timeout); // acquire lock
ReleaseMutex(FLock); // release lock
You can use timeouts (in milliseconds; 10000 for 10 seconds) with mutexes by implementing an acquire-lock function like this:
function AcquireLock(Lock: THandle; TimeOut: LongWord): Boolean;
begin
  Result := WaitForSingleObject(Lock, TimeOut) = WAIT_OBJECT_0;
end;
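For comparison, the same timeout idea in portable C++ might look like the sketch below. It uses `std::timed_mutex` rather than a Win32 mutex handle, and the two differ in one important way: a Win32 mutex can be re-acquired by the thread that owns it, whereas re-locking a `std::timed_mutex` from the owning thread is undefined behaviour. The function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Tries to take the lock for at most `timeout`. Returns true if the lock
// was acquired (the caller must then unlock it), false on timeout.
bool acquire_lock(std::timed_mutex& lock, std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    return lock.try_lock_for(timeout);
}

// Demonstration: while the calling thread holds the lock, a second thread's
// attempt gives up after `timeout` instead of hanging forever.
// Returns true if the second thread timed out.
bool contender_times_out(std::chrono::milliseconds timeout) {
    std::timed_mutex lock;
    std::lock_guard<std::timed_mutex> holder(lock);

    bool acquired = true;
    std::thread contender([&] {
        acquired = acquire_lock(lock, timeout);
        if (acquired) lock.unlock();
    });
    contender.join();  // `holder` is still locked here
    return !acquired;
}
```

The point where `acquire_lock` returns false is where a deadlock report (thread ids, stack traces) could be produced before raising an error.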
You can also use Critical Sections with the TryEnterCriticalSection API instead of EnterCriticalSection.
If you use TryEnterCriticalSection and the lock acquisition fails, the API returns False and you can deal with the failure in any way you see fit, instead of just locking the thread.
Something like
while not TryEnterCriticalSection(fLock) and (additional_checks) do
begin
  deal_with_failure();
  sleep(500); // wait 500 ms
end;
Do note that Delphi's TCriticalSection uses EnterCriticalSection, so unless you tweak that class, you will have to write your own class or deal with the critical section initialization/deinitialization yourself.
If you want to be able to wait on something with a timeout, you could try replacing your Critical Section with a TEvent signal. You can say to wait on the event, give it a timeout length, and check the result code. If the signal was set, then you can continue. If not, if it timed out, you raise an exception.
At least, that's how I'd do it in D2010. I'm not sure if Delphi 7 has TEvent, but it probably does.
I have a threading problem with Delphi. I guess this is common in other languages too. I have a long process which I do in a thread, that fills a list in main window. But if some parameters change in the mean time, then I should stop current executing thread and start from the beginning. Delphi suggests terminating a thread by setting Terminated:=true and checking for this variable's value in the thread. However my problem is this, the long executing part is buried in a library call and in this call I cannot check for the Terminated variable. Therefore I had to wait for this library call to finish, which affects the whole program.
What is the preferred way to do in this case? Can I kill the thread immediately?
The preferred way is to modify the code so that it doesn't block without checking for cancellation.
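What "checking for cancellation" means in practice can be sketched like this, in portable C++ rather than Delphi. The atomic flag plays the role of the Terminated property, and `process_items` is a hypothetical long-running routine whose work can be split into bounded units.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Processes up to `total` work items, checking the cancellation flag between
// items so that a stop request is honoured within one item's duration.
// Returns the number of items actually completed.
std::size_t process_items(std::size_t total, const std::atomic<bool>& cancelled) {
    std::size_t done = 0;
    while (done < total && !cancelled.load()) {
        ++done;  // stand-in for one bounded unit of real work
    }
    return done;
}
```

This only helps when the long operation can be broken up like this, which is exactly what the question says is not possible for the library call.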
Since you can't modify the code, you can't do that; you either have to live with the background operation (but you can disassociate it from any UI, so that its completion will be ignored); or alternatively, you can try terminating it (TerminateThread API will rudely terminate any thread given its handle). Termination isn't clean, though, like Rob says, any locks held by the thread will be abandoned, and any cross-thread state protected by such locks may be in a corrupted state.
Can you consider calling the function in a separate executable? Perhaps using RPC (pipes, TCP, rather than shared memory owing to same lock problem), so that you can terminate a process rather than terminating a thread? Process isolation will give you a good deal more protection. So long as you aren't relying on cross-process named things like mutexes, it should be far safer than killing a thread.
The threads need to co-operate to achieve a graceful shutdown. I am not sure if Delphi offers a mechanism to abort another thread. Such mechanisms are available in .NET and Java, but they should be considered an option of last resort, and the state of the application is indeterminate after they have been used.
If you can kill a thread at an arbitrary point, then you may kill it while it is holding a lock in the memory allocator (for example). This will leave your program open to hanging when your main thread next needs to access that lock.
If you can't modify the code to check for termination, then just set its priority really low, and ignore it when it returns.
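One way to "ignore it when it returns" is to tag each background request with a generation number and drop any result whose number is no longer current. The sketch below is a portable C++ illustration under that assumption; `RequestTracker` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: each change of parameters starts a new generation,
// and only results from the newest generation are accepted.
class RequestTracker {
public:
    // Called when the parameters change, before starting a new background
    // run. Returns the id to hand to that run.
    unsigned begin_request() { return ++current_; }

    // Called when a background run delivers its result. Results from runs
    // that have since been superseded report false and should be discarded.
    bool is_current(unsigned id) const { return id == current_.load(); }

private:
    std::atomic<unsigned> current_{0};
};
```

The abandoned library call still runs to completion and still uses CPU time, so this keeps the UI responsive but does not reclaim the wasted work.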
I wrote this in reply to a similar question:
I use an exception-based technique
that's worked pretty well for me in a
number of Win32 applications.
To terminate a thread, I use
QueueUserAPC to queue a call to a
function which throws an exception.
However, the exception that's thrown
isn't derived from the type
"Exception", so will only be caught by
my thread's wrapper procedure.
I've used this with C++Builder apps very successfully. I'm not aware of all the subtleties of Delphi vs C++ exception handling, but I'd expect it could easily be modified to work.