How to detect and debug multi-threading problems?

This is a follow up to this question, where I didn't get any input on this point. Here is the brief question:
Is it possible to detect and debug problems coming from multi-threaded code?
Often we have to tell our customers: "We can't reproduce the problem here, so we can't fix it. Please tell us the steps to reproduce the problem, then we'll fix it." It's a somewhat nasty answer if I know that it is a multi-threading problem, but mostly I don't. How do I get to know that a problem is a multi-threading issue, and how do I debug it?
I'd like to know if there are any special logging frameworks, debugging techniques, code inspectors, or anything else to help solve such issues. General approaches are welcome. If an answer has to be language-specific, then keep it to .NET and Java.

Threading/concurrency problems are notoriously difficult to replicate - which is one of the reasons why you should design to avoid or at least minimize the probabilities. This is the reason immutable objects are so valuable. Try to isolate mutable objects to a single thread, and then carefully control the exchange of mutable objects between threads. Attempt to program with a design of object hand-over, rather than "shared" objects. For the latter, use fully synchronized control objects (which are easier to reason about), and avoid having a synchronized object utilize other objects which must also be synchronized - that is, try to keep them self contained. Your best defense is a good design.
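As a rough illustration of the hand-over design described above, here is a minimal Java sketch. The class names Order and HandOverDemo are made up for this example; the only library pieces used are BlockingQueue and ArrayBlockingQueue from java.util.concurrent. The producer creates immutable messages and hands them to the consumer through the queue, so no mutable object is ever shared.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical immutable message: all fields are final and there are no setters.
final class Order {
    final int id;
    final String item;

    Order(int id, String item) {
        this.id = id;
        this.item = item;
    }
}

class HandOverDemo {
    // Hands 'count' orders to a consumer thread and returns the sum of the ids it received.
    static int run(int count) {
        BlockingQueue<Order> queue = new ArrayBlockingQueue<>(16);
        int[] sum = new int[1]; // written only by the consumer, read only after join()
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    sum[0] += queue.take().id; // take() blocks until an order is handed over
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 1; i <= count; i++) {
                queue.put(new Order(i, "item-" + i)); // ownership passes to the consumer
            }
            consumer.join(); // join() makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }
}
```

The queue is the only synchronized object in the sketch, and it does not depend on any other synchronized object, which keeps it easy to reason about.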
Deadlocks are the easiest to debug, if you can get a stack trace when deadlocked. Most tools that produce such a trace also do deadlock detection, so given the trace it's easy to pinpoint the cause and then reason about the code as to why it happened and how to fix it. With deadlocks, it is always going to be a problem of acquiring the same locks in different orders.
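On the JVM, the same deadlock detection that thread dumps use can also be called from code. The following is a minimal sketch, assuming a standard JVM; the class DeadlockDemo and its helper names are invented for the example. It deliberately takes two locks in opposite orders on two daemon threads and then polls ThreadMXBean.findMonitorDeadlockedThreads(), which returns null while no monitor deadlock exists.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class DeadlockDemo {
    // Provokes a two-thread deadlock and returns how many deadlocked threads the JVM reports.
    static int provokeAndDetect() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startDaemon("worker-1", lockA, lockB, bothHoldFirstLock);
        startDaemon("worker-2", lockB, lockA, bothHoldFirstLock); // opposite order

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when nothing is deadlocked
            if (ids != null) {
                return ids.length;
            }
            sleep(50);
        }
        return 0;
    }

    private static void startDaemon(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                awaitQuietly(latch); // wait until the other thread holds its first lock
                synchronized (second) {
                    // never reached: the other thread owns 'second' and waits for 'first'
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads, so the stuck pair does not keep the JVM alive
        t.start();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a real application a watchdog thread could run the same check periodically and log the stack traces of the reported threads.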
Live locks are harder - being able to observe the system while in the error state is your best bet there.
Race conditions tend to be extremely difficult to replicate, and are even harder to identify from manual code review. With these, the path I usually take, besides extensive testing to replicate, is to reason about the possibilities, and try to log information to prove or disprove theories. If you have direct evidence of state corruption you may be able to reason about the possible causes based on the corruption.
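One way to make "extensive testing to replicate" a bit more systematic is a small stress harness that releases many threads at the same instant. The sketch below is only an illustration; RaceStress and its fields are hypothetical names. It increments an unsynchronized counter and an AtomicInteger side by side, so lost updates on the first one show up as a smaller total.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class RaceStress {
    static int unsafeCounter; // deliberately unsynchronized
    static final AtomicInteger safeCounter = new AtomicInteger();

    // Starts all threads together to maximize overlap; returns {unsafe total, safe total}.
    static int[] run(int threads, int incrementsPerThread) {
        unsafeCounter = 0;
        safeCounter.set(0);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // every worker parks here until the gate opens
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int n = 0; n < incrementsPerThread; n++) {
                    unsafeCounter++;               // read-modify-write, may lose updates
                    safeCounter.incrementAndGet(); // atomic
                }
            });
            workers[i].start();
        }
        start.countDown(); // open the gate for all workers at once
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { unsafeCounter, safeCounter.get() };
    }
}
```

Whether the unsafe total actually falls short on a given run depends on the hardware and scheduler, which is exactly why such bugs are hard to reproduce; the safe total is always threads * incrementsPerThread.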
The more complex the system, the harder it is to find concurrency errors, and to reason about its behavior. Make use of tools like JVisualVM and remote connect profilers - they can be a life saver if you can connect to a system in an error state and inspect the threads and objects.
Also, beware the differences in possible behavior which are dependent on the number of CPU cores, pipelines, bus bandwidth, etc. Changes in hardware can affect your ability to replicate the problem. Some problems will only show on single-core CPUs, others only on multi-core ones.
One last thing: try to use concurrency objects distributed with the system libraries - e.g. in Java, java.util.concurrent is your friend. Writing your own concurrency control objects is hard and fraught with danger; leave it to the experts, if you have a choice.
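To give a flavour of what java.util.concurrent provides out of the box, here is a small sketch that counts words from several threads without any hand-written locking. WordCount is a made-up class name; ConcurrentHashMap.merge, Executors.newFixedThreadPool and awaitTermination are standard library calls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class WordCount {
    // Counts occurrences of each word using a pool of 'threads' worker threads.
    static Map<String, Integer> count(String[] words, int threads) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String word : words) {
            // merge() performs the read-modify-write for one key atomically
            pool.execute(() -> counts.merge(word, 1, Integer::sum));
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }
}
```

The equivalent with a plain HashMap and a get-then-put sequence would need an explicit lock around every update.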

I thought that the answer you got to your other question was pretty good. But I'll emphasize these points.
Only modify shared state in a critical section (Mutual Exclusion)
Acquire locks in a set order and release them in the opposite order.
Use pre-built abstractions whenever possible (Like the stuff in java.util.concurrent)
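The second point, a fixed lock order with release in the opposite order, could look roughly like this in Java. Account, Transfers and the id-based ordering are assumptions made for the sketch, not something taken from the answer; any stable global order would do.

```java
import java.util.concurrent.locks.ReentrantLock;

class Account {
    final int id; // used only to define a global lock order
    long balance; // guarded by 'lock'
    final ReentrantLock lock = new ReentrantLock();

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }
}

class Transfers {
    static void transfer(Account from, Account to, long amount) {
        if (from == to) {
            return;
        }
        // Always lock the account with the smaller id first, whatever the transfer direction.
        Account first = from.id < to.id ? from : to;
        Account second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount; // shared state is modified only inside the critical section
                to.balance += amount;
            } finally {
                second.lock.unlock(); // released in the opposite order of acquisition
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Runs opposing transfers concurrently and returns the combined balance afterwards.
    static long demo(int rounds) {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.balance + b.balance;
    }
}
```

If each thread instead locked "its own" source account first, the two opposing transfers could deadlock; with the shared order they cannot.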
Also, some analysis tools can detect some potential issues. For example, FindBugs can find some threading issues in Java programs. Such tools can't find all problems (they aren't silver bullets) but they can help.
As vanslly points out in a comment to this answer, studying well placed logging output can also be very helpful, but beware of Heisenbugs.

For Java there is a verification tool called javapathfinder, which I find useful for debugging and verifying multi-threaded applications against potential race condition and deadlock bugs in the code.
It works fine with both the Eclipse and NetBeans IDEs.
[2019] the github repository

Assuming I have reports of troubles that are hard to reproduce I always find these by reading code, preferably pair-code-reading, so you can discuss threading semantics/locking needs. When we do this based on a reported problem, I find we always nail one or more problems fairly quickly. I think it's also a fairly cheap technique to solve hard problems.
Sorry for not being able to tell you to press ctrl+shift+f13, but I don't think there's anything like that available. But just thinking about what the reported issue actually is usually gives a fairly strong sense of direction in the code, so you don't have to start at main().

In addition to the other good answers you already got: Always test on a machine with at least as many processors / processor cores as the customer uses, or as there are active threads in your program. Otherwise some multithreading bugs may be hard to impossible to reproduce.

Apart from crash dumps, a technique is extensive run-time logging: where each thread logs what it's doing.
The first question when an error is reported, then, might be, "Where's the log file?"
Sometimes you can see the problem in the log file: "This thread is detecting an illegal/unexpected state here ... and look, this other thread was doing that, just before and/or just after this."
If the log file doesn't say what's happening, then apologise to the customer, add sufficiently-many extra logging statements to the code, give the new code to the customer, and say that you'll fix it after it happens one more time.

Sometimes, multithreaded solutions cannot be avoided. If there is a bug, it needs to be investigated in real time, which is nearly impossible with most tools like Visual Studio. The only practical solution is to write traces, although the tracing itself should:
not add any delay
not use any locking
be multithreading safe
trace what happened in the correct sequence.
This sounds like an impossible task, but it can be easily achieved by writing the trace into memory. In C#, it would look something like this:
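// requires: using System.Threading;
public const int MaxMessages = 0x100;
string[] messages = new string[MaxMessages];
int messagesIndex = -1;

public void Trace(string message) {
    // Interlocked.Increment gives every caller its own slot without taking a lock.
    int thisIndex = Interlocked.Increment(ref messagesIndex);
    // Note: no wrap-around here, so at most MaxMessages messages can be stored.
    messages[thisIndex] = message;
}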
The method Trace() is multithreading safe, non blocking and can be called from any thread. On my PC, it takes about 2 microseconds to execute, which should be fast enough.
Add Trace() instructions wherever you think something might go wrong, let the program run, wait until the error happens, stop the trace and then investigate the trace for any errors.
A more detailed description of this approach, which also collects thread and timing information, recycles the buffer and outputs the trace nicely, can be found at:
CodeProject: Debugging multithreaded code in real time
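Since the question also asks about Java, here is a rough Java counterpart of the in-memory trace. It is not the CodeProject implementation, only a sketch under my own assumptions: the class name MemoryTrace is invented, the buffer size is a power of two so the index can be masked to recycle slots, and each entry is prefixed with System.nanoTime() and the thread name. Building that prefix allocates a string, so it will cost more than the bare C# version.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

class MemoryTrace {
    static final int MAX_MESSAGES = 0x100; // power of two, so '&' can wrap the index
    private final AtomicReferenceArray<String> messages = new AtomicReferenceArray<>(MAX_MESSAGES);
    private final AtomicInteger lastIndex = new AtomicInteger(-1);

    // Lock-free: each caller gets a unique slot from the atomic counter.
    void trace(String message) {
        int slot = lastIndex.incrementAndGet() & (MAX_MESSAGES - 1);
        messages.set(slot, System.nanoTime() + " [" + Thread.currentThread().getName() + "] " + message);
    }

    // Returns the retained messages, oldest first. Call it once the traced threads are stopped.
    String[] snapshot() {
        int written = lastIndex.get() + 1; // total number of trace() calls so far
        int n = Math.min(written, MAX_MESSAGES);
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = messages.get((written - n + i) & (MAX_MESSAGES - 1));
        }
        return out;
    }
}
```

Only the most recent MAX_MESSAGES entries survive, which is usually what you want when waiting for a rare error: stop the program when it happens and dump the snapshot.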

A little chart with some debugging techniques to keep in mind when debugging multithreaded code.
The chart is growing, please leave comments and tips to be added.
(update file at this link)

Visual Studio allows you to inspect the call stack of each thread, and you can switch between them. It is by no means enough to track all kinds of threading issues, but it is a start. A lot of improvements for multi-threaded debugging are planned for the upcoming VS2010.
I have used WinDbg + SoS for threading issues in .NET code. You can inspect locks (sync blocks), thread call stacks etc.

Tess Ferrandez's blog has good examples of using WinDbg to debug deadlocks in .NET.

assert() is your friend for detecting race-conditions. Whenever you enter a critical section, assert that the invariant associated with it is true (that's what CS's are for). Though, unfortunately, the check might be expensive and thus not suitable for use in production environment.
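In Java this could look like the sketch below. SeatPool and its invariant are invented for the example. The assert keyword fits the caveat about cost: assertions are skipped unless the JVM is started with -ea, so the checks can stay in the code and be switched on in test environments only.

```java
class SeatPool {
    private final int capacity;
    private int free;
    private int taken;

    SeatPool(int capacity) {
        this.capacity = capacity;
        this.free = capacity;
    }

    synchronized boolean take(int n) {
        assert invariantHolds() : "invariant broken on entry to take()";
        if (n <= 0 || n > free) {
            return false;
        }
        free -= n;
        taken += n;
        assert invariantHolds() : "invariant broken on exit from take()";
        return true;
    }

    synchronized void giveBack(int n) {
        assert invariantHolds() : "invariant broken on entry to giveBack()";
        if (n <= 0 || n > taken) {
            throw new IllegalArgumentException("cannot give back " + n);
        }
        taken -= n;
        free += n;
        assert invariantHolds() : "invariant broken on exit from giveBack()";
    }

    synchronized int freeSeats() {
        return free;
    }

    // The invariant this critical section protects; evaluated only when assertions are enabled.
    private boolean invariantHolds() {
        return free >= 0 && taken >= 0 && free + taken == capacity;
    }
}
```

If some other code path modifies the fields without holding the lock, the entry assertion has a chance of catching the damaged state close to where it happened.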

I implemented the tool vmlens to detect race conditions in Java programs during runtime. It implements an algorithm called Eraser.
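For readers who have not met Eraser: its core idea is lockset refinement. For every shared variable the checker keeps the set of locks that were held on all accesses so far; if that set ever becomes empty, no single lock protects the variable and a possible race is reported. The following is a heavily simplified sketch of that idea only. It is not how vmlens works internally, it omits Eraser's state machine for initialization and read-shared data, and the class LocksetChecker with its string-based API is purely hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class LocksetChecker {
    // For each variable: the locks that were held on every access seen so far.
    private final Map<String, Set<String>> candidates = new HashMap<>();

    // Records one access. Returns false when no common lock remains, i.e. a potential race.
    synchronized boolean access(String variable, String... locksHeld) {
        Set<String> held = new HashSet<>(Arrays.asList(locksHeld));
        Set<String> candidateSet = candidates.get(variable);
        if (candidateSet == null) {
            candidates.put(variable, held); // first access: every held lock is a candidate
            return !held.isEmpty();
        }
        candidateSet.retainAll(held); // keep only locks held on all accesses
        return !candidateSet.isEmpty();
    }
}
```

A real tool obtains the variable and lock information by instrumenting the running program rather than through explicit calls like these.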

Develop code the way that Princess recommended for your other question (Immutable objects, and Erlang-style message passing). It will be easier to detect multi-threading problems, because the interactions between threads will be well defined.

I faced a threading issue that gave the SAME wrong result every time and did not behave unpredictably, since the other conditions (memory, scheduler, processing load) were more or less the same on each run.
From my experience, I can say that HARDEST PART is to recognize that it is a thread issue, and BEST SOLUTION is to review the multi-threaded code carefully. Just by looking carefully at the thread code you should try to figure out what can go wrong. Other ways (thread dump, profiler etc) will come second to it.

Narrow down on the functions that are being called, and rule out what could and could not be to blame. When you find sections of code that you suspect may be causing the issue, add lots of detailed logging / tracing to it. Once the issue occurs again, inspect the logs to see how the code executed differently than it does in "baseline" situations.
If you are using Visual Studio, you can also set breakpoints and use the Parallel Stacks window. Parallel Stacks is a huge help when debugging concurrent code, and will give you the ability to switch between threads to debug them independently.

I'm using GNU gdb with a simple script:
$ more gdb_tracer
b func.cpp:2871
while (1)

The best thing I can think of is to stay away from multi-threaded code whenever possible. It seems there are very few programmers who can write bug-free multi-threaded applications, and I would argue that there are no coders able to write bug-free large multi-threaded applications.


What is the best way to understand and analyze multithreaded code?

I'm not looking for programming techniques. My question is rather about the best way to understand code developed by a third party.
I have the code for an application in a specific language (it could be C/C++, Java, etc.). This code uses several threads to control different processes. The application generates a log that shows all calls to relevant functions for each thread.
I have to analyze this code to understand its operation and be able to improve the algorithm. I have worked little with threads, so I do not know the most convenient way to start the analysis and follow the execution of each thread.
Could you give me any recommendation?
If you are able to contact any of the code's original developers, having a conversation with them (by voice or by email) and asking them to describe how they intended things to work is always preferable to only trying to reverse-engineer their intent by looking at the code. If you can't contact the developers directly, then perhaps there is a library-specific developer's forum or other on-line resource where you can discuss the library's structure with people who have experience using/debugging it.
If that's not an option (or if you've done that and still don't feel like you understand things well enough), then I often find that profiling (either via a profiling tool, or just by temporarily putting printf() [or similar] tracing-calls into the codebase at various places and seeing what gets printed when) is a good way to find out which parts of the code are actually being used at which stages of the program's execution. That will help you confirm (or disprove) your theories about how the codebase works. Knowing where and when each thread is spawned, where its entry-function is, and where/when it gets joined again by its parent thread are particularly useful.
Finally, start looking at the various pieces of data (e.g. objects and member variables) each thread examines and/or modifies, and how access to each of those pieces of data is synchronized/serialized. Assuming the code isn't buggy, the critical sections of the codebase are good indicators of where inter-thread communication is happening.

threadscope functionality

Can programs be monitored while they are running (possibly piping the event log)? Or is it only possible to view event logs after execution. If the latter is the case, is there a deeper reason with respect to how the Haskell runtime works?
Edit: I don't know much about the runtime tbh, but given dflemstr's response, I was curious about how much and the ways in which performance is degraded by adding the event monitoring runtime option. I recall in RWH they mentioned that the rts has to add cost centres, but I wasn't completely sure about how expensive this sort of thing was.
The direct answer is that, no, it is not possible. And, no, there is no reason for that except that nobody has done the required legwork so far.
I think this would mainly be a matter of
Modifying ghc-events so it supports reading event logs chunk-wise and provide partial results. Maybe porting it over to attoparsec would help?
Threadscope would have to update its internal tree data structures as new data streams in.
Nothing too hard, but somebody would need to do it. I think I heard discussion about adding this feature already... So it might happen eventually.
Edit: And to make it clear, there's no real reason this would have to degrade performance beyond what you get with event log or cost centre profiling already.
If you want to monitor the performance of the application while it is running, you can for instance use the ekg package as described in this blog post. It isn't as detailed as ThreadScope, but it does the job for web services, for example.
To get live information about what the runtime is doing, you can use the dtrace program to capture dynamic events posted by some GHC runtime probes. How this is done is outlined in this wiki page. You can then use this information to put together a more coherent event log.

How Do I Choose Between the Various Ways to do Threading in Delphi?

It seems that I've finally got to implement some sort of threading into my Delphi 2009 program. If there were only one way to do it, I'd be off and running. But I see several possibilities.
Can anyone explain what's the difference between these and why I'd choose one over another.
The TThread class in Delphi
AsyncCalls by Andreas Hausladen
OmniThreadLibrary by Primoz Gabrijelcic (gabr)
... any others?
I have just read an excellent article by Gabr in the March 2010 (No 10) issue of Blaise Pascal Magazine titled "Four Ways to Create a Thread". You do have to subscribe to gain access to the magazine, so by copyright, I can't reproduce anything substantial about it here.
In summary, Gabr describes the difference between using TThreads, direct Windows API calls, Andy's AsyncCalls, and his own OmniThreadLibrary. He does conclude at the end that:
"I'm not saying that you have to choose anything else than the classical Delphi way (TThread) but it is still good to be informed of options you have"
Mghie's answer is very thorough and suggests OmniThreadLibrary may be preferable. But I'm still interested in everyone's opinions about how I (or anyone) should choose their threading method for their application.
And you can add to the list:
4. Direct calls to the Windows API
5. Misha Charrett's CSI Distributed Application Framework as suggested by LachlanG in his answer.
I'm probably going to go with OmniThreadLibrary. I like Gabr's work. I used his profiler GPProfile many years ago, and I'm currently using his GPStringHash which is actually part of OTL.
My only concern might be upgrading it to work with 64-bit or Unix/Mac processing once Embarcadero adds that functionality into Delphi.
If you are not experienced with multi-threading you should probably not start with TThread, as it is but a thin layer over native threading. I consider it also to be a little rough around the edges; it has not evolved a lot since the introduction with Delphi 2, mostly changes to allow for Linux compatibility in the Kylix time frame, and to correct the more obvious defects (like fixing the broken MREW class, and finally deprecating Suspend() and Resume() in the latest Delphi version).
Using a simple thread wrapper class basically also causes the developer to focus on a level that is much too low. To make proper use of multiple CPU cores a focus on tasks instead of threads is better, because the partitioning of work with threads does not adapt well to changing requirements and environments - depending on the hardware and the other software running in parallel the optimum number of threads may vary greatly, even at different times on the same system. A library that you pass only chunks of work to, and which schedules them automatically to make best use of the available resources helps a lot in this regard.
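The task-oriented style is not specific to Delphi. Purely as an illustration in Java (the other examples in this collection are in Java too), the sketch below splits work into chunks and lets a pool sized to the available cores schedule them. ChunkedSum is a made-up name and this is not AsyncCalls or OmniThreadLibrary code; it only shows the idea of passing chunks of work to a library instead of managing threads by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedSum {
    // Sums 'data' by submitting one task per chunk; the pool decides which thread runs what.
    static long sum(int[] data, int chunkSize) {
        int workers = Runtime.getRuntime().availableProcessors(); // adapts to the machine
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunkSize) {
                final int from = start;
                final int to = Math.min(start + chunkSize, data.length);
                tasks.add(() -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        partial += data[i]; // each task only reads its own slice
                    }
                    return partial;
                });
            }
            long total = 0;
            for (Future<Long> result : pool.invokeAll(tasks)) { // blocks until all tasks are done
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The calling code never mentions a thread count; changing the hardware changes the pool size but not the way the work is described.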
AsyncCalls is a good first step to introduce threads into an application. If you have several areas in your program where a number of time-consuming steps need to be performed that are independent of each other, then you can simply execute them asynchronously by passing each of them to AsyncCalls. Even when you have only one such time-consuming action you can execute it asynchronously and simply show a progress UI in the VCL thread, optionally allowing for cancelling the action.
AsyncCalls is IMO not so good for background workers that stay around during the whole program runtime, and it may be impossible to use when some of the objects in your program have thread affinity (like database connections or OLE objects that may have a requirement that all calls happen in the same thread).
What you also need to be aware of is that these asynchronous actions are not of the "fire-and-forget" kind. Every overloaded AsyncCall() function returns an IAsyncCall interface pointer that you may need to keep a reference to if you want to avoid blocking. If you don't keep a reference, then the moment the ref count reaches zero the interface will be freed, which will cause the thread releasing the interface to wait for the asynchronous call to complete. This is something that you might see while debugging, when exiting the method that created the IAsyncCall may take a mysterious amount of time.
OTL is in my opinion the most versatile of your three options, and I would use it without a second thought. It can do everything TThread and AsyncCalls can do, plus much more. It has a sound design, which is high-level enough both to make life for the user easy, and to let a port to a Unixy system (while keeping most of the interface intact) look at least possible, if not easy. In the last months it has also started to acquire some high-level constructs for parallel work, highly recommended.
OTL has a few dozen samples too, which is important to get started. AsyncCalls has nothing but a few lines in comments, but then it is easy enough to understand due to its limited functionality (it does only one thing, but it does it well). TThread has only one sample, which hasn't really changed in 14 years and is mostly an example of how not to do things.
Whichever of the options you choose, no library will eliminate the need to understand threading basics. Having read a good book on these is a prerequisite to any successful coding. Proper locking for example is a requirement with all of them.
There is another lesser known Delphi threading library, Misha Charrett's CSI Application Framework.
It's based around message passing rather than shared memory. The same message passing mechanism is used to communicate between threads running in the same process or in other processes so it's both a threading library and a distributed inter-process communication library.
There's a bit of a learning curve to get started but once you get going you don't have to worry about all the traditional threading issues such as deadlocks and synchronisation, the framework takes care of most of that for you.
Misha's been developing this for years and is still actively improving the framework and documentation all the time. He's always very responsive to support questions.
TThread is a simple class that encapsulates a Windows thread. You make a descendant class with an Execute method that contains the code this thread should execute, create the thread and set it to run and the code executes.
AsyncCalls and OmniThreadLibrary are both libraries that build a higher-level concept on top of threads. They're about tasks, discrete pieces of work that you need to have execute asynchronously. You start the library, it sets up a task pool, a group of special threads whose job is to wait around until you have work for them, and then you pass the library a function pointer (or method pointer or anonymous method) containing the code that needs to be executed, and it executes it in one of the task pool threads and handles a lot of the low-level details for you.
I haven't used either library all that much, so I can't really give you a comparison between the two. Try them out and see what they can do, and which one feels better to you.
(sorry, I don't have enough points to comment so I'm putting this in as an answer rather than another vote for OTL)
I've used TThread, CSI and OmniThread (OTL). The two libraries both have non-trivial learning curves but are much more capable than TThread. My conclusion is that if you're going to do anything significant with threading you'll end up writing half of the library functionality anyway, so you might as well start with the working, debugged version someone else wrote. Both Misha and Gabr are better programmers than most of us, so odds are they've done a better job than we will.
I've looked at AsyncCalls but it didn't do enough of what I wanted. One thing it does have is a "Synchronize" function (missing from OTL) so if you're dependent on that you might go with AsyncCalls purely for that. IMO using message passing is not hard enough to justify the nastiness of Synchronize, so buckle down and learn how to use messages.
Of the three I prefer OTL, largely because of the collection of examples but also because it's more self-contained. That's less of an issue if you're already using the JCL or you work in only one place, but I do a mix including contract work and selling clients on installing Misha's system is harder than the OTL, just because the OTL is ~20 files in one directory. That sounds silly, but it's important for many people.
With OTL the combination of searching the examples and source code for keywords, and asking questions in the forums works for me. I'm familiar with the traditional "offload CPU-intensive tasks" threading jobs, but right now I'm working on backgrounding a heap of database work which has much more "threads block waiting for DB" and less "CPU maxed out", and the OTL is working quite well for that. The main differences are that I can have 30+ threads running without the CPU maxing out, but stopping one is generally impossible.
I know this isn't the most advanced method :-) and maybe it has limitations too, but I just tried System.BeginThread and found it quite simple - probably because of the quality of the documentation I was referring to... (IMO Neil Moffatt could teach MSDN a thing or two)
That's the biggest factor I find in trying to learn new things: the quality of the documentation, not its quantity. A couple of hours was all it took, then I was back to the real work rather than worrying about how to get the thread to do its business.
EDIT actually Rob Kennedy does a great job explaining BeginThread here BeginThread Structure - Delphi
EDIT actually the way Rob Kennedy explains TThread in the same post, I think I'll change my code to use TThread tomorrow. Who knows what it will look like next week! (AsyncCalls maybe)

Achieving Thread-Safety

Question How can I make sure my application is thread-safe? Are there any common practices, testing methods, things to avoid, things to look for?
Background I'm currently developing a server application that performs a number of background tasks in different threads and communicates with clients using Indy (using another bunch of automatically generated threads for the communication). Since the application should be highly available, a program crash is a very bad thing and I want to make sure that the application is thread-safe. No matter what, from time to time I discover a piece of code that throws an exception that never occurred before, and in most cases I realize that it is some kind of synchronization bug, where I forgot to synchronize my objects properly. Hence my question concerning best practices, testing of thread-safety and things like that.
mghie: Thanks for the answer! I should perhaps be a little bit more precise. Just to be clear, I know about the principles of multithreading, I use synchronization (monitors) throughout my program and I know how to differentiate threading problems from other implementation problems. But nevertheless, I keep forgetting to add proper synchronization from time to time. Just to give an example, I used the RTL sort function in my code. Looked something like
FKeyList.Sort (CompareKeysFunc);
Turns out that I had to synchronize FKeyList while sorting. It just didn't come to my mind when initially writing that simple line of code. It's these things I want to talk about. What are the places where one easily forgets to add synchronization code? How do YOU make sure that you added sync code in all important places?
You can't really test for thread-safeness. All you can do is show that your code isn't thread-safe, but if you know how to do that you already know what to do in your program to fix that particular bug. It's the bugs you don't know that are the problem, and how would you write tests for those? Apart from that threading problems are much harder to find than other problems, as the act of debugging can already alter the behaviour of the program. Things will differ from one program run to the next, from one machine to the other. Number of CPUs and CPU cores, number and kind of programs running in parallel, exact order and timing of stuff happening in the program - all of this and much more will have influence on the program behaviour. [I actually wanted to add the phase of the moon and stuff like that to this list, but you get my meaning.]
My advice is to stop seeing this as an implementation problem, and start to look at this as a program design problem. You need to learn and read all that you can find about multi-threading, whether it is written for Delphi or not. In the end you need to understand the underlying principles and apply them properly in your programming. Primitives like critical sections, mutexes, conditions and threads are something the OS provides, and most languages only wrap them in their libraries (this ignores things like green threads as provided by for example Erlang, but it's a good point of view to start out from).
I'd say start with the Wikipedia article on threads and work your way through the linked articles. I have started with the book "Win32 Multithreaded Programming" by Aaron Cohen and Mike Woodring - it is out of print, but maybe you can find something similar.
Edit: Let me briefly follow up on your edited question. All access to data that is not read-only needs to be properly synchronized to be thread-safe, and sorting a list is not a read-only operation. So obviously one would need to add synchronization around all accesses to the list.
But with more and more cores in a system constant locking will limit the amount of work that can be done, so it is a good idea to look for a different way to design your program. One idea is to introduce as much read-only data as possible into your program - locking is no longer necessary, as all access is read-only.
I have found interfaces to be a very valuable aid in designing multi-threaded programs. Interfaces can be implemented to have only methods for read-only access to the internal data, and if you stick to them you can be quite sure that a lot of the potential programming errors do not occur. You can freely share them between threads, and the thread-safe reference counting will make sure that the implementing objects are properly freed when the last reference to them goes out of scope or is assigned another value.
What you do is create objects that descend from TInterfacedObject. They implement one or more interfaces which all provide only read-only access to the internals of the object, but they can also provide public methods that mutate the object state. When you create the object you keep both a variable of the object type and an interface pointer variable. That way lifetime management is easy, because the object will be deleted automatically when an exception occurs. You use the variable pointing to the object to call all methods necessary to properly set up the object. This mutates the internal state, but since this happens only in the active thread there is no potential for conflict. Once the object is properly set up you return the interface pointer to the calling code, and since there is no way to access the object afterwards except by going through the interface pointer you can be sure that only read-only access can be performed. By using this technique you can completely remove the locking inside of the object.
What if you need to change the state of the object? You don't, you create a new one by copying the data from the interface, and mutate the internal state of the new objects afterwards. Finally you return the reference pointer to the new object.
By using this you will only need locking where you get or set such interfaces. It can even be done without locking, by using the atomic interchange functions. See this blog post by Primoz Gabrijelcic for a similar use case where an interface pointer is set.
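The same pattern translates to other languages. Below is a minimal Java sketch of it, under my own assumptions: ReadOnlyConfig, Config and ConfigHolder are invented names, garbage collection stands in for Delphi's interface reference counting, and AtomicReference.compareAndSet plays the role of the atomic interchange functions. The object is set up privately, published only through its read-only interface, and "changed" by building a modified copy and swapping the reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// What other threads are allowed to see: read-only access.
interface ReadOnlyConfig {
    String name();
    List<String> hosts();
}

final class Config implements ReadOnlyConfig {
    private final String name;
    private final List<String> hosts = new ArrayList<>();

    Config(String name) {
        this.name = name;
    }

    // Mutator, called only while the object is still private to the thread setting it up.
    void addHost(String host) {
        hosts.add(host);
    }

    public String name() {
        return name;
    }

    public List<String> hosts() {
        return Collections.unmodifiableList(hosts);
    }

    static Config copyOf(ReadOnlyConfig source) {
        Config copy = new Config(source.name());
        copy.hosts.addAll(source.hosts());
        return copy;
    }
}

class ConfigHolder {
    private final AtomicReference<ReadOnlyConfig> current;

    ConfigHolder(String name, String... hosts) {
        Config initial = new Config(name);
        for (String host : hosts) {
            initial.addHost(host);
        }
        current = new AtomicReference<>(initial); // published as read-only from here on
    }

    ReadOnlyConfig get() {
        return current.get(); // no lock needed for readers
    }

    // "Changing" the state: copy, mutate the private copy, then swap the reference atomically.
    void addHost(String host) {
        ReadOnlyConfig old;
        Config next;
        do {
            old = current.get();
            next = Config.copyOf(old);
            next.addHost(host);
        } while (!current.compareAndSet(old, next));
    }
}
```

A reader that fetched the old reference keeps a consistent view of the old data for as long as it holds on to it, while new readers see the new object.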
Simple: don't use shared data. Every time you access shared data you risk running into a problem (if you forget to synchronize access). Even worse, each time you access shared data you risk blocking other threads, which will hurt your parallelization.
I know this advice is not always applicable. Still, it doesn't hurt if you try to follow it as much as possible.
EDIT: Longer response to Smasher's comment. Would not fit in a comment :(
You are totally correct. That's why I like to keep a shadow copy of the main data in a readonly thread. I add a versioning to the structure (one 4-aligned DWORD) and increment this version in the (lock-protected) data writer. Data reader would compare global and private version (which can be done without locking) and only if they differ it would lock the structure, duplicate it to a local storage, update the local version and unlock. Then it would access the local copy of the structure. Works great if reading is the primary way to access the structure.
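A sketch of this versioned shadow copy in Java might look as follows. The names VersionedData and LocalReader are hypothetical, and a volatile int takes the place of the aligned DWORD so that the unlocked version comparison is well defined under the Java memory model.

```java
import java.util.ArrayList;
import java.util.List;

class VersionedData {
    private final Object lock = new Object();
    private final List<String> items = new ArrayList<>(); // guarded by 'lock'
    private volatile int version; // incremented only while holding 'lock'

    // Writer side: lock-protected update plus version bump.
    void add(String item) {
        synchronized (lock) {
            items.add(item);
            version++;
        }
    }

    LocalReader newReader() {
        return new LocalReader();
    }

    // One instance per reading thread; the instance itself is not shared.
    final class LocalReader {
        private List<String> localCopy = new ArrayList<>();
        private int localVersion = -1;
        int refreshes; // how often the copy had to be renewed, for illustration

        List<String> view() {
            if (localVersion != version) { // cheap check without locking
                synchronized (lock) {
                    localCopy = new ArrayList<>(items); // duplicate to local storage
                    localVersion = version;
                }
                refreshes++;
            }
            return localCopy; // all further access goes to the private copy
        }
    }
}
```

As the answer says, this pays off when reads dominate: a reader takes the lock only after a write has happened, not on every access.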
I'll second mghie's advice: thread safety is designed in. Read about it anywhere you can.
For a really low level look at how it is implemented, look for a book on the internals of a real time operating system kernel. A good example is MicroC/OS-II: The Real Time Kernel by Jean J. Labrosse, which contains the complete annotated source code to a working kernel along with discussions of why things are done the way they are.
Edit: In light of the improved question focusing on using a RTL function...
Any object that can be seen by more than one thread is a potential synchronization issue. A thread-safe object would follow a consistent pattern in every method's implementation of locking "enough" of the object's state for the duration of the method, or perhaps, narrowed to just "long enough". It is certainly the case that any read-modify-write sequence to any part of an object's state must be done atomically with respect to other threads.
The art lies in figuring out how to get useful work done without either deadlocking or creating an execution bottleneck.
As for finding such problems, testing won't be any guarantee. A problem that shows up in testing can be fixed. But it is extremely difficult to write either unit tests or regression tests for thread safety... so faced with a body of existing code your likely recourse is constant code review until the practice of thread safety becomes second nature.
As folks have mentioned and I think you know, being certain, in general, that your code is thread safe is impossible (I believe provably impossible but I would have to track down the theorem). Naturally, you want to make things easier than that.
What I try to do is:
Use a known pattern of multithreaded design: a thread pool, the actor model, the command pattern, or some such approach. This way, synchronization happens in the same, uniform way throughout the application.
Limit and concentrate the points of synchronization. Write your code so that you need synchronization in as few places as possible, and keep the synchronization code in one or a few places.
Write the synchronization code so that the logical relation between the values is clear both on entering and on exiting the guard. I use lots of asserts for this (your environment may limit this).
Don't ever access shared variables without guards/synchronization. Be very clear what your shared data is. (I've heard there are paradigms for guardless multithreaded programming but that would require even more research).
Write your code as cleanly, clearly and DRY-ly as possible.
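As a rough illustration of the "one guard, invariant checked on entry and exit" points in the list above, here is a Java sketch. `TransferGuard` and its balances are hypothetical, and it uses an explicit check rather than the `assert` keyword because Java assertions are disabled unless the JVM is started with `-ea`:

```java
/** Hypothetical example: two values whose sum must stay constant. */
class TransferGuard {
    private final Object lock = new Object(); // the single point of synchronization
    private long a;                           // guarded by lock
    private long b;                           // guarded by lock
    private final long total;

    TransferGuard(long a, long b) {
        this.a = a;
        this.b = b;
        this.total = a + b;
    }

    /** The logical relation between the shared values, stated in one place. */
    private void checkInvariant() {
        if (a + b != total) {
            throw new IllegalStateException("invariant broken: " + a + " + " + b + " != " + total);
        }
    }

    void moveAtoB(long amount) {
        synchronized (lock) {
            checkInvariant(); // holds on entering the guard
            a -= amount;
            b += amount;
            checkInvariant(); // and again on leaving it
        }
    }

    /** Returns a consistent copy of both values. */
    long[] read() {
        synchronized (lock) {
            return new long[] { a, b };
        }
    }
}
```

Because all access goes through this one class, a code reviewer only has to reason about thread safety here rather than at every call site.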
My simple answer, combined with those answers, is:
Create your application/program in a thread-safe manner
Avoid using public static variables anywhere
It is usually easy to fall into this habit, but it takes some time to get used to:
Program your logic (not the UI) in a functional programming language such as F#, or even Scheme or Haskell. Functional programming promotes thread-safe practices, and it also pushes us to always code towards purity.
If you use F#, there's also a clear distinction between mutable and immutable values such as variables.
Since functions (or methods) are first-class citizens in F# and Haskell, the code you write will also be more disciplined about keeping mutable state to a minimum.
Also, with the lazy evaluation style usually found in these functional languages, you can be sure that your program is safe from side effects, and you'll also realize that if your code needs effects, you have to define them clearly. If side effects are taken into consideration, your code will be ready to take advantage of composability between components and of multicore programming.

Analyzing Multithreaded Programs [closed]

We have a codebase that is several years old, and all the original developers are long gone. It uses many, many threads, but with no apparent design or common architectural principles. Every developer had his own style of multithreaded programming, so some threads communicate with one another using queues, some lock data with mutexes, some lock with semaphores, some use operating-system IPC mechanisms for intra-process communications. There is no design documentation, and comments are sparse. It's a mess, and it seems that whenever we try to refactor the code or add new functionality, we introduce deadlocks or other problems.
So, does anyone know of any tools or techniques that would help to analyze and document all the interactions between threads? FWIW, the codebase is C++ on Linux, but I'd be interested to hear about tools for other environments.
I appreciate the responses received so far, but I was hoping for something more sophisticated or systematic than advice that is essentially "add log messages, figure out what's going on, and fix it." There are lots of tools out there for analyzing and documenting control-flow in single-threaded programs; is there nothing available for multi-threaded programs?
See also Debugging multithreaded applications
Invest in a copy of Intel's VTune and its thread profiling tools. It will give you both a system and a source level view of the thread behaviour. It's certainly not going to autodocument the thing for you, but should be a real help in at least visualising what is happening in different circumstances.
I think there is a trial version that you can download, so may be worth giving that a go. I've only used the Windows version, but looking at the VTune webpage it also has a Linux version.
As a starting point, I'd be tempted to add tracing log messages at strategic points within your application. This will allow you to analyse how your threads are interacting with little danger that the act of observing the threads will change their behaviour (as could be the case with step-by-step debugging).
My experience is with the .NET platform and my favoured logging tool would be log4net since it's free, has extensive configuration options and, if you're sensible in how you implement your logging, it won't noticeably hinder your application's performance. Alternatively, there is .NET's built in Debug (or Trace) class in the System.Diagnostics namespace.
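For readers on the Java side, the same idea can be sketched without any logging framework. `ThreadTrace` below is a hypothetical helper, not part of log4net or any library. It records the thread name and a timestamp with each message, so the interleaving can be reconstructed afterwards:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical in-memory trace buffer that tags each event with its thread. */
class ThreadTrace {
    // A concurrent queue, so tracing itself needs no extra locking.
    private static final ConcurrentLinkedQueue<String> EVENTS = new ConcurrentLinkedQueue<>();

    /** Records a message with a nanosecond timestamp and the calling thread's name. */
    static void trace(String message) {
        EVENTS.add(System.nanoTime() + " [" + Thread.currentThread().getName() + "] " + message);
    }

    /** Returns a copy of everything recorded so far, in insertion order. */
    static List<String> dump() {
        return new ArrayList<>(EVENTS);
    }
}
```

Calling `ThreadTrace.trace("acquired queue lock")` at the strategic points and printing `ThreadTrace.dump()` after a failure gives a per-thread timeline to reason about.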
I'd focus on the shared memory locks first (the mutexes and semaphores) as they are most likely to cause issues. Look at which state is being protected by locks and then determine which state is under the protection of several locks. This will give you a sense of potential conflicts. Look at situations where code that holds a lock calls out to methods (don't forget virtual methods). Try to eliminate these calls where possible (by reducing the time the lock is held).
Given the list of mutexes that are held and a rough idea of the state that they protect, assign a locking order (i.e., mutex A should always be taken before mutex B). Try to enforce this in the code.
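One way to enforce such an ordering in code, sketched in Java for brevity (the question is about C++, but the idea carries over). `RankedLock` and `LockOrder` are illustrative names, and the rank numbers are whatever order you assign to your mutexes:

```java
import java.util.concurrent.locks.ReentrantLock;

/** A lock with a fixed rank; lower ranks must always be taken first. */
class RankedLock {
    final int rank;
    final ReentrantLock lock = new ReentrantLock();

    RankedLock(int rank) {
        this.rank = rank;
    }
}

class LockOrder {
    /** Runs the action holding both locks, acquiring the lower rank first regardless of argument order. */
    static void withBoth(RankedLock x, RankedLock y, Runnable action) {
        if (x.rank == y.rank) {
            throw new IllegalArgumentException("ranks must differ");
        }
        RankedLock first = x.rank < y.rank ? x : y;
        RankedLock second = first == x ? y : x;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                action.run();
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

Because every caller goes through `withBoth`, two threads can never hold the pair in opposite orders, which removes that particular deadlock.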
See if you can combine several locks into one if concurrency won't be adversely affected. For example, if mutex A and B seem like they might have deadlocks and an ordering scheme is not easily done, combine them to one lock initially.
It's not going to be easy but I'm for simplifying the code at the expense of concurrency to get a handle on the problem.
This is a really hard problem for automated tools. You might want to look into model checking your code. Don't expect magical results: model checkers are very limited in the amount of code and the number of threads they can effectively check.
A tool that might work for you is CHESS (although it is unfortunately Windows-only). BLAST is another fairly powerful tool, but is very difficult to use and may not handle C++. Wikipedia also lists StEAM, which I haven't heard of before, but sounds like it might work for you:
StEAM is a model checker for C++. It detects deadlocks, segmentation faults, out of range variables and non-terminating loops.
Alternatively, it would probably help a lot to try to converge the code towards a small number of well-defined (and, preferably, high-level) synchronization schemes. Mixing locks, semaphores, and monitors in the same code base is asking for trouble.
One thing to keep in mind with using log4net or a similar tool is that it changes the timing of the application and can often hide the underlying race conditions. We had some poorly written code to debug, and introducing logging actually removed race conditions and deadlocks (or greatly reduced their frequency).
In Java, you have choices like FindBugs (for static bytecode analysis) to find certain kinds of inconsistent synchronization, or the many dynamic thread analyzers from companies like Coverity, JProbe, OptimizeIt, etc.
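Besides those tools, the JDK itself can report deadlocked threads through `java.lang.management.ThreadMXBean`. A minimal sketch follows; the `DeadlockProbe` wrapper is hypothetical, while the `ThreadMXBean` calls are standard JDK API:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class DeadlockProbe {
    /** Describes every thread the JVM currently reports as deadlocked; empty if none. */
    static List<String> deadlockedThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads(); // null when no deadlock is found
        List<String> result = new ArrayList<>();
        if (ids == null) {
            return result;
        }
        for (ThreadInfo info : bean.getThreadInfo(ids)) {
            if (info != null) { // a thread may have died since the scan
                result.add(info.getThreadName() + " waiting on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return result;
    }
}
```

Calling this periodically from a watchdog thread, and logging the result when it is non-empty, is one way to capture a deadlock at a customer site where attaching a debugger is not an option.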
Can't UML help you here?
If you reverse-engineer your codebase into UML, then you should be able to draw class diagrams that show the relationships between your classes. Starting from the classes whose methods are the thread entry points, you could see which thread uses which class. Based on my experience with Rational Rose, this can be achieved using drag-and-drop; if there is no relationship between the added class and the previous ones, then the added class is not directly used by the thread that started with the method you began the diagram with. This should give you hints about the role of each thread.
This will also show the "data objects" that are shared and the objects that are thread-specific.
If you draw a big class diagram and remove all the "data objects", then you should be able to lay out that diagram as clouds, each cloud being a thread - or a group of threads, unless the coupling and cohesion of the code base are awful.
This will only give you one piece of the puzzle, but it could be helpful; I just hope your codebase is not too muddy or too "procedural", in which case ...
