How to automatically run in the background?

I'm not sure whether this has already been implemented somewhere; I hope it has. But I know that in .NET, programmers have to manually run time-consuming tasks on a background thread.
So every time we handle a UI event and realize that it will take some time, we also realize that it will hang the UI thread and our application. Then we have to set up all the background-work machinery and handle callbacks or whatever.
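For comparison, the manual pattern described above can be sketched as follows. It is written in Python purely for illustration (in .NET, `Task` or `BackgroundWorker` plays this role). The handler `on_button_click` and the function `slow_work` are hypothetical, and a real GUI toolkit would also require the callback's update to be marshalled back onto the UI thread, which this sketch does not do.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

def slow_work(n: int) -> int:
    """Hypothetical time-consuming task (simulated with sleep)."""
    time.sleep(0.1)
    return n * n

def on_button_click(executor: ThreadPoolExecutor, n: int, results: list) -> Future:
    """Hypothetical UI event handler: queue the slow work and return at once."""
    fut = executor.submit(slow_work, n)
    # The callback runs on the worker thread once slow_work finishes; a real
    # GUI would post this update back to the UI thread instead.
    fut.add_done_callback(lambda f: results.append(f.result()))
    return fut

if __name__ == "__main__":
    results: list = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        on_button_click(pool, 7, results)
        print("handler returned; the UI thread is free")
    # Leaving the with-block waits for the worker threads (and callbacks) to finish.
    print(results)  # [49]
```

The point of the question is that all of this plumbing is written by hand for every slow handler.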
So my question is:
Is there a mechanism in some language/platform that automatically runs time-consuming tasks in the background and does all the related work itself? That is, we just write the code for handling a specific UI event, and this code is somehow detected as time-consuming and executed in the background.
And if there isn't, then why?

There's been a lot of work done in Haskell (and other functional languages) to make it automatically do things in other threads. But Haskell's not the easiest GUI programming language.

You could look at something like Quartz for .NET. It's a job scheduler, but it can be used to run time-consuming processes on a background thread.

There is active research being done in this area, but it's a complex topic. For one example, see the Axum project by MS Research. It's a message-passing-based DSL targeting the CLR.
I am not aware of any UI-specific languages, however. Most large frameworks (including .NET) have lots of tools for assisting the process of running tasks in the background.

There is not (as far as I know), and the reason is that the computer doesn't know ahead of time how long a given task will take to complete. Make no mistake: in particular cases a programmer can devise an algorithm to determine the expected duration of a specific task, but in general there is no way to make a computer determine how long an arbitrary task will take, or even whether it will finish at all. This is a consequence of a very important computer science problem called the Halting Problem.

Related

What is the best way to understand and analyze a multithreading code?

I'm not looking for programming techniques. My question is rather about what is the best way to understand a code developed by a third party.
I have a code for an application in a specific language (it could be C/C++, Java, etc.). This code uses several threads to control different processes. The application generates a log that shows all calls to relevant functions for each thread.
I have to analyze this code to understand how it works and be able to improve the algorithm. I have little experience with threads, so I don't know the most convenient way to start the analysis and follow the execution of each thread.
Could you give me any recommendation?

If you are able to contact any of the code's original developers, having a conversation with them (by voice or by email) and asking them to describe how they intended things to work is always preferable to only trying to reverse-engineer their intent by looking at the code. If you can't contact the developers directly, then perhaps there is a library-specific developer's forum or other on-line resource where you can discuss the library's structure with people who have experience using/debugging it.
If that's not an option (or if you've done that and still don't feel you understand things well enough), then I often find that profiling (either via a profiling tool, or just by temporarily putting printf() [or similar] tracing calls into the codebase at various places and seeing what gets printed when) is a good way to find out which parts of the code are actually used at which stages of the program's execution. That will help you confirm (or disprove) your theories about how the codebase works. Knowing where and when each thread is spawned, where its entry function is, and where/when it is joined again by its parent thread is particularly useful.
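A minimal sketch of that kind of temporary tracing, shown in Python for illustration: every trace line is tagged with the name of the thread that produced it, so spawn, entry, exit, and join points appear in order in one log. The `trace` helper, the `worker` function, and the thread names are hypothetical.

```python
import threading

_trace_lock = threading.Lock()
trace_log: list = []

def trace(msg: str) -> None:
    """Record (and print) a message tagged with the current thread's name."""
    line = f"[{threading.current_thread().name}] {msg}"
    with _trace_lock:  # keep the logged order and the printed order identical
        trace_log.append(line)
        print(line)

def worker(job_id: int) -> None:
    trace(f"entry: job {job_id}")
    # ... the code under investigation would run here ...
    trace(f"exit: job {job_id}")

def main() -> None:
    threads = [threading.Thread(target=worker, args=(i,), name=f"worker-{i}") for i in range(2)]
    for t in threads:
        trace(f"spawning {t.name}")
        t.start()
    for t in threads:
        t.join()
        trace(f"joined {t.name}")

if __name__ == "__main__":
    main()
```

Giving each thread a meaningful name when it is created makes a mixed log far easier to read.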
Finally, start looking at the various pieces of data (e.g. objects and member variables) each thread examines and/or modifies, and how access to each of those pieces of data is synchronized/serialized. Assuming the code isn't buggy, the critical sections of the codebase are good indicators of where inter-thread communication is happening.

How Do I Choose Between the Various Ways to do Threading in Delphi?

It seems that I've finally got to implement some sort of threading into my Delphi 2009 program. If there were only one way to do it, I'd be off and running. But I see several possibilities.
Can anyone explain the differences between these and why I'd choose one over another?
The TThread class in Delphi
AsyncCalls by Andreas Hausladen
OmniThreadLibrary by Primoz Gabrijelcic (gabr)
... any others?
Edit:
I have just read an excellent article by Gabr in the March 2010 (No 10) issue of Blaise Pascal Magazine titled "Four Ways to Create a Thread". You do have to subscribe to access the magazine's content, so for copyright reasons I can't reproduce anything substantial from it here.
In summary, Gabr describes the difference between using TThreads, direct Windows API calls, Andy's AsyncCalls, and his own OmniThreadLibrary. He does conclude at the end that:
"I'm not saying that you have to choose anything else than the classical Delphi way (TThread) but it is still good to be informed of options you have"
Mghie's answer is very thorough and suggests OmniThreadLibrary may be preferable. But I'm still interested in everyone's opinions about how I (or anyone) should choose their threading method for their application.
And you can add to the list:
4. Direct calls to the Windows API
5. Misha Charrett's CSI Distributed Application Framework, as suggested by LachlanG in his answer.
Conclusion:
I'm probably going to go with OmniThreadLibrary. I like Gabr's work. I used his profiler GPProfile many years ago, and I'm currently using his GPStringHash which is actually part of OTL.
My only concern might be upgrading it to work with 64-bit or Unix/Mac processing once Embarcadero adds that functionality into Delphi.

If you are not experienced with multi-threading, you should probably not start with TThread, as it is but a thin layer over native threading. I also consider it a little rough around the edges; it has not evolved much since its introduction in Delphi 2. The changes have mostly been to allow for Linux compatibility in the Kylix time frame and to correct the more obvious defects (like fixing the broken MREW class, and finally deprecating Suspend() and Resume() in the latest Delphi version).
Using a simple thread wrapper class basically also causes the developer to focus on a level that is much too low. To make proper use of multiple CPU cores a focus on tasks instead of threads is better, because the partitioning of work with threads does not adapt well to changing requirements and environments - depending on the hardware and the other software running in parallel the optimum number of threads may vary greatly, even at different times on the same system. A library that you pass only chunks of work to, and which schedules them automatically to make best use of the available resources helps a lot in this regard.
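The idea of handing a library chunks of work instead of managing threads yourself can be illustrated with Python's `concurrent.futures`, used here only as an analogue, not as one of the Delphi libraries under discussion. No thread count is hard-coded; the pool size is left to the library default, which is derived from the CPU count. `process_chunk` is a hypothetical unit of work. (In CPython, pure-Python CPU-bound chunks would need `ProcessPoolExecutor` to actually run in parallel; the thread pool keeps the sketch simple.)

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk: list) -> int:
    """Hypothetical independent unit of work."""
    return sum(x * x for x in chunk)

def run_tasks(data: list, chunk_size: int = 4) -> int:
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # The executor chooses its worker count (based on os.cpu_count()) and
    # schedules the chunks over its threads; we only describe the work.
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(process_chunk, chunks))

if __name__ == "__main__":
    print(run_tasks(list(range(10))))  # 285
```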
AsyncCalls is a good first step to introduce threads into an application. If you have several areas in your program where a number of time-consuming steps need to be performed that are independent of each other, then you can simply execute them asynchronously by passing each of them to AsyncCalls. Even when you have only one such time-consuming action you can execute it asynchronously and simply show a progress UI in the VCL thread, optionally allowing for cancelling the action.
AsyncCalls is IMO not so good for background workers that stay around during the whole program runtime, and it may be impossible to use when some of the objects in your program have thread affinity (like database connections or OLE objects that may have a requirement that all calls happen in the same thread).
What you also need to be aware of is that these asynchronous actions are not of the "fire-and-forget" kind. Every overloaded AsyncCall() function returns an IAsyncCall interface pointer that you may need to keep a reference to if you want to avoid blocking. If you don't keep a reference, then the moment the ref count reaches zero the interface will be freed, which will cause the thread releasing the interface to wait for the asynchronous call to complete. This is something that you might see while debugging, when exiting the method that created the IAsyncCall may take a mysterious amount of time.
OTL is in my opinion the most versatile of your three options, and I would use it without a second thought. It can do everything TThread and AsyncCalls can do, plus much more. It has a sound design, which is high-level enough both to make life easy for the user and to make a port to a Unixy system (while keeping most of the interface intact) look at least possible, if not easy. In recent months it has also started to acquire some high-level constructs for parallel work, which I highly recommend.
OTL has a few dozen samples too, which is important to get started. AsyncCalls has nothing but a few lines in comments, but then it is easy enough to understand due to its limited functionality (it does only one thing, but it does it well). TThread has only one sample, which hasn't really changed in 14 years and is mostly an example of how not to do things.
Whichever of the options you choose, no library will eliminate the need to understand threading basics. Having read a good book on these is a prerequisite to any successful coding. Proper locking for example is a requirement with all of them.

There is another lesser known Delphi threading library, Misha Charrett's CSI Application Framework.
It's based around message passing rather than shared memory. The same message passing mechanism is used to communicate between threads running in the same process or in other processes so it's both a threading library and a distributed inter-process communication library.
There's a bit of a learning curve to get started but once you get going you don't have to worry about all the traditional threading issues such as deadlocks and synchronisation, the framework takes care of most of that for you.
Misha's been developing this for years and is still actively improving the framework and documentation all the time. He's always very responsive to support questions.

TThread is a simple class that encapsulates a Windows thread. You make a descendant class with an Execute method that contains the code this thread should execute, create the thread and set it to run and the code executes.
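The same descendant-class pattern exists in Python's `threading` module, where you override `run()` instead of `Execute`. This is only an analogue for illustration; `SumThread` is hypothetical, and Delphi's TThread differs in details such as creating threads suspended and `Synchronize`.

```python
import threading
from typing import List, Optional

class SumThread(threading.Thread):
    """Hypothetical worker: override run() the way a TThread descendant overrides Execute."""

    def __init__(self, numbers: List[int]) -> None:
        super().__init__()
        self.numbers = numbers
        self.result: Optional[int] = None

    def run(self) -> None:
        # Code in run() executes on the new thread once start() is called.
        self.result = sum(self.numbers)

if __name__ == "__main__":
    t = SumThread([1, 2, 3, 4])
    t.start()  # create the OS thread and begin executing run()
    t.join()   # wait for it to finish before reading the result
    print(t.result)  # 10
```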
AsyncCalls and OmniThreadLibrary are both libraries that build a higher-level concept on top of threads. They're about tasks, discrete pieces of work that you need to have executed asynchronously. You start the library and it sets up a task pool (a group of special threads whose job is to wait around until you have work for them). Then you pass the library a function pointer (or method pointer or anonymous method) containing the code that needs to be executed, and it executes that code in one of the task pool threads and handles a lot of the low-level details for you.
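To make the task-pool idea concrete, here is a deliberately minimal Python sketch of what such a library does internally: a fixed set of worker threads waits on a queue, and callers submit callables. `TaskPool` is a hypothetical class, not the API of AsyncCalls or OmniThreadLibrary, and it omits the error handling, cancellation, and result futures that real libraries provide.

```python
import queue
import threading
from typing import Callable, Optional

class TaskPool:
    """Hypothetical minimal task pool: N worker threads pulling callables from a queue."""

    def __init__(self, workers: int = 4) -> None:
        self._tasks: "queue.Queue[Optional[Callable[[], None]]]" = queue.Queue()
        self._threads = [threading.Thread(target=self._worker, daemon=True) for _ in range(workers)]
        for t in self._threads:
            t.start()

    def _worker(self) -> None:
        while True:
            task = self._tasks.get()  # wait around until there is work
            if task is None:          # sentinel: shut this worker down
                return
            task()

    def submit(self, task: Callable[[], None]) -> None:
        self._tasks.put(task)

    def shutdown(self) -> None:
        # One sentinel per worker; tasks queued ahead of them still run (FIFO).
        for _ in self._threads:
            self._tasks.put(None)
        for t in self._threads:
            t.join()

if __name__ == "__main__":
    results = []
    lock = threading.Lock()

    def make_job(k: int) -> Callable[[], None]:
        def job() -> None:
            with lock:
                results.append(k * 10)
        return job

    pool = TaskPool(workers=3)
    for k in range(5):
        pool.submit(make_job(k))
    pool.shutdown()
    print(sorted(results))  # [0, 10, 20, 30, 40]
```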
I haven't used either library all that much, so I can't really give you a comparison between the two. Try them out and see what they can do, and which one feels better to you.

(sorry, I don't have enough points to comment so I'm putting this in as an answer rather than another vote for OTL)
I've used TThread, CSI and OmniThread (OTL). The two libraries both have non-trivial learning curves but are much more capable than TThread. My conclusion is that if you're going to do anything significant with threading you'll end up writing half of the library functionality anyway, so you might as well start with the working, debugged version someone else wrote. Both Misha and Gabr are better programmers than most of us, so odds are they've done a better job than we will.
I've looked at AsyncCalls, but it didn't do enough of what I wanted. One thing it does have is a "Synchronize" function (missing from OTL), so if you're dependent on that you might go with AsyncCalls purely for that. IMO using message passing is not hard enough to justify the nastiness of Synchronize, so buckle down and learn how to use messages.
Of the three I prefer OTL, largely because of the collection of examples but also because it's more self-contained. That's less of an issue if you're already using the JCL or you work in only one place, but I do a mix including contract work, and selling clients on installing Misha's system is harder than selling them on the OTL, just because the OTL is ~20 files in one directory. That sounds silly, but it's important for many people.
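As an illustration of that message-passing style (in Python rather than Delphi, with a hypothetical message format), the worker below never touches "UI" state directly. It posts messages to a queue that the main thread drains, which is roughly the role that posting messages to the main thread plays as an alternative to `Synchronize`.

```python
import queue
import threading

def worker(out: queue.Queue, n: int) -> None:
    """Hypothetical background job: report progress by message, never by shared state."""
    for step in range(1, n + 1):
        out.put(("progress", step))
    out.put(("done", n))

def main_loop(n: int = 3) -> list:
    messages: queue.Queue = queue.Queue()
    threading.Thread(target=worker, args=(messages, n), daemon=True).start()
    received = []
    # The "UI" thread handles one message at a time, so only it updates UI state.
    while True:
        kind, value = messages.get()
        received.append((kind, value))
        if kind == "done":
            break
    return received

if __name__ == "__main__":
    print(main_loop())  # [('progress', 1), ('progress', 2), ('progress', 3), ('done', 3)]
```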
With OTL the combination of searching the examples and source code for keywords, and asking questions in the forums works for me. I'm familiar with the traditional "offload CPU-intensive tasks" threading jobs, but right now I'm working on backgrounding a heap of database work which has much more "threads block waiting for DB" and less "CPU maxed out", and the OTL is working quite well for that. The main differences are that I can have 30+ threads running without the CPU maxing out, but stopping one is generally impossible.

I know this isn't the most advanced method :-) and maybe it has limitations too, but I just tried System.BeginThread and found it quite simple - probably because of the quality of the documentation I was referring to... http://www.delphibasics.co.uk/RTL.asp?Name=BeginThread (IMO Neil Moffatt could teach MSDN a thing or two)
That's the biggest factor I find in trying to learn new things: the quality of the documentation, not its quantity. A couple of hours was all it took, then I was back to the real work rather than worrying about how to get the thread to do its business.
EDIT: Actually, Rob Kennedy does a great job explaining BeginThread here: BeginThread Structure - Delphi
EDIT: Actually, given the way Rob Kennedy explains TThread in the same post, I think I'll change my code to use TThread tomorrow. Who knows what it will look like next week! (AsyncCalls maybe)

Are there any practical alternatives to threads?

While reading up on SQLite, I stumbled upon this quote in the FAQ: "Threads are evil. Avoid them."
I have a lot of respect for SQLite, so I couldn't just disregard this. I got thinking what else I could, according to the "avoid them" policy, use instead in order to parallelize my tasks. As an example, the application I'm currently working on requires a user interface that is always responsive, and needs to poll several websites from time to time (a process which takes at least 30 seconds for each website).
So I opened up the PDF linked from that FAQ, and essentially it seems that the paper suggests several techniques to be applied together with threads, such as barriers or transactional memory - rather than any techniques to replace threads altogether.
Given that these techniques do not fully dispense with threads (unless I misunderstood what the paper is saying), I can see two options: either the SQLite FAQ does not literally mean what it says, or there exist practical approaches that actually avoid the use of threads altogether. Are there any?
Just a quick note on tasklets/cooperative scheduling as an alternative - this looks great in small examples, but I wonder whether a large-ish UI-heavy application can be practically parallelized in a solely cooperative way. If you have done this successfully or know of such examples this certainly qualifies as a valid answer!

Note: This answer no longer accurately reflects what I think about this subject. I don't like its overly dramatic, somewhat nasty tone. Also, I am not so certain that the quest for provably correct software has been so useless as I seemed to think back then. I am leaving this answer up because it is accepted, and up-voted, and to edit it into something I currently believe would pretty much vandalize it.
I finally got around to reading the paper. Where do I start?
The author is singing an old song, which goes something like this: "If you can't prove the program is correct, we're all doomed!" It sounds best when screamed loudly accompanied by overmodulated electric guitars and a rapid drum beat. Academics started singing that song when computer science was in the domain of mathematics, a world where if you don't have a proof, you don't have anything. Even after the first computer science department was cleaved from the mathematics department, they kept singing that song. They are singing that song today, and nobody is listening. Why? Because the rest of us are busy creating useful things, good things out of software that can't be proved correct.
The presence of threads makes it even more difficult to prove a program correct, but who cares? Even without threads, only the most trivial of programs can be proved correct. Why do I care if my non-trivial program, which could not be proved correct, is even more unprovable after I use threading? I don't.
If you weren't sure the author was living in an academic dreamworld, you can be sure of it after he maintains that the coordination language he suggests as an alternative to threads could best be expressed with a "visual syntax" (drawing graphs on the screen). I've never heard that suggestion before, except every year of my career. A language that can only be manipulated by GUI and does not play with any of the programmer's usual tools is not an improvement. The author goes on to cite UML as a shining example of a visual syntax which is "routinely combined with C++ and Java." Routinely in what world?
In the meantime, I and many other programmers go on using threads without all that much trouble. How to use threads well and safely is pretty much a solved problem, as long as you don't get all hung up on provability.
Look. Threading is a big kid's toy, and you do need to know some theory and usage patterns to use them well. Just as with databases, distributed processing, or any of the other beyond-grade-school devices that programmers successfully use every day. But just because you can't prove it correct doesn't mean it's wrong.

The statement in the SQLite FAQ, as I read it, is just a comment on how difficult threading can be to the uninitiated. It is the author's opinion, and it might be a valid one. But saying you should never use threads is throwing the baby out with the bath water, in my opinion. Threads are a tool. Like all tools, they can be used and they can be abused. I can read his paper and be convinced that threads are the devil, but I have used them successfully, without killing kittens.
Keep in mind that SQLite is written to be as lightweight and easy to understand (from a coding standpoint) as possible, so I would imagine that threading is kind of the antithesis to this lightweight approach.
Also, SQLite is not meant to be used in a highly-concurrent environment. If you have one of these, you might be better off working with a more enterprisey database like Postgres.

Evil, but a necessary evil. High level abstractions of threads (Tasks in .NET for example) are becoming more common but for the most part the industry is not trying to find a way to avoid threads, just making it easier to deal with the complexities that come with any kind of concurrent programming.

One trend I've noticed, at least in the Cocoa domain, is help from the framework. Apple has gone to great lengths to help developers with the relatively difficult concept of concurrent programming. Some things I've seen:
Different granularity of threading. Cocoa supports everything from POSIX threads (low level), to object-oriented threading with NSLock and NSThread, to high-level parallelism such as NSOperation. Depending on your task, using a high-level tool like NSOperation is easier and gets the job done.
Threading behind the scenes via an API. Lots of the UI and animation work in Cocoa is hidden behind an API. You are responsible for calling an API method and providing an asynchronous callback that is executed when the secondary thread completes (for example, at the end of some animation).
OpenMP. There are tools like OpenMP that let you provide pragmas telling the compiler that some task may safely be parallelized, for example iterating over a set of items independently.
It seems like a big push in this industry is to make things simple for application developers and leave the gory thread details to the system and framework developers. There is a push in academia to formalize parallel patterns. As mentioned, you can't always avoid threading, but there are an increasing number of tools in your arsenal to make it as painless as possible.

If you really want to live without threads, you can, so long as you don't call any functions that can potentially block. This may not be possible.
One alternative is to implement the tasks you would have made into threads as finite state machines. Basically, the task does what it can do immediately, then goes to its next state, waiting for an event, such as input arriving on a file or a timer going off. X Windows, as well as most GUI toolkits, support this style. When something happens, they call a callback, which does what it needs to do and returns. For a FSM, the callback checks to see what state the task is in and what the event is to determine what to do immediately and what the next state will be.
Say you have an app that needs to accept socket connections, and for each connection, parse command lines, execute some code, and return the results. A task would then be what listens to a socket. When select() (or Gtk+, or whatever) tells you the socket has something to read, you read it into a buffer, then check to see if you have enough input buffered to do something. If so, you advance to a "start doing something" state, otherwise you stay in the "reading a line" state. (What you "do" could be multiple states.) When done, your task drops the line from the buffer and goes back to the "reading a line" state. No threads or preemption needed.
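A minimal Python sketch of the "reading a line" state machine described above. To keep it self-contained, bytes are fed in by hand instead of arriving via `select()`; the class name, the two states, and the upper-casing "command" are hypothetical.

```python
from enum import Enum, auto

class State(Enum):
    READING_LINE = auto()
    EXECUTING = auto()

class LineTask:
    """Hypothetical per-connection task driven entirely by input events."""

    def __init__(self) -> None:
        self.state = State.READING_LINE
        self.buffer = b""
        self.replies: list = []

    def on_readable(self, data: bytes) -> None:
        """Called by the event loop whenever the socket has data (here: called by hand)."""
        self.buffer += data
        # Do as much as possible right now, then return to the event loop.
        while True:
            if self.state is State.READING_LINE:
                if b"\n" not in self.buffer:
                    return  # not enough input yet: stay in READING_LINE
                self.state = State.EXECUTING
            elif self.state is State.EXECUTING:
                line, _, self.buffer = self.buffer.partition(b"\n")
                self.replies.append(line.upper())  # stand-in for "execute some code"
                self.state = State.READING_LINE    # line dropped; back to reading

if __name__ == "__main__":
    task = LineTask()
    for chunk in (b"hel", b"lo\nwor", b"ld\n"):
        task.on_readable(chunk)
    print(task.replies)  # [b'HELLO', b'WORLD']
```

Each call returns as soon as it runs out of input, so one thread can service many such tasks.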
This lets you act multithreaded by way of being event-driven. If your state machines are complicated, however, your code can get hard to maintain pretty fast, and you'll need to work up some kind of FSM-management library to separate the grunt work of running the FSM from the code that actually does things.
P.S. Another way to get threads without really using threads is the GNU Pth library. It doesn't do preemption, but it is another option if you really don't want to deal with threads.

Another approach to this may be to use a different concurrency model rather than avoid multithreading altogether (you have to utilize all these CPU cores in parallel somehow).
Take a look at mechanisms used in Clojure (e.g. agents, software transactional memory).

Software Transactional Memory (STM) is a good alternative form of concurrency control. It scales well with multiple processors and does not have most of the problems of conventional concurrency control mechanisms. It is implemented as part of the Haskell language. It's worth a try, although I don't know how applicable it is in the context of SQLite.

Alternatives to threads:
coroutines
goroutines
mapreduce
workerpool
Apple's Grand Central Dispatch + lambdas
openCL
erlang
(Interesting to note that half of those technologies were invented or popularised by Google.)
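As a concrete example of the coroutine option applied to the question's polling scenario, here is a Python `asyncio` sketch. The network call is simulated with `asyncio.sleep`, and the site names and the `poll_site` coroutine are hypothetical; a real program would use an async HTTP client. All coroutines run on one thread and interleave only at `await` points.

```python
import asyncio
import time

async def poll_site(name: str, delay: float) -> str:
    """Hypothetical poll: the await hands control back to the event loop while 'waiting'."""
    await asyncio.sleep(delay)  # stands in for a slow network request
    return f"{name}: ok"

async def poll_all(sites: dict) -> list:
    # gather() runs the coroutines concurrently on a single thread and
    # returns their results in the order they were passed in.
    return await asyncio.gather(*(poll_site(n, d) for n, d in sites.items()))

if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(poll_all({"site-a": 0.2, "site-b": 0.2, "site-c": 0.2}))
    print(results)
    print(f"elapsed ~{time.perf_counter() - start:.1f}s")  # roughly 0.2s, not 0.6s
```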
Another thing is many web frameworks transparently use multiple threads/processes for handling requests, and usually in such a way that mostly eliminates the problems associated with multithreading (for the user of the framework), or at least makes the threading rather invisible. The web being stateless, the only shared state is session state (which isn't really a problem since by definition, a single session isn't going to be doing concurrent things), and data in a database that already has its multithreading nonsense sorted out for you.
It's somewhat important to note though that these are all abstractions. The underlying implementations of these things still use threads. But this is still incredibly useful. In the same way you wouldn't use assembler to write a web application, you wouldn't use threads directly to write any important application. Designing an application to use threads is too complicated to leave for a human to deal with.

Threading is not the only model of concurrency. The actors model (Erlang, Scala) is an example of a somewhat different approach.
http://www.scala-lang.org/node/242

If your task is really, really easily isolatable, you can use processes instead of threads, like Chrome does for its tabs.
Otherwise, inside a single process, there is no way to achieve real parallelism without threads, because you need at least two threads of execution if you want two things to happen at the same time (assuming you have multiple processors/cores at hand, of course; otherwise real parallelism is simply not possible).
The complexity of threading a program is always relative to the degree of isolation of the tasks the threads will perform. There's no trouble in running several threads if you know for sure these will never use the same variables. Then again, multiple high-level constructs exist in modern languages to help synchronize access to shared resources.
It's really a matter of application. If your task is simple enough to fit in some kind of high-level Task object (this depends on your development platform; your mileage may vary), then using a task queue is your best bet. My rule of thumb is that if you can't find a cool name for your thread, then its task is not important enough to justify a thread (instead of a task going on an operation queue).

Threads give you the opportunity to do some evil things, specifically sharing state among different execution paths. But they offer a lot of convenience; you don't have to do expensive communication across process boundaries. Plus, they come with less overhead. So I think they're perfectly fine, used correctly.
I think the key is to share as little data as possible among the threads; just stick to synchronization data. If you try to share more than that, you have to engage in complex code that is hard to get right the first time around.

One method of avoiding threads is multiplexing - in essence you make a lightweight mechanism similar to threads which you manage yourself.
Thing is this is not always viable. In your case the 30s polling time per website - can it be split into 60 0.5s pieces, in between which you can stuff calls to the UI? If not, sorry.
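One way to do that splitting is a hand-rolled round-robin scheduler over Python generators, where each `yield` marks the end of one small slice of work and the UI gets a turn between slices. This is a sketch of the multiplexing idea under the assumption that the work really can be cut into slices; `poll_in_slices` and `ui_tick` are hypothetical.

```python
from collections import deque
from typing import Callable, Generator, Iterable

def poll_in_slices(name: str, slices: int, log: list) -> Generator[None, None, None]:
    """Hypothetical long job cut into small pieces; each yield gives control back."""
    for i in range(slices):
        log.append(f"{name} slice {i}")
        yield

def run_multiplexed(jobs: Iterable[Generator], ui_tick: Callable[[], None]) -> None:
    """Round-robin scheduler: run one slice of a job, then service the UI."""
    ready = deque(jobs)
    while ready:
        job = ready.popleft()
        try:
            next(job)
        except StopIteration:
            continue         # job finished; drop it
        ready.append(job)    # not finished; back of the line
        ui_tick()            # the UI gets a turn between slices

if __name__ == "__main__":
    log: list = []
    jobs = [poll_in_slices("A", 2, log), poll_in_slices("B", 2, log)]
    run_multiplexed(jobs, ui_tick=lambda: log.append("ui"))
    print(log)
    # ['A slice 0', 'ui', 'B slice 0', 'ui', 'A slice 1', 'ui', 'B slice 1', 'ui']
```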
Threads aren't evil, they are just easy to shoot yourself in the foot with. If doing Query A takes 30s and then doing Query B takes another 30s, doing them simultaneously in threads could take as much as 120s instead of 60, due to thread overhead, fighting for disk access, and various bottlenecks.
But if Operation A consists of 5s of activity and 55 seconds of waiting, mixed randomly, and Operation B takes 60s of actual work, doing them in threads will take maybe 70s, compared to plain 120 when you execute them in sequence.
The rule of thumb is: threads should idle and wait most of the time. They are good for I/O, slow reads, low-priority work and so on. If you want performance, use multiplexing, which requires more work but is faster, more efficient, and has far fewer caveats. (Synchronizing threads and avoiding race conditions is a whole different chapter of thread headaches...)
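The "threads should mostly wait" rule can be checked with a small Python experiment in which each task only sleeps, standing in for I/O. Under that assumption the waits overlap, so two 0.2 s waits finish in roughly 0.2 s instead of 0.4 s. CPU-bound work would not behave this way in CPython because of the global interpreter lock.

```python
import threading
import time

def io_bound_task(seconds: float) -> None:
    """Stand-in for a slow read or network call: the thread just waits."""
    time.sleep(seconds)

def run_sequential(durations: list) -> float:
    start = time.perf_counter()
    for d in durations:
        io_bound_task(d)
    return time.perf_counter() - start

def run_threaded(durations: list) -> float:
    start = time.perf_counter()
    threads = [threading.Thread(target=io_bound_task, args=(d,)) for d in durations]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"sequential: {run_sequential([0.2, 0.2]):.2f}s")  # about 0.4s
    print(f"threaded:   {run_threaded([0.2, 0.2]):.2f}s")    # about 0.2s
```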

Advice on starting a large multi-threaded programming project

My company currently runs a third-party simulation program (natural catastrophe risk modeling) that sucks up gigabytes of data off a disk and then crunches for several days to produce results. I will soon be asked to rewrite this as a multi-threaded app so that it runs in hours instead of days. I expect to have about 6 months to complete the conversion and will be working solo.
We have a 24-proc box to run this. I will have access to the source of the original program (written in C++ I think), but at this point I know very little about how it's designed.
I need advice on how to tackle this. I'm an experienced programmer (~ 30 years, currently working in C# 3.5) but have no multi-processor/multi-threaded experience. I'm willing and eager to learn a new language if appropriate. I'm looking for recommendations on languages, learning resources, books, architectural guidelines. etc.
Requirements: Windows OS. A commercial grade compiler with lots of support and good learning resources available. There is no need for a fancy GUI - it will probably run from a config file and put results into a SQL Server database.
Edit: The current app is C++ but I will almost certainly not be using that language for the re-write. I removed the C++ tag that someone added.

Numerical process simulations are typically run over a single discretised problem grid (for example, the surface of the Earth or clouds of gas and dust), which usually rules out simple task farming or concurrency approaches. This is because a grid divided over a set of processors representing an area of physical space is not a set of independent tasks. The grid cells at the edge of each subgrid need to be updated based on the values of grid cells stored on other processors, which are adjacent in logical space.
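A toy illustration of the point above, in Python and without real parallelism: a 1-D grid is split into subgrids, and before each smoothing step every subgrid needs one "ghost" (halo) cell from each neighbour. The three-point averaging stencil and the fixed end values are assumptions chosen for brevity. Everything runs in one process here; under MPI each subgrid would live on a separate rank and the halo values would travel as messages. The final check confirms the decomposed result equals the undivided one.

```python
from typing import List, Optional

def serial_step(cells: List[float]) -> List[float]:
    """Three-point average; the two global end cells are held fixed (assumed boundary)."""
    n = len(cells)
    return [cells[i] if i in (0, n - 1) else (cells[i - 1] + cells[i] + cells[i + 1]) / 3
            for i in range(n)]

def split(cells: List[float], k: int) -> List[List[float]]:
    """Divide the grid into at most k contiguous subgrids ("one per processor")."""
    size = max(1, -(-len(cells) // k))  # ceiling division
    return [cells[i:i + size] for i in range(0, len(cells), size)]

def decomposed_step(parts: List[List[float]]) -> List[List[float]]:
    new_parts = []
    last = len(parts) - 1
    for j, part in enumerate(parts):
        # Halo exchange: each subgrid needs one edge cell from each neighbour.
        left: Optional[float] = parts[j - 1][-1] if j > 0 else None
        right: Optional[float] = parts[j + 1][0] if j < last else None
        new = []
        for i, c in enumerate(part):
            lo = part[i - 1] if i > 0 else left
            hi = part[i + 1] if i < len(part) - 1 else right
            # No neighbour on one side means a global boundary cell: keep it fixed.
            new.append(c if lo is None or hi is None else (lo + c + hi) / 3)
        new_parts.append(new)
    return new_parts

if __name__ == "__main__":
    grid = [0.0] * 5 + [9.0] + [0.0] * 6
    parts = split(grid, 3)
    for _ in range(4):
        grid = serial_step(grid)
        parts = decomposed_step(parts)
    print([c for p in parts for c in p] == grid)  # True
```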
In high-performance computing, simulations are typically parallelised using either MPI or OpenMP. MPI is a message passing library with bindings for many languages, including C, C++, Fortran, Python, and C#. OpenMP is an API for shared-memory multiprocessing. In general, MPI is more difficult to code than OpenMP, and is much more invasive, but is also much more flexible. OpenMP requires a memory area shared between processors, so is not suited to many architectures. Hybrid schemes are also possible.
This type of programming has its own special challenges. As well as race conditions, deadlocks, livelocks, and all the other joys of concurrent programming, you need to consider the topology of your processor grid - how you choose to split your logical grid across your physical processors. This is important because your parallel speedup is a function of the amount of communication between your processors, which itself is a function of the total edge length of your decomposed grid. As you add more processors, this surface area increases, increasing the amount of communication overhead. Increasing the granularity will eventually become prohibitive.
The other important consideration is the proportion of the code which can be parallelised. Amdahl's law then dictates the maximum theoretically attainable speedup. You should be able to estimate this before you start writing any code.
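For reference, Amdahl's law in its usual form, where P is the parallelisable fraction of the run time and n the number of processors. The 95% figure in the worked example is an assumed value, not one taken from the question; it shows that even a 5% serial portion limits the question's 24 processors to roughly an 11x speedup.

```latex
S(n) = \frac{1}{(1 - P) + P/n}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - P}

S(24)\big|_{P = 0.95} = \frac{1}{0.05 + 0.95/24} \approx \frac{1}{0.0896} \approx 11.2,
\qquad \lim_{n \to \infty} S(n)\big|_{P = 0.95} = \frac{1}{0.05} = 20
```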
Both of these facts will conspire to limit the maximum number of processors you can run on. The sweet spot may be considerably lower than you think.
I recommend the book High Performance Computing, if you can get hold of it. In particular, the chapter on performance benchmarking and tuning is priceless.
An excellent online overview of parallel computing, which covers the major issues, is this introduction from Lawrence Livermore National Laboratory.

Your biggest problem in a multithreaded project is that too much state is visible across threads - it is too easy to write code that reads / mutates data in an unsafe manner, especially in a multiprocessor environment where issues such as cache coherency, weakly consistent memory etc might come into play.
Debugging race conditions is distinctly unpleasant.
Approach your design as you would if, say, you were considering distributing your work across multiple machines on a network: that is, identify what tasks can happen in parallel, what the inputs to each task are, what the outputs of each task are, and what tasks must complete before a given task can begin. The point of the exercise is to ensure that each place where data becomes visible to another thread, and each place where a new thread is spawned, are carefully considered.
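Once the tasks and their dependencies are written down, they form a graph that can be executed mechanically. A minimal Python sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) together with a thread pool: a task is submitted only after everything it depends on has finished. The task names and bodies are hypothetical placeholders for the simulation's stages.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def run_graph(deps: dict, work: dict) -> list:
    """Run each task once all of its dependencies are done; return the completion order."""
    sorter = TopologicalSorter(deps)  # deps maps task -> set of prerequisite tasks
    sorter.prepare()
    order = []
    with ThreadPoolExecutor() as pool:
        running = {}
        while sorter.is_active():
            for name in sorter.get_ready():      # tasks whose inputs are all available
                running[pool.submit(work[name])] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                fut.result()                     # re-raise any error from the task
                name = running.pop(fut)
                order.append(name)
                sorter.done(name)                # may unlock dependent tasks
    return order

if __name__ == "__main__":
    deps = {"load": set(), "hazard": {"load"}, "exposure": {"load"},
            "loss": {"hazard", "exposure"}}
    work = {name: (lambda n=name: print("running", n)) for name in deps}
    print(run_graph(deps, work))
```

Here "hazard" and "exposure" may run in parallel, while "loss" always waits for both.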
Once such an initial design is complete, there will be a clear division of ownership of data, and clear points at which ownership is taken / transferred; and so you will be in a very good position to take advantage of the possibilities that multithreading offers you - cheaply shared data, cheap synchronisation, lockless shared data structures - safely.
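The design exercise above (inputs, outputs, and which tasks must finish before another can start) maps directly onto an explicit task graph. A hedged Java sketch using CompletableFuture, assuming Java 16+ for the record type: two independent tasks read the same immutable input, and a third task runs only once both have produced their outputs. The tasks themselves (a sum and a maximum) are placeholders for real work.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Explicit task graph: two independent tasks feed a third task that depends on both. */
final class TaskGraph {
    private TaskGraph() {}

    /** Output of the final task; a record is immutable, so it is safe to hand between threads. */
    record Summary(long total, int max) {}

    static Summary run(List<Integer> input) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Tasks A and B only read the (immutable) input and share no mutable state.
            CompletableFuture<Long> total = CompletableFuture.supplyAsync(
                    () -> input.stream().mapToLong(Integer::longValue).sum(), pool);
            CompletableFuture<Integer> max = CompletableFuture.supplyAsync(
                    () -> input.stream().mapToInt(Integer::intValue).max().orElse(0), pool);
            // Task C starts only after both A and B complete; their outputs are its inputs.
            return total.thenCombine(max, Summary::new).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

The point is less the API than the shape: every hand-over of data between threads is visible in the code as an edge between two futures.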

If you can split the workload into independent chunks of work (i.e., the data set can be processed in pieces and there aren't lots of data dependencies), then I'd use a thread pool / task mechanism: presumably whatever C# has as an equivalent to Java's java.util.concurrent. I'd create work units from the data, wrap each one in a task, and then throw the tasks at the thread pool.
Of course, performance may well be critical here. If you can keep the original processing kernel as-is, then you can call it from within your C# application.
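In Java, the pattern described above (work units wrapped in tasks and thrown at a pool) might look like the following sketch. Summing squares is only a stand-in for real per-chunk processing, and the chunk and thread counts are parameters you would tune; in C#, the Task Parallel Library plays the same role.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Split independent work into chunks, wrap each chunk in a task, and hand the tasks to a pool. */
final class ChunkedWork {
    private ChunkedWork() {}

    static long sumOfSquares(long[] data, int chunks, int threads) {
        if (chunks < 1 || threads < 1) throw new IllegalArgumentException("chunks and threads must be >= 1");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            int chunkSize = (data.length + chunks - 1) / chunks; // ceiling division
            for (int start = 0; start < data.length; start += chunkSize) {
                final int from = start;
                final int to = Math.min(start + chunkSize, data.length);
                // Each task reads only its own slice and returns a private partial result.
                tasks.add(() -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) partial += data[i] * data[i];
                    return partial;
                });
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) total += f.get(); // combine on the calling thread
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because no task touches another task's slice, there is nothing to lock; the only synchronisation is waiting for the futures.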
If the code has lots of data dependencies, it may be a lot harder to break up into threaded tasks, but you might be able to break it up into a pipeline of actions. This means thread 1 passes data to thread 2, which passes data to threads 3 through 8, which pass data onto thread 9, etc.
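A pipeline of the kind described above can be sketched in Java with bounded BlockingQueues between stages and one thread per stage. The sentinel value that signals end-of-stream is an assumption of this sketch (the common "poison pill" convention), not part of any API. Fanning a stage out to several threads, as with "threads 3 through 8", would additionally need one sentinel per consumer and some care about output ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Three-stage pipeline: each stage runs on its own thread and passes data on through a queue. */
final class Pipeline {
    private Pipeline() {}

    private static final int END = Integer.MIN_VALUE; // sentinel ("poison pill") marking end of stream

    static List<Integer> run(int count) {
        // Bounded queues: a fast stage blocks instead of flooding memory.
        BlockingQueue<Integer> stage1to2 = new ArrayBlockingQueue<>(16);
        BlockingQueue<Integer> stage2to3 = new ArrayBlockingQueue<>(16);
        List<Integer> results = new ArrayList<>(); // written only by stage 3, read after join()

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) stage1to2.put(i);
                stage1to2.put(END);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread transformer = new Thread(() -> {
            try {
                for (int v; (v = stage1to2.take()) != END; ) stage2to3.put(v * v);
                stage2to3.put(END);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread collector = new Thread(() -> {
            try {
                for (int v; (v = stage2to3.take()) != END; ) results.add(v);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        producer.start();
        transformer.start();
        collector.start();
        try {
            producer.join();
            transformer.join();
            collector.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return results; // join() guarantees that stage 3's writes are visible here
    }
}
```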
If the code has a lot of floating point mathematics, it might be worth looking at rewriting in OpenCL or CUDA, and running it on GPUs instead of CPUs.

For a 6-month project, I'd say it definitely pays off to start by reading a good book on the subject. I would suggest Joe Duffy's Concurrent Programming on Windows. It's the most thorough book I know on the subject, and it covers both .NET and native Win32 threading. I had been writing multithreaded programs for 10 years when I discovered this gem, and I still found things I didn't know in almost every chapter.
Also, "natural catastrophe risk modeling" sounds like a lot of math. Maybe you should have a look at Intel's IPP library: it provides primitives for many common low-level math and signal processing algorithms. It supports multithreading out of the box, which may make your task significantly easier.

There are a lot of techniques that can be used to deal with multithreading if you design the project for it.
The most general and universal is simply "avoid shared state". Whenever possible, copy resources between threads, rather than making them access the same shared copy.
If you're writing the low-level synchronization code yourself, you have to remember to make absolutely no assumptions. Both the compiler and CPU may reorder your code, creating race conditions or deadlocks where none would seem possible when reading the code. The only way to prevent this is with memory barriers. And remember that even the simplest operation may be subject to threading issues. Something as simple as ++i is typically not atomic, and if multiple threads access i, you'll get unpredictable results.
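To see why ++i is not safe, here is a small Java sketch: several threads increment a plain static int and an AtomicInteger the same number of times. The plain counter frequently ends up short because the load, add, and store of different threads interleave and overwrite each other; the atomic counter is always exact. The racy total varies from run to run, so only the atomic one is predictable.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** ++ on a shared int is a read-modify-write race; AtomicInteger makes the increment indivisible. */
final class Counters {
    private Counters() {}

    private static int racyCount = 0; // plain field: concurrent ++ can lose updates
    private static final AtomicInteger safeCount = new AtomicInteger();

    /** Starts `threads` threads that each bump both counters `perThread` times; returns {racy, atomic}. */
    static int[] race(int threads, int perThread) {
        racyCount = 0;
        safeCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    racyCount++;                 // load, add, store: another thread can slip in between
                    safeCount.incrementAndGet(); // a single atomic operation
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { racyCount, safeCount.get() };
    }
}
```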
And of course, just because you've assigned a value to a variable, that's no guarantee that the new value will be visible to other threads. The compiler may defer actually writing it out to memory. Again, a memory barrier forces it to "flush" all pending memory I/O.
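In Java, the usual minimal fix for the visibility problem on a simple flag is to declare it volatile: a write to a volatile field happens-before every subsequent read of that field, so the worker is guaranteed to see the stop request. The class below is a hypothetical example, not taken from any library; Thread.onSpinWait() requires Java 9+.

```java
/** A stop flag shared between threads must be volatile (or otherwise synchronised) to be seen reliably. */
final class StoppableWorker implements Runnable {
    // Without volatile, the JIT may hoist the read out of the loop and the worker might never see the update.
    private volatile boolean stopRequested = false;

    @Override
    public void run() {
        while (!stopRequested) {
            Thread.onSpinWait(); // stand-in for real work
        }
    }

    void requestStop() {
        stopRequested = true; // volatile write: guaranteed to become visible to the worker
    }

    /** Starts a worker, lets it spin for `millis`, asks it to stop, and reports whether it actually exited. */
    static boolean stopsWithin(long millis, long timeoutMillis) {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.setDaemon(true); // never keep the JVM alive, even if something goes wrong
        thread.start();
        try {
            Thread.sleep(millis);
            worker.requestStop();
            thread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !thread.isAlive();
    }
}
```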
If I were you, I'd go with a higher level synchronization model than simple locks/mutexes/monitors/critical sections if possible. There are a few CSP libraries available for most languages and platforms, including .NET languages and native C++.
This usually makes race conditions and deadlocks trivial to detect and fix, and allows a ridiculous level of scalability. But there's a certain amount of overhead associated with this paradigm as well, so each thread might get less work done than it would with other techniques. It also requires the entire application to be structured specifically for this paradigm (so it's tricky to retrofit onto existing code; since you're starting from scratch, that's less of an issue, but it will still be unfamiliar to you).
Another approach might be Transactional Memory. This is easier to fit into a traditional program structure, but it also has some limitations, and I don't know of many production-quality libraries for it (STM.NET was recently released and may be worth checking out; Intel also has a C++ compiler with STM extensions built into the language).
But whichever approach you use, you'll have to think carefully about how to split the work up into independent tasks, and how to avoid cross-talk between threads. Any time two threads access the same variable, you have a potential bug. And any time two threads access the same variable, or just another variable near the same address (for example, the next or previous element in an array), and at least one of them writes, data will have to be exchanged between cores: it is flushed out of one core's cache and then read into the other core's cache, which can be a major performance hit.
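Java's standard library has no CSP library as such, but a SynchronousQueue behaves much like an unbuffered CSP channel: put() blocks until another thread take()s. A hedged sketch of two processes that share no variables and communicate only over two channels; the negative "close" message is a convention invented for this example.

```java
import java.util.concurrent.SynchronousQueue;

/** CSP-style processes that share nothing and talk only over unbuffered channels. */
final class CspSketch {
    private CspSketch() {}

    /** Sends 1..n to a "squarer" process on one channel, receives squares on another, and sums them. */
    static long sumOfSquares(int n) {
        SynchronousQueue<Integer> requests = new SynchronousQueue<>(); // put() waits for a matching take()
        SynchronousQueue<Integer> replies = new SynchronousQueue<>();

        Thread squarer = new Thread(() -> {
            try {
                while (true) {
                    int x = requests.take();
                    if (x < 0) return;   // hypothetical "close" message
                    replies.put(x * x);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        squarer.start();

        long sum = 0;
        try {
            for (int i = 1; i <= n; i++) {
                requests.put(i);         // rendezvous with the squarer
                sum += replies.take();   // wait for its answer
            }
            requests.put(-1);
            squarer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

Because every interaction is an explicit send or receive, a stuck program shows up as a thread blocked on a specific channel, which is much easier to diagnose than a corrupted shared variable.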
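One common way to avoid the neighbouring-variable problem (false sharing) is to give each thread its own slot and space the slots far enough apart that they are unlikely to land on the same cache line. This Java sketch assumes 64-byte cache lines and uses 16 longs of spacing as a margin; the JVM does not guarantee array alignment, so treat it as a heuristic. java.util.concurrent.atomic.LongAdder applies a similar striping idea internally and is usually the better choice for a plain counter.

```java
/** Per-thread counters spaced apart so that neighbouring threads do not write to the same cache line. */
final class PaddedCounters {
    private PaddedCounters() {}

    // Assumption: 64-byte cache lines (8 longs); 16 longs of spacing leaves a safety margin.
    static final int PAD = 16;

    /** Each thread increments only its own slot; the slots are summed after every thread has finished. */
    static long count(int threads, int perThread) {
        long[] slots = new long[threads * PAD];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t * PAD; // with PAD = 1 the slots would be adjacent and falsely shared
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) slots[slot]++;
            });
            workers[t].start();
        }
        long total = 0;
        for (int t = 0; t < threads; t++) {
            try {
                workers[t].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += slots[t * PAD]; // safe: join() makes the worker's writes visible
        }
        return total;
    }
}
```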
Oh, and if you do write the application in C++, don't underestimate the language. You'll have to learn the language in detail before you'll be able to write robust code, much less robust threaded code.

One thing we've done in this situation that has worked really well for us is to break the work to be done into individual chunks, and the actions on each chunk into different processors (software processing stages, not CPUs). Then we have chains of processors, and data chunks can work through the chains independently. Each set of processors within the chain can run on multiple threads and can process more or less data depending on its own performance relative to the other processors in the chain.
Also breaking up both the data and actions into smaller pieces makes the app much more maintainable and testable.

There's plenty of specific bits of individual advice that could be given here, and several people have done so already.
However nobody can tell you exactly how to make this all work for your specific requirements (which you don't even fully know yourself yet), so I'd strongly recommend you read up on HPC (High Performance Computing) for now to get the over-arching concepts clear and have a better idea which direction suits your needs the most.

The model you choose to use will be dictated by the structure of your data. Is your data tightly coupled or loosely coupled? If your simulation data is tightly coupled then you'll want to look at OpenMP or MPI (parallel computing). If your data is loosely coupled then a job pool is probably a better fit... possibly even a distributed computing approach could work.
My advice is get and read an introductory text to get familiar with the various models of concurrency/parallelism. Then look at your application's needs and decide which architecture you're going to need to use. After you know which architecture you need, then you can look at tools to assist you.
A fairly highly rated book which works as an introduction to the topic is "The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications".

Read about Erlang and the "Actor Model" in particular. If you make all your data immutable, you will have a much easier time parallelizing it.
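As a rough illustration of the actor model in Java (Erlang builds this into the language), here is a minimal actor: private state, a mailbox, and a single thread that processes messages one at a time. The messages are immutable records (Java 16+ for records and pattern matching); the Get message carries a CompletableFuture as its reply channel. All names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

/** Minimal actor: private state, a mailbox, and one thread that handles messages in arrival order. */
final class CounterActor {
    // Messages are immutable records, so they can be passed between threads without locking.
    interface Message {}
    record Add(int amount) implements Message {}
    record Get(CompletableFuture<Integer> reply) implements Message {}
    record Stop() implements Message {}

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
    private int total = 0; // only ever touched by the actor's own thread, so it needs no lock

    private CounterActor() {}

    static CounterActor start() {
        CounterActor actor = new CounterActor();
        new Thread(actor::loop).start();
        return actor;
    }

    /** The only way other threads interact with the actor. */
    void send(Message message) {
        mailbox.add(message);
    }

    private void loop() {
        try {
            while (true) {
                Message m = mailbox.take();
                if (m instanceof Add a) total += a.amount();
                else if (m instanceof Get g) g.reply().complete(total);
                else if (m instanceof Stop) return;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Usage: send two Add messages, then a Get with a fresh CompletableFuture, and join() on that future to read the total.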

Most of the other answers offer good advice regarding partitioning the project - look for tasks that can be cleanly executed in parallel with very little data sharing required. Be aware of non-thread safe constructs such as static or global variables, or libraries that are not thread safe. The worst one we've encountered is the TNT library, which doesn't even allow thread-safe reads under some circumstances.
As with all optimisation, concentrate on the bottlenecks first: threading adds a lot of complexity, so you want to avoid it where it isn't necessary.
You'll need a good grasp of the various threading primitives (mutexes, semaphores, critical sections, conditions, etc.) and the situations in which they are useful.
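As one example of matching a primitive to a situation: a counting semaphore limits how many threads can use a scarce resource at once. A small Java sketch using java.util.concurrent.Semaphore; the sleep stands in for real use of the resource, and the method reports the highest number of tasks it observed inside the guarded section.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** A counting semaphore caps how many threads may use a limited resource at the same time. */
final class SemaphoreDemo {
    private SemaphoreDemo() {}

    /** Runs `tasks` tasks on `threads` threads with at most `permits` inside at once; returns the peak seen. */
    static int peakConcurrency(int permits, int threads, int tasks) {
        Semaphore gate = new Semaphore(permits);
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    gate.acquire(); // blocks while `permits` holders are already inside
                    try {
                        peak.accumulateAndGet(inside.incrementAndGet(), Math::max);
                        Thread.sleep(5); // stand-in for using the scarce resource
                    } finally {
                        inside.decrementAndGet();
                        gate.release(); // always hand the permit back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```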
One thing I would add, if you're intending to stay with C++, is that we have had a lot of success using the boost.thread library. It supplies most of the required multithreading primitives, although it does lack a thread pool (and I would be wary of the unofficial "boost" thread pool one can find via Google, because it suffers from a number of deadlock issues).

I would consider doing this in .NET 4.0 since it has a lot of new support specifically targeted at making writing concurrent code easier. Its official release date is March 22, 2010, but it will probably RTM before then and you can start with the reasonably stable Beta 2 now.
You can either use C# that you're more familiar with or you can use managed C++.
At a high level, try to break up the program into System.Threading.Tasks.Task objects (http://msdn.microsoft.com/en-us/library/dd321424%28VS.100%29.aspx), which are individual units of work. In addition, I'd minimize use of shared state and consider using Parallel.For (or ForEach) and/or PLINQ where possible.
If you do this, a lot of the heavy lifting will be done for you in a very efficient way. It's the direction that Microsoft is going to increasingly support.
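For comparison, Java's closest built-in counterparts to Parallel.For and PLINQ are parallel streams, which split the work across the common ForkJoinPool. A minimal sketch (this is the Java API, not the .NET one); as with Parallel.For, it only works well when each iteration is independent.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Data-parallel loops without managing threads by hand: the Java counterpart of Parallel.For / PLINQ. */
final class DataParallel {
    private DataParallel() {}

    /** Like Parallel.For: the index range is split across the common ForkJoinPool. */
    static double[] squareRoots(int n) {
        double[] out = new double[n];
        // Each index writes only its own slot, so no locking is needed.
        IntStream.range(0, n).parallel().forEach(i -> out[i] = Math.sqrt(i));
        return out;
    }

    /** Like a PLINQ query: filter and map in parallel; toList() keeps the input's encounter order. */
    static List<Integer> evenSquares(List<Integer> input) {
        return input.parallelStream()
                .filter(x -> x % 2 == 0)
                .map(x -> x * x)
                .collect(Collectors.toList());
    }
}
```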

Sorry, I just want to add a pessimistic, or rather realistic, answer here.
You are under time pressure. You have a 6-month deadline, and you don't even know for sure what language this system is written in, what it does, or how it is organized. If it is not a trivial calculation, then that is a very bad start.
Most importantly: you say you have never done multithreaded programming before. This is where I get 4 alarm clocks ringing at once. Multithreading is difficult and takes a long time to learn if you want to do it right, and you need to do it right if you want to win a huge speed increase. Debugging is extremely nasty, even with good tools like the TotalView debugger or Intel's VTune.
Then you say you want to rewrite the app in another language. Well, this isn't so bad, because you would have to rewrite it anyway: the chance of turning a single-threaded program into a well-working multithreaded one without a total redesign is almost zero.
But learning multithreading and a new language (how good are your C++ skills?) within 3 months (you have to write a throwaway prototype, so I cut the timespan into two halves) is extremely challenging.
My advice here is simple, and you will not like it: learn multithreading now, because it is a required skill set for the future, but leave this job to someone who already has experience. Unless, that is, you don't care whether the program succeeds and are just looking for 6 months of payment.

If it's possible to have all the threads working on disjoint sets of process data, and have other information stored in the SQL database, you can quite easily do it in C++, and just spawn off new threads to work on their own parts using the Windows API. The SQL server will handle all the hard synchronization magic with its DB transactions! And of course C++ will perform a lot faster than C#.
You should definitely revise C++ for this task, and understand the C++ code, and look for efficiency bugs in the existing code as well as adding the multi-threaded functionality.

You've tagged this question as C++ but mentioned that you're a C# developer currently, so I'm not sure if you'll be tackling this assignment from C++ or C#. Anyway, in case you're going to be using C# or .NET (including C++/CLI): I have the following MSDN article bookmarked and would highly recommend reading through it as part of your prep work.
Calling Synchronous Methods Asynchronously

Whatever technology you're going to write this in, take a look at this must-read book on concurrency, "Concurrent Programming in Java"; and for .NET, I highly recommend the retlang library for concurrent apps.

I don't know if it was mentioned yet, but if I were in your shoes, what I would be doing right now (aside from reading every answer posted here) is writing a multiple threaded example application in your favorite (most used) language.
I don't have extensive multithreaded experience. I've played around with it in the past for fun but I think gaining some experience with a throw-away application will suit your future efforts.
I wish you luck in this endeavor and I must admit I wish I had the opportunity to work on something like this...

How to detect and debug multi-threading problems?

This is a follow up to this question, where I didn't get any input on this point. Here is the brief question:
Is it possible to detect and debug problems coming from multi-threaded code?
Often we have to tell our customers: "We can't reproduce the problem here, so we can't fix it. Please tell us the steps to reproduce the problem, then we'll fix it." It's a somewhat nasty answer if I know that it is a multithreading problem, but mostly I don't. How do I find out that a problem is a multithreading issue, and how do I debug it?
I'd like to know if there are any special logging frameworks, or debugging techniques, or code inspectors, or anything else to help solving such issues. General approaches are welcome. If any answer should be language related then keep it to .NET and Java.

Threading/concurrency problems are notoriously difficult to replicate - which is one of the reasons why you should design to avoid or at least minimize the probabilities. This is the reason immutable objects are so valuable. Try to isolate mutable objects to a single thread, and then carefully control the exchange of mutable objects between threads. Attempt to program with a design of object hand-over, rather than "shared" objects. For the latter, use fully synchronized control objects (which are easier to reason about), and avoid having a synchronized object utilize other objects which must also be synchronized - that is, try to keep them self contained. Your best defense is a good design.
Deadlocks are the easiest to debug, if you can get a stack trace when deadlocked. Given the trace (most thread-dump tools also perform deadlock detection), it's easy to pinpoint the cause, then reason about the code to see why it happens and how to fix it. With deadlocks, it's always going to be a problem of acquiring the same locks in different orders.
Livelocks are harder: being able to observe the system while it is in the error state is your best bet there.
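On the JVM, the deadlock detection that thread dumps perform is also available programmatically through ThreadMXBean.findDeadlockedThreads(). The sketch below deliberately creates the classic case (two threads taking the same two locks in opposite orders) and then asks the JVM to find it. It is a demonstration only; the deadlocked threads are marked as daemons so the process can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Deliberately creates a lock-order deadlock, then detects it the way a thread dump would. */
final class DeadlockDetection {
    private DeadlockDetection() {}

    /** Returns how many threads the JVM reports as deadlocked, or 0 if none is found within about 5 s. */
    static int provokeAndDetect() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(lockA, lockB, bothHoldFirstLock), "t1"); // takes A, then B
        Thread t2 = new Thread(() -> lockBoth(lockB, lockA, bothHoldFirstLock), "t2"); // takes B, then A
        t1.setDaemon(true); // daemon threads do not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 50; attempt++) {
            long[] ids = mx.findDeadlockedThreads(); // null while no deadlock exists
            if (ids != null) {
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                            + " held by " + info.getLockOwnerName());
                }
                return ids.length;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    private static void lockBoth(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // make sure both threads hold their first lock before going on
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // never reached: the other thread holds `second` and is waiting for `first`
            }
        }
    }
}
```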
Race conditions tend to be extremely difficult to replicate, and are even harder to identify from manual code review. With these, the path I usually take, besides extensive testing to replicate, is to reason about the possibilities, and try to log information to prove or disprove theories. If you have direct evidence of state corruption you may be able to reason about the possible causes based on the corruption.
The more complex the system, the harder it is to find concurrency errors and to reason about its behavior. Make use of tools like JVisualVM and remote-connect profilers; they can be a lifesaver if you can connect to a system in an error state and inspect the threads and objects.
Also, beware of differences in possible behavior that depend on the number of CPU cores, pipelines, bus bandwidth, etc. Changes in hardware can affect your ability to replicate the problem. Some problems will only show on single-core CPUs, others only on multi-core ones.
One last thing: try to use the concurrency objects distributed with the system libraries; e.g. in Java, java.util.concurrent is your friend. Writing your own concurrency control objects is hard and fraught with danger; leave it to the experts if you have a choice.

I thought that the answer you got to your other question was pretty good, but I'll emphasize these points:
Only modify shared state in a critical section (Mutual Exclusion)
Acquire locks in a set order and release them in the opposite order.
Use pre-built abstractions whenever possible (Like the stuff in java.util.concurrent)
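A minimal Java sketch of "acquire locks in a set order and release them in the opposite order", using the classic bank-transfer example. It assumes every account has a unique id that defines the global lock order; without that ordering, two opposing transfers could each grab one lock and wait forever for the other.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Every transfer locks the lower-id account first, so opposing transfers cannot deadlock. */
final class Account {
    private final int id; // assumed unique: defines the global lock order
    private final ReentrantLock lock = new ReentrantLock();
    private long balance;

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock(); // release in the opposite order
            }
        } finally {
            first.lock.unlock();
        }
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    /** Two threads transfer in opposite directions the same number of times; returns both final balances. */
    static long[] opposingTransfers(int rounds) {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.balance(), b.balance() };
    }
}
```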
Also, some analysis tools can detect some potential issues. For example, FindBugs can find some threading issues in Java programs. Such tools can't find all problems (they aren't silver bullets) but they can help.
As vanslly points out in a comment to this answer, studying well-placed logging output can also be very helpful, but beware of Heisenbugs.

For Java there is a verification tool called Java PathFinder (javapathfinder), which I find useful for debugging and verifying multithreaded applications against potential race conditions and deadlocks in the code.
It works well with both the Eclipse and NetBeans IDEs.
[2019] The GitHub repository:
https://github.com/javapathfinder

Assuming I have reports of troubles that are hard to reproduce I always find these by reading code, preferably pair-code-reading, so you can discuss threading semantics/locking needs. When we do this based on a reported problem, I find we always nail one or more problems fairly quickly. I think it's also a fairly cheap technique to solve hard problems.
Sorry for not being able to tell you to press ctrl+shift+f13, but I don't think there's anything like that available. But just thinking about what the reported issue actually is usually gives a fairly strong sense of direction in the code, so you don't have to start at main().

In addition to the other good answers you already got: Always test on a machine with at least as many processors / processor cores as the customer uses, or as there are active threads in your program. Otherwise some multithreading bugs may be hard to impossible to reproduce.

Apart from crash dumps, a technique is extensive run-time logging: where each thread logs what it's doing.
The first question when an error is reported, then, might be, "Where's the log file?"
Sometimes you can see the problem in the log file: "This thread is detecting an illegal/unexpected state here ... and look, this other thread was doing that, just before and/or just afterwards this."
If the log file doesn't say what's happening, then apologise to the customer, add sufficiently-many extra logging statements to the code, give the new code to the customer, and say that you'll fix it after it happens one more time.

Sometimes multithreaded solutions cannot be avoided. If there is a bug, it needs to be investigated in real time, which is nearly impossible with most tools, such as Visual Studio. The only practical solution is to write traces, although the tracing itself should:
not add any delay
not use any locking
be multithreading safe
trace what happened in the correct sequence.
This sounds like an impossible task, but it can be easily achieved by writing the trace into memory. In C#, it would look something like this:
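```csharp
using System.Threading;
public class MemoryTrace {
    public const int MaxMessages = 0x100;
    string[] messages = new string[MaxMessages];
    int messagesIndex = -1;
    public void Trace(string message) {
        // Atomically reserve a unique slot; no lock is taken.
        int thisIndex = Interlocked.Increment(ref messagesIndex);
        // Once the buffer is full, drop further messages instead of throwing IndexOutOfRangeException.
        if (thisIndex >= 0 && thisIndex < MaxMessages)
            messages[thisIndex] = message;
    }
}
```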
The Trace() method is thread-safe and non-blocking, and can be called from any thread. On my PC, it takes about 2 microseconds to execute, which should be fast enough.
Add Trace() instructions wherever you think something might go wrong, let the program run, wait until the error happens, stop the trace and then investigate the trace for any errors.
You can find a more detailed description of this approach, which also collects thread and timing information, recycles the buffer, and outputs the trace nicely, at:
CodeProject: Debugging multithreaded code in real time

A little chart with some techniques to keep in mind when debugging multithreaded code.
The chart is growing; please leave comments and tips to be added.

Visual Studio allows you to inspect the call stack of each thread, and you can switch between them. It is by no means enough to track all kinds of threading issues, but it is a start. A lot of improvements for multithreaded debugging are planned for the upcoming VS2010.
I have used WinDbg + SOS for threading issues in .NET code. You can inspect locks (sync blocks), thread call stacks, etc.

Tess Ferrandez's blog has good examples of using WinDbg to debug deadlocks in .NET.

assert() is your friend for detecting race conditions. Whenever you enter a critical section, assert that the invariant associated with it is true (that's what critical sections are for). Unfortunately, the check might be expensive and thus not suitable for use in a production environment.
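In Java the same idea can be expressed with assert statements at the start and end of each synchronized method. This hypothetical bounded buffer checks its invariant on entry and exit; Java assertions are disabled unless the JVM is started with -ea, which conveniently keeps the cost out of production runs. If some other code path modified these fields without holding the lock, the entry check is where you would expect it to show up.

```java
/** Checks a critical section's invariant on entry and exit (active only with `java -ea`). */
final class BoundedBuffer {
    private final int[] items;
    private int head = 0;
    private int count = 0; // invariant: 0 <= count <= items.length and 0 <= head < items.length

    BoundedBuffer(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        items = new int[capacity];
    }

    private boolean invariant() {
        return count >= 0 && count <= items.length && head >= 0 && head < items.length;
    }

    synchronized boolean offer(int x) {
        assert invariant() : "invariant broken on entry to offer()";
        if (count == items.length) return false;
        items[(head + count) % items.length] = x;
        count++;
        assert invariant() : "invariant broken on exit from offer()";
        return true;
    }

    synchronized Integer poll() {
        assert invariant() : "invariant broken on entry to poll()";
        if (count == 0) return null;
        int x = items[head];
        head = (head + 1) % items.length;
        count--;
        assert invariant() : "invariant broken on exit from poll()";
        return x;
    }
}
```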

I implemented the tool vmlens to detect race conditions in Java programs at runtime. It implements an algorithm called Eraser.
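For readers curious about how the Eraser algorithm works, its core idea (lockset refinement) fits in a few lines. This is a simplified Java sketch, not the vmlens implementation: callers report each access to a shared variable along with the locks currently held, and the checker keeps, per variable, the set of locks that protected every access so far. When that set becomes empty, no single lock consistently guards the variable, which is reported as a possible race. The full Eraser algorithm also tracks per-variable states to avoid false alarms during initialisation and read-only sharing.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Simplified Eraser-style lockset refinement (without the per-variable state machine). */
final class LocksetChecker {
    private final Map<String, Set<String>> candidateLocks = new HashMap<>();
    private final Set<String> reported = new HashSet<>();

    /** Records an access to `variable` while `locksHeld` are held; returns true if it exposes a possible race. */
    synchronized boolean onAccess(String variable, Set<String> locksHeld) {
        Set<String> candidates = candidateLocks.get(variable);
        if (candidates == null) {
            candidateLocks.put(variable, new HashSet<>(locksHeld)); // first access: C(v) = locks held now
            return false;
        }
        candidates.retainAll(locksHeld); // C(v) := C(v) intersect locksHeld
        return candidates.isEmpty() && reported.add(variable); // warn once per variable
    }
}
```

For example, accessing "balance" first under {accountLock}, then under {accountLock, other}, is fine; a third access under {other} alone empties the candidate set and is flagged.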

Develop code the way that Princess recommended for your other question (Immutable objects, and Erlang-style message passing). It will be easier to detect multi-threading problems, because the interactions between threads will be well defined.

I once faced a threading issue that gave the SAME wrong result every time and did not behave unpredictably, because the other conditions (memory, scheduler, processing load) were more or less the same on each run.
From my experience, I can say that the HARDEST PART is recognizing that it is a threading issue, and the BEST SOLUTION is to review the multithreaded code carefully. Just by looking carefully at the threading code, you should try to figure out what can go wrong. Other approaches (thread dumps, profilers, etc.) come second.

Narrow down on the functions that are being called, and rule out what could and could not be to blame. When you find sections of code that you suspect may be causing the issue, add lots of detailed logging / tracing to it. Once the issue occurs again, inspect the logs to see how the code executed differently than it does in "baseline" situations.
If you are using Visual Studio, you can also set breakpoints and use the Parallel Stacks window. Parallel Stacks is a huge help when debugging concurrent code, and it lets you switch between threads to debug them independently. More info:
https://learn.microsoft.com/en-us/visualstudio/debugger/using-the-parallel-stacks-window?view=vs-2019
https://learn.microsoft.com/en-us/visualstudio/debugger/walkthrough-debugging-a-parallel-application?view=vs-2019

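I'm using GNU gdb with a simple script:
```
$ more gdb_tracer
# stop at the suspicious line, then run the program
b func.cpp:2871
r
#c
# keep stepping over source lines forever (use step instead to descend into calls)
while (1)
  next
  #step
end
```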

The best thing I can think of is to stay away from multithreaded code whenever possible. It seems there are very few programmers who can write bug-free multithreaded applications, and I would argue that there are no coders able to write bug-free large multithreaded applications.
