My understanding is that threads exist as a way of doing several things in parallel that share the same address space but each has its individual stack. Asynchronous programming is basically a way of using fewer threads. I don't understand why it's undesirable to just have blocking calls and a separate thread for each blocked command?
For example, suppose I want to scrape a large part of the web. A presumably uncontroversial implementation might be to have a large number of asynchronous loops. Each loop would ask for a webpage, wait to avoid overloading the remote server, then ask for another webpage from the same website until done. The loops would then be executed on a much smaller number of threads (which is fine because they mostly wait). So to restate the question, I don't see why it's any cheaper to e.g. maintain a threadpool within the language runtime than it would be to just have one (mostly blocked) OS thread per loop and let the operating system deal with the complexity of scheduling the work? After all, if piling two different schedulers on top of each other is so good, it can still be implemented that way in the OS :-)
It seems obvious the answer is something like "threads are expensive". However, a thread just needs to keep track of where it has got to before it was interrupted the last time. This is pretty much exactly what an asynchronous command needs to know before it blocks (perhaps representing what happens next by storing a callback). I suppose one difference is that a blocking asynchronous command does so at a well defined point whereas a thread can be interrupted anywhere. If there really is a substantial difference in terms of the cost of keeping the state, where does it come from? I doubt it's the cost of the stack since that wastes at most a 4KB page, so it's a non-issue even for 1000s of blocked threads.
Threads consume memory, as they need to have their state preserved even if they aren't doing anything. If you make an asynchronous call on a single thread, it's literally (aside from registering that the call was made somewhere) not consuming any resources until it needs to be resolved, because if it isn't actively being processed you don't care about it.
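To make the scraping scenario from the question concrete, here is a minimal sketch of many asynchronous loops sharing one thread, using Python's asyncio. The `fetch` function is a hypothetical stand-in that only sleeps (no real network request is made), and the site names, page counts and delays are made up for illustration.

```python
import asyncio

async def fetch(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request (no network access)."""
    await asyncio.sleep(0.01)          # pretend network latency
    return f"<html>{url}</html>"

async def crawl_site(site: str, pages: int, delay: float) -> list[str]:
    """One loop from the question: fetch, pause politely, fetch again."""
    results = []
    for i in range(pages):
        results.append(await fetch(f"{site}/page{i}"))
        await asyncio.sleep(delay)     # yields to the other loops while waiting
    return results

async def crawl_all(sites: list[str], pages: int = 3, delay: float = 0.01):
    # All loops are multiplexed onto the single thread running the event loop.
    return await asyncio.gather(*(crawl_site(s, pages, delay) for s in sites))

if __name__ == "__main__":
    sites = [f"https://site{n}.example" for n in range(1000)]
    crawled = asyncio.run(crawl_all(sites))
    print(len(crawled), "sites crawled")
```

Each suspended loop here is just a small coroutine object on the heap, which is the kind of per-task state the answer above is contrasting with a full thread.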
Why not use massively multi-threaded code?

Asynchronous and other event-based programming paradigms seem to be spreading like wildfire these days, with the popularity of node.js, Python 3.5's recent async improvements, and what not else.
Not that I particularly mind this or that I haven't already been doing it for a long time myself, but I've been trying to wrap my head around the real reasons why. Searching around for the evils of synchronous programming consistently seems to net the preconceived notion that "you can't have a thread for each request", without really qualifying that statement.
Why not, though? A thread might not be the cheapest resource one could think of, but it hardly seems "expensive". On 64-bit machines, we have more than enough virtual address space to handle all the threads we could ever want, and, unless your call chains are fairly deep, each thread shouldn't necessarily have to require more physical RAM than a single page* for stack plus whatever little overhead the kernel and libc need. As for performance, my own casual testing shows that Linux can handle well over 100,000 thread creations and tear-downs per second on a single CPU, which can hardly be a bottleneck.
That being said, it's not like I think event-based programming is all just a ruse, seeing as how it seems to have been the primary driver allowing such HTTP servers as lighttpd/nginx/whatever to overtake Apache in highly concurrent performance**. However, I've been trying to find some kind of actual inquiry into the reason why massively-multithreaded programs are slower without being able to find any.
So then, why is this?
*My testing seems to show that each thread actually requires two pages. Perhaps there's some dirtying of the TLS going on or something, but nevertheless it doesn't seem to change a lot.
**Though it should also be said that Apache, at that time, was using process-based concurrency rather than thread-based, which obviously makes a lot of difference.
If you have a thread for each request, then you can't do a little bit of work for each of 100 requests without switching contexts 100 times. While many things computers have to do have gotten faster over time, context switching is still expensive because it blows out the caches and modern systems are more dependent on these caches than ever.
That is a loaded question. I've heard different responses over time because I've had that conversation so many times before with different developers. Mainly, my gut feeling is most developers hate it because it is harder to write multi-threaded code and sometimes it is easy to shoot yourself in the foot unnecessarily. That said, each situation is different. Some programs lend themselves to multi-threading rather nicely, like a webserver. Each thread can take a request and essentially process it without needing many outside resources. It has a set of procedures to apply to a request to decide how to process it. It decides what to do with it and passes it off. So it is fairly independent and can operate in its own world fairly safely. So it is a nice thread.
Other situations might not lend themselves so nicely. Especially when you need shared resources. Things can get hairy fast. Even if you do what seems like perfect context switching, you might still get race conditions. Then the nightmares begin. This is seen quite often in huge monolithic applications where they opted to use threads and open the gates of hell upon their dev team.
In the end, I think we will probably not see more threading in day-to-day development, but we will move to a more event-driven world. We are going down that route with web development with the emergence of micro-services. So there will probably be more threading used, but not in a way that is visible to the developer using the framework. It will just be a part of the framework. At least that is my opinion.
Once the number of ready or running threads (versus threads pending on events) and/or processes goes beyond the number of cores, then those threads and/or processes are competing for the same cores, same cache, and the same memory bus.
Given a single core CPU, what is the benefit to coding using threads?
At least with the Java implementation, and it seems intuitive to naturally extend to any other language considering the single core restriction, you may have several threads performing various actions but the processes are time-limited and switched.
Given process A and process B:
What is the benefit of performing half of process A, finish process B, and then finish the second half of process A VS performing process A then B?
It seems that the switching between the threads would introduce time delays that would prolong the overall completion time of both processes VS not switching and just completing A then B.
The reason to use threads on a single-core system is simply to allow processes that would otherwise use all the CPU to be preempted by other tasks that need to get done sooner. The most common reason to make a system multi-threaded is to have a responsive user interface even while performing long calculations.
Of course, any operation can take a long time (reading a file, accessing a database, resizing a photo, recalculating a spreadsheet), and those operations can be performed on a separate thread to allow the thread responding to user input to operate the whole time.
Twenty years ago, for example, it was rare to have a multi-CPU system or an OS that allowed multi-threading, so nearly every program was single-threaded and there were many frameworks created to allow systems to have UIs and still do I/O. The standard mechanism for this is an event loop, where all events (UI, network, timers, etc.) are processed in a big loop.
This type of system means that the UI is held up during things like file I/O and calculations. In order to not hold up the UI too much, you have to do the I/O in chunks (say, read the file 4k at a time), processing any incoming UI events between chunks. This is really just a hack to keep the system running, but it's hard to make the system run smoothly like this because you don't know how often you need to process events.
The solution is to have a separate thread to recalculate your spreadsheet or write your file. That way the OS can give those threads fair timeslices while still preempting them to run the UI, allowing the UI to always be responsive.
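As a rough sketch of that idea in Python: the long job runs on a worker thread while the main loop keeps ticking. The names `recalculate` and `run_ui_loop` are made up for this example, the sleep stands in for real work, and a real GUI toolkit would supply its own event loop instead of the polling shown here.

```python
import queue
import threading
import time

def recalculate(result_box: queue.Queue) -> None:
    """Stand-in for a long job such as recalculating a spreadsheet."""
    time.sleep(0.2)                    # pretend to work for a while
    result_box.put(sum(range(1000)))

def run_ui_loop() -> tuple[int, int]:
    """Keep 'handling events' while the worker thread does the long job."""
    result_box: queue.Queue = queue.Queue()
    worker = threading.Thread(target=recalculate, args=(result_box,))
    worker.start()

    ticks = 0
    while True:
        try:
            result = result_box.get(timeout=0.01)   # poll without hanging
            break
        except queue.Empty:
            ticks += 1                 # a real UI would process input events here
    worker.join()
    return result, ticks
```

Calling `run_ui_loop()` returns the computed value together with the number of loop iterations the "UI" managed while the job was still running.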
An executing thread is not necessarily doing anything useful. The canonical example is reading from disk -- that data isn't going to be there for another few milliseconds, during which time the processor would be sitting unused. Threads allow one piece of the program to use the CPU while other pieces of the program are waiting for operations to complete.
There are many reasons. Wikipedia gives a decent overview on its page about threads.
Here are a few off the top of my head:
I/O bound tasks benefit from threading (especially network applications).
Hyperthreaded processors may speed up multithreaded applications even on a single core.
Threads can be instructed to wait (block) and wake up on specific events, enabling responsive event-driven programming.
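The last point (a thread blocking until a specific event occurs) can be sketched with Python's `threading.Event`. The function and variable names are made up for illustration.

```python
import threading

def wait_for_event_demo() -> list[str]:
    log: list[str] = []
    data_ready = threading.Event()

    def consumer() -> None:
        data_ready.wait()              # blocks, using no CPU, until set() is called
        log.append("consumer woke up")

    t = threading.Thread(target=consumer)
    t.start()
    log.append("producer working")
    data_ready.set()                   # wake the waiting thread
    t.join()
    return log
```

The consumer can only append after `set()` has been called, so the producer's entry always comes first in the returned log.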
If your program has to do several things "at the same time" then threads are a good way to go, particularly if some of those tasks are quite long running. Otherwise you find yourself writing code that looks like an operating system scheduler inside your program, which is always a waste of time if the OS underneath you has a perfectly good one already. You'd find that your source code was mostly 'scheduler' and not much 'program', which is very inelegant. A good threaded program can be very elegant and economical in source code, which makes oneself look good and saves time.
Some run times get/got it wrong. In the early days of Ada the runtime environment would do its own thread scheduling, and it was never very satisfactory. That was partly due to the fact that whilst the Ada language spec included the concept of threads, the OSes we had back then quite often didn't provide them. Ada got a lot better when the compiler writers started using the underlying OS threads instead.
Similarly Python doesn't really properly use the underlying OS threads; it spoils it with the Global Interpreter Lock. Python has sidestepped the whole issue by going for multiprocessing instead (not necessarily a good thing on Windows hosts...).
Early versions of Windows didn't do threads either, they did cooperative multitasking. This depended on each process in the whole machine calling any OS routine at least now and then. Each OS routine would first consult the 'scheduler' to see if anything else was waiting to run before getting on with whatever it was supposed to be doing on behalf of the program. There were many terrible programs back then that wouldn't play ball and hogged the entire machine. You couldn't get on with playing a game of Solitaire when something else embarked on a lengthy calculation.
What's the mental model of your program?
IF it depends on multiple external inputs that can happen in unpredictable orders, and if what you want to do in response to those inputs is not simple and can overlap in time ...
THEN it makes sense to devote a separate thread to each input request, and have that thread perform the response needed by that request.
So, for example, if your program is waiting for input requests from an external channel, and each request must trigger its own protocol of outgoing and incoming messages, it can very much simplify the code to create a new thread (or re-use an old one) for each request.
Somehow people seem to enter the workforce thinking that threads are only there for speed (through parallelism).
That's one use, provided it allows multiple CPU chips to get cranking,
How to articulate the difference between asynchronous and parallel programming?

Many platforms promote asynchrony and parallelism as means for improving responsiveness. I understand the difference generally, but often find it difficult to articulate in my own mind, as well as for others.
I am a workaday programmer and use async & callbacks fairly often. Parallelism feels exotic.
But I feel like they are easily conflated, especially at the language design level. Would love a clear description of how they relate (or don't), and the classes of programs where each is best applied.
When you run something asynchronously it means it is non-blocking, you execute it without waiting for it to complete and carry on with other things. Parallelism means to run multiple things at the same time, in parallel. Parallelism works well when you can separate tasks into independent pieces of work.
Take for example rendering frames of a 3D animation. To render the animation takes a long time so if you were to launch that render from within your animation editing software you would make sure it was running asynchronously so it didn't lock up your UI and you could continue doing other things. Now, each frame of that animation can also be considered as an individual task. If we have multiple CPUs/Cores or multiple machines available, we can render multiple frames in parallel to speed up the overall workload.
I believe the main distinction is between concurrency and parallelism.
Async and Callbacks are generally a way (tool or mechanism) to express concurrency i.e. a set of entities possibly talking to each other and sharing resources.
In the case of async or callback communication is implicit while sharing of resources is optional (consider RMI where results are computed in a remote machine).
As correctly noted this is usually done with responsiveness in mind; to not wait for long latency events.
Parallel programming usually has throughput as its main objective, while latency, i.e. the completion time for a single element, might be worse than in an equivalent sequential program.
To better understand the distinction between concurrency and parallelism I am going to quote from Probabilistic models for concurrency of Daniele Varacca which is a good set of notes for theory of concurrency:
A model of computation is a model for concurrency when it is able to represent systems as composed of independent autonomous components, possibly communicating with each other. The notion of concurrency should not be confused with the notion of parallelism. Parallel computations usually involve a central control which distributes the work among several processors. In concurrency we stress the independence of the components, and the fact that they communicate with each other. Parallelism is like ancient Egypt, where the Pharaoh decides and the slaves work. Concurrency is like modern Italy, where everybody does what they want, and all use mobile phones.
In conclusion, parallel programming is somewhat a special case of concurrency where separate entities collaborate to obtain high performance and throughput (generally).
Async and Callbacks are just a mechanism that allows the programmer to express concurrency.
Consider that well-known parallel programming design patterns such as master/worker or map/reduce are implemented by frameworks that use such lower level mechanisms (async) to implement more complex centralized interactions.
It has this about asynchronous programming:
Asynchronous calls are used to prevent “blocking” within an application. [Such a] call will spin-off in an already existing thread (such as an I/O thread) and do its task when it can.
this about parallel programming:
In parallel programming you still break up work or tasks, but the key differences is that you spin up new threads for each chunk of work
and this in summary:
asynchronous calls will use threads already in use by the system and parallel programming requires the developer to break the work up, spinup, and teardown threads needed.
async: Do this by yourself somewhere else and notify me when you complete (callback). In the meantime, I can continue to do my thing.
parallel: Hire as many guys (threads) as you wish and split the job among them to complete it quicker, and let me know (callback) when you complete. In the meantime, I might continue to do my other stuff.
The main difference is that parallelism mostly depends on hardware.
My basic understanding is:
Asynchronous programming solves the problem of waiting around for an expensive operation to complete before you can do anything else. If you can get other stuff done while you're waiting for the operation to complete then that's a good thing. Example: keeping a UI running while you go and retrieve more data from a web service.
Parallel programming is related but is more concerned with breaking a large task into smaller chunks that can be computed at the same time. The results of the smaller chunks can then be combined to produce the overall result. Example: ray-tracing where the colour of individual pixels is essentially independent.
It's probably more complicated than that, but I think that's the basic distinction.
I tend to think of the difference in these terms:
Asynchronous: Go away and do this task, when you're finished come back and tell me and bring the results. I'll be getting on with other things in the mean time.
Parallel: I want you to do this task. If it makes it easier, get some folks in to help. This is urgent though, so I'll wait here until you come back with the results. I can do nothing else until you come back.
Of course an asynchronous task might make use of parallelism, but the differentiation - to my mind at least - is whether you get on with other things while the operation is being carried out or if you stop everything completely until the results are in.
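One way to see that distinction in code is with Python's `concurrent.futures`. This is only a sketch: `slow_square` is a made-up stand-in for real work, and a thread pool is used for brevity (CPU-bound work in CPython would normally want a process pool to actually run in parallel).

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_square(x: int) -> int:
    time.sleep(0.05)                   # stand-in for an expensive operation
    return x * x

def asynchronous_style(pool: ThreadPoolExecutor):
    future = pool.submit(slow_square, 7)     # returns immediately
    other_work = "did other things meanwhile"
    return other_work, future.result()       # collect the result later

def parallel_style(pool: ThreadPoolExecutor):
    # The caller waits here; the helpers split the work between them.
    return list(pool.map(slow_square, range(4)))
```

In the first function the caller carries on and picks up the result later; in the second the caller stops and waits while several workers share the job.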
It is a question of order of execution.
If A is asynchronous with B, then I cannot predict beforehand when subparts of A will happen with respect to subparts of B.
If A is parallel with B, then things in A are happening at the same time as things in B. However, an order of execution may still be defined.
Perhaps the difficulty is that the word asynchronous is equivocal.
I execute an asynchronous task when I tell my butler to run to the store for more wine and cheese, and then forget about him and work on my novel until he knocks on the study door again. Parallelism is happening here, but the butler and I are engaged in fundamentally different tasks and of different social classes, so we don't apply that label here.
My team of maids is working in parallel when each of them is washing a different window.
My race car support team is asynchronously parallel in that each team works on a different tire and they don't need to communicate with each other or manage shared resources while they do their job.
My football (aka soccer) team does parallel work as each player independently processes information about the field and moves about on it, but they are not fully asynchronous because they must communicate and respond to the communication of others.
My marching band is also parallel as each player reads music and controls their instrument, but they are highly synchronous: they play and march in time to each other.
A cammed gatling gun could be considered parallel, but everything is 100% synchronous, so it is as though one process is moving forward.
Why asynchronous?
Today's applications are growing more and more connected, and they also contain potentially long-running tasks or blocking operations such as network I/O or database operations. So it's very important to hide the latency of these operations by starting them in the background and returning to the user interface as quickly as possible. This is where asynchrony comes into the picture: responsiveness.
Why parallel programming?
Today's data sets are growing larger and computations are growing more complex. So it's very important to reduce the execution time of these CPU-bound operations, in this case by dividing the workload into chunks and then executing those chunks simultaneously. We can call this "parallel".
Obviously it will give high performance to our application.
Let's say you are the point of contact for your client and you need to be responsive i.e. you need to share status, complexity of operation, resources required etc whenever asked. Now you have a time-consuming operation to be done and hence cannot take this up as you need to be responsive to the client 24/7. Hence, you delegate the time-consuming operation to someone else so that you can be responsive. This is asynchronous.
Parallel programming
Let's say you have a task to read, say, 100 lines from a text file, and reading one line takes 1 second. Hence, you'll require 100 seconds to read the text file. Now you're worried that the client must wait for 100 seconds for the operation to finish. Hence you create 9 more clones and make each of them read 10 lines from the text file. Now the time taken is only 10 seconds to read 100 lines. Hence you have better performance.
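The arithmetic above (100 lines at 1 second each, versus 10 workers reading 10 lines each) can be sketched as follows. The per-line delay is scaled down to 2 ms so the example finishes quickly, `read_line` is a made-up stand-in that only sleeps, and the sketch assumes the line count divides evenly among the workers.

```python
from concurrent.futures import ThreadPoolExecutor
import time

LINE_DELAY = 0.002   # scaled down from the 1 second per line in the example

def read_line(n: int) -> str:
    time.sleep(LINE_DELAY)             # stand-in for a slow read
    return f"line {n}"

def read_chunk(start: int, count: int) -> list[str]:
    return [read_line(n) for n in range(start, start + count)]

def read_sequential(total: int = 100) -> list[str]:
    return read_chunk(0, total)

def read_with_clones(total: int = 100, workers: int = 10) -> list[str]:
    chunk = total // workers           # assumes total divides evenly
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(read_chunk, range(0, total, chunk), [chunk] * workers)
        return [line for part in parts for line in part]
```

Both functions return the same 100 lines in the same order; the second one overlaps the waiting, so its wall-clock time is roughly that of one chunk rather than of the whole file.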
To sum up, asynchronous coding is done to achieve responsiveness and parallel programming is done for performance.
Asynchronous: Running a method or task in the background, without blocking. May not necessarily run on a separate thread. Uses context switching / time scheduling.
Parallel tasks: Each task runs in parallel. Does not use context switching / time scheduling.
I came here fairly comfortable with the two concepts, but with something not clear to me about them.
After reading through some of the answers, I think I have a correct and helpful metaphor to describe the difference.
If you think of your individual lines of code as separate but ordered playing cards (stop me if I am explaining how old-school punch cards work), then for each separate procedure written, you will have a unique stack of cards (don't copy & paste!) and the difference between what normally goes on when you run code normally and asynchronously depends on whether you care or not.
When you run the code, you hand the OS a set of single operations (that your compiler or interpreter broke your "higher" level code into) to be passed to the processor. With one processor, only one line of code can be executed at any one time. So, in order to accomplish the illusion of running multiple processes at the same time, the OS uses a technique in which it sends the processor only a few lines from a given process at a time, switching between all the processes according to how it sees fit. The result is multiple processes showing progress to the end user at what seems to be the same time.
For our metaphor, the relationship is that the OS always shuffles the cards before sending them to the processor. If your stack of cards doesn't depend on another stack, you don't notice that your stack stopped getting selected from while another stack became active. So if you don't care, it doesn't matter.
However, if you do care (e.g., there are multiple processes - or stacks of cards - that do depend on each other), then the OS's shuffling will screw up your results.
Writing asynchronous code requires handling the dependencies between the order of execution regardless of what that ordering ends up being. This is why constructs like "call-backs" are used. They say to the processor, "the next thing to do is tell the other stack what we did". By using such tools, you can be assured that the other stack gets notified before it allows the OS to run any more of its instructions. ("If called_back == false: send(no_operation)" - not sure if this is actually how it is implemented, but logically, I think it is consistent.)
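As one possible illustration of the callback idea (not necessarily how any particular runtime implements it), Python's `Future.add_done_callback` lets one "stack" be notified only after the other has finished. The names `stack_a` and `notify_b` are made up to match the metaphor.

```python
from concurrent.futures import Future, ThreadPoolExecutor

def run_with_callback() -> list[str]:
    events: list[str] = []

    def stack_a() -> int:
        events.append("A: computed value")
        return 42

    def notify_b(future: Future) -> None:
        # Runs only once stack_a has finished, whatever the scheduling order.
        events.append(f"B: notified with {future.result()}")

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(stack_a)
        future.add_done_callback(notify_b)
    return events
```

Whichever way the threads are scheduled, the notification is recorded after the computation it depends on.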
For parallel processes, the difference is that you have two stacks that don't care about each other and two workers to process them. At the end of the day, you may need to combine the results from the two stacks, which would then be a matter of synchronicity but, for execution, you don't care again.
Not sure if this helps but, I always find multiple explanations helpful. Also, note that asynchronous execution is not constrained to an individual computer and its processors. Generally speaking, it deals with time, or (even more generally speaking) an order of events. So if you send dependent stack A to network node X and its coupled stack B to Y, the correct asynchronous code should be able to account for the situation as if it was running locally on your laptop.
Generally, there are only two ways you can do more than one thing at a time. One is asynchronous, the other is parallel.
At a high level, the popular server NGINX and the well-known Python library Tornado both make full use of the asynchronous paradigm, in which a single-threaded server can serve thousands of clients concurrently (using an I/O loop and callbacks). The asynchronous paradigm can be implemented using ECF (exceptional control flow). So asynchronous code sometimes doesn't really do things simultaneously, but for I/O-bound work it can really improve performance.
The parallel paradigm always refers to multi-threading and multiprocessing. This can fully utilize multi-core processors and do things truly simultaneously.
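A small sketch of the single-threaded server idea, using Python's asyncio rather than NGINX or Tornado: one event loop accepts many clients, and every reply is tagged with the name of the thread that served it. The handler and helper names are made up, and the client count is kept small for the example.

```python
import asyncio
import threading

async def handle_client(reader, writer):
    line = await reader.readline()
    # Tag the reply with the OS thread that served this client, then echo.
    writer.write(threading.current_thread().name.encode() + b":" + line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def one_client(port: int, n: int) -> str:
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(f"hello {n}\n".encode())
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()

async def serve_many(clients: int = 50) -> list[str]:
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await asyncio.gather(*(one_client(port, n) for n in range(clients)))
```

Running `asyncio.run(serve_many())` should show the same thread name on every reply: the connections are interleaved on one thread, not run in parallel.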
Summary of all above answers
parallel computing:
▪ solves throughput issue.
Concerned with breaking a large task into smaller chunks
▪ is machine related (multi machine/core/cpu/processor needed), eg: master slave, map reduce.
Parallel computations usually involve a central control which distributes the work among several processors
asynchronous computing:
▪ solves latency issue
ie, the problem of 'waiting around' for an expensive operation to complete before you can do anything else
▪ is thread related (multi thread needed)
Is firing off a Thread a valid answer to simplifying code?

As multi-processor and multi-core computers become more and more ubiquitous, is simply firing off a new thread a (relatively) simple and painless way of simplifying code? For instance, in a current personal project, I have a network server listening on a port. Since this is just a personal project, it's just a desktop app, with a GUI integrated into it for configuration. So, the app reads something like this:
Read configuration
Start listener thread
Listener Thread
    While the app is running
        Wait for a new connection
        Run a client thread for the new connection
Client Thread
    Write synchronously
    Read synchronously
    ad infinitum, or till they disconnect
This approach means that while I have to worry about a lot of locking, with the potential issues that involves, I avoid a lot of spaghetti code from asynchronous calls, etc.
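For readers who want to see the listener/client outline above as code, here is a minimal Python sketch. The question does not say which language or protocol the project uses, so the names, the "welcome" greeting and the upper-casing echo are all assumptions made for illustration.

```python
import socket
import threading

def client_thread(conn: socket.socket) -> None:
    """Write synchronously, read synchronously, until the peer disconnects."""
    with conn:
        conn.sendall(b"welcome\n")
        while True:
            data = conn.recv(1024)         # blocks this thread only
            if not data:                   # empty read: peer disconnected
                break
            conn.sendall(data.upper())

def listener_thread(server: socket.socket, running: threading.Event) -> None:
    while running.is_set():
        try:
            conn, _addr = server.accept()  # wait for a new connection
        except OSError:                    # listening socket was closed
            break
        threading.Thread(target=client_thread, args=(conn,), daemon=True).start()

def start_server():
    server = socket.create_server(("127.0.0.1", 0))   # port 0: any free port
    running = threading.Event()
    running.set()
    threading.Thread(target=listener_thread, args=(server, running),
                     daemon=True).start()
    return server, running
```

Note that nothing here is shared between client threads, so no locking is needed yet; the locking mentioned above only becomes an issue once the threads touch common state.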
A slightly more insidious version of this came up today when I was working on the startup code. The startup was quick, but it was using lazy loading for a lot of the configuration, which meant that while startup was quick, actually connecting to and using the service was difficult because of the lag while it loaded different sections (this was actually measurable in real time, up to 3-10 seconds sometimes). So I moved to a different strategy: on startup, loop through everything and force the lazy loading to kick in... but this made startup prohibitively slow; get up, go get a coffee slow. Final solution: throw the loop into a separate thread with feedback in the system tray while it's still loading.
Is this "Meh, throw it in another thread, it'll be fine" attitude ok? At what point do you start getting diminishing returns and/or even reduced performance?
Multithreading does a lot of things, but I don't think "simplification" is ever one of them.
It's a great way to introduce bugs into code.
Using multiple threads properly is not easy. It should not be attempted by new developers.
In my opinion, multi-threaded programming is pretty high up on the difficulty (and complexity) scale, along with memory management. To me, the "Meh, throw it in another thread, it'll be fine" attitude is a bit too casual. Think long and hard you must, before forking threads you do.
Plainly and simply, multithreading increases complexity and is a nearly trivial way to add bugs to code. There are concurrency issues such as synchronization, deadlock, race conditions, and priority inversion to name a few.
Secondly, the performance gains are not automatic. Recently, there was an excellent article in MSDN Magazine along these lines. The salient details are that a certain operation was taking 46 seconds per ten iterations coded as a single-threaded operation. The author parallelized the operation naively (one thread per four cores) and the operation dropped to 30 seconds per ten iterations. Sounds great until you take into consideration that the operation now eats 300% more processing power but only experienced a 34% gain in efficiency. It's not worth consuming all available processing power for a gain like that.
This gives you the extra job of debugging race conditions, and handling locks and synchronisation issues.
I would not use this unless there was a real need.
Read up on Amdahl's law, best summarized by "The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program."
As it turns out, if only a small part of your app can run in parallel you won't get much gains, but potentially many hard-to-debug bugs.
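Amdahl's law is easy to play with numerically. With p the fraction of the program that can run in parallel and n the number of workers, the bound on speedup is 1 / ((1 - p) + p / n). The function name below is made up for this sketch.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup: 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)
```

For example, if half the program is parallelisable, four workers give at most 1 / (0.5 + 0.125) = 1.6x, and if only 10% is parallelisable, even 1000 workers give barely more than 1.11x.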
I don't mean to be flip but what's in that configuration file that it takes so long to load? That's the origin of your problem, right?
Before spawning another thread to handle it, perhaps it can be pared down? Reduced, perhaps put in another data format that would be quicker, etc.?
How often does it change? Is it something you can parse once at the beginning of the day and put the variables in shared memory so subsequent runs of your main program can just attach and get the needed values from there?
While I agree with everyone else here in saying that multithreading does not simplify code, it can be used to greatly simplify the user experience of your application.
Consider an application that has a lot of interactive widgets (I am currently developing one where this helps) - in the workflow of my application, a user can "build" the current project they are working on. This requires disabling the interactive widgets my application presents to the user and presenting a dialog with an indeterminate progress bar and a friendly "please wait" message.
The "build" occurs on a background thread; if it were to happen on the UI thread it would make the user experience less enjoyable - after all, it's no fun not being able to tell whether or not you are able to click on a widget in an application while a background task is running (cough, Visual Studio). Not to say that VS doesn't use background threads, I'm just saying their user experience could use some improvement. But I digress.
The one thing I take issue with in the title of your post is that you think of firing off threads when you need to perform tasks - I generally prefer to reuse threads - in .NET, I generally favor using the system thread pool over creating a new thread each time I want to do something, for the sake of performance.
I'm going to provide some balance against the unanimous "no".
DISCLAIMER: Yes, threads are complicated and can cause a whole bunch of problems. Everyone else has pointed this out.
From experience, a sequence of blocking reads/writes to a socket (which requires a separate thread) is much simpler than non-blocking ones. With blocking calls, you can tell the state of the connection just by looking at where you are in the function. With non-blocking calls, you need a bunch of variables to record the state of the connection, and check and modify them every time you interact with the connection. With blocking calls, you can just say "read the next X bytes" or "read until you find X" and it will actually do it (or fail). With non-blocking calls, you have to deal with fragmented data, which usually requires keeping temporary buffers and filling them as necessary. You also end up checking if you've received enough data every time you receive a little more. Plus you have to keep a list of open connections and handle unexpected closes for all of them.
It doesn't get much simpler than this:
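void WorkerThreadMain(Connection connection) {
    // Blocks until a complete request arrives (or the read fails).
    Request request = ReadRequest(connection);
    if (!request) return;
    Reply reply = ProcessRequest(request);
    // The client may have disconnected while we were working.
    if (!connection.isOpen) return;
    SendReply(reply, connection);
}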
I'd like to note that this "listener spawns off a worker thread per connection" pattern is how many web servers are designed, and I assume it's how a lot of request/response sort of server applications are designed.
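For what it's worth, the "listener spawns off a worker thread per connection" pattern can be sketched in a few lines of Python. This is only an illustration under assumptions: requests are single newline-terminated lines, `process_request` is a made-up stand-in that upper-cases the request, and `max_connections` exists only so the sketch terminates.

```python
import socket
import threading


def process_request(request: bytes) -> bytes:
    """Hypothetical stand-in for real request handling."""
    return request.strip().upper() + b"\n"


def worker_thread_main(conn: socket.socket) -> None:
    """Handle one connection with plain blocking calls, top to bottom."""
    with conn:
        with conn.makefile("rb") as reader:
            request = reader.readline()   # blocks until a full line or EOF
            if not request:               # peer closed before sending anything
                return
            conn.sendall(process_request(request))


def serve(listener: socket.socket, max_connections: int) -> None:
    """Accept connections and hand each one to its own worker thread."""
    for _ in range(max_connections):
        conn, _addr = listener.accept()   # blocks until a client connects
        threading.Thread(target=worker_thread_main, args=(conn,), daemon=True).start()
```

A listener could be created with `socket.create_server(("127.0.0.1", 0))` and passed to `serve`; each worker then reads like straight-line code, with no per-connection state machine.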
So in conclusion, I have experienced the asynchronous socket spaghetti code you mentioned, and spawning off worker threads for every connection ended up being a good solution. Having said all this, throwing threads at a problem should usually be your last resort.
Threads or asynch?

How do you make your application multithreaded?
Do you use asynch functions?
Or do you spawn a new thread?
I think that asynch functions are already spawning a thread, so if your job is just doing some file reading, being lazy and spawning your job on a thread would just "waste" resources...
So is there some kind of design guideline for choosing between threads and asynch functions?
If you are talking about .NET, then don't forget the ThreadPool. The thread pool is also what asynch functions often use. Spawning too many threads can actually hurt your performance. A thread pool is designed to spawn just enough threads to get the work done as fast as possible. So use a thread pool instead of spawning your own threads, unless the thread pool doesn't meet your needs.
PS: And keep an eye out for the Parallel Extensions from Microsoft
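The thread pool idea is not specific to .NET. As a rough illustration in Python (not the .NET API), `concurrent.futures.ThreadPoolExecutor` reuses a capped set of threads for many small jobs. The names `handle` and `run_jobs` are made up for this sketch, and unlike the .NET pool, this one uses a fixed upper bound rather than tuning the thread count itself.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(job_id):
    """Hypothetical job: returns its result and the name of the thread that ran it."""
    return job_id * job_id, threading.current_thread().name


def run_jobs(n_jobs, pool_size):
    """Run n_jobs on at most pool_size reused threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=pool_size, thread_name_prefix="pool") as pool:
        return list(pool.map(handle, range(n_jobs)))
```

Calling `run_jobs(20, 4)` runs twenty jobs but never creates more than four threads, which is the point of the pool.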
Spawning threads is only going to waste resources if you start spawning tons of them; one or two extra threads aren't going to affect the platform's performance. In fact, System currently has over 70 threads for me, and MSN is using 32 (I really have no idea how a messenger can use that many threads, especially when it's minimised and not really doing anything...)
Usually a good time to spawn a thread is when something will take a long time, but you need to keep doing something else.
E.g. say a calculation will take 30 seconds. The best thing to do is spawn a new thread for the calculation, so that you can continue to update the screen and handle any user input, because users will hate it if your app freezes until it's finished doing the calculation.
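A minimal Python sketch of that idea, with the 30-second calculation shortened to a fraction of a second so it runs quickly. `long_calculation` and `run_with_responsive_loop` are hypothetical names, and the `ticks` counter merely stands in for redrawing the screen and handling input.

```python
import threading
import time


def long_calculation(result):
    """Stand-in for the slow job; stores its answer in the shared dict."""
    time.sleep(0.2)                      # pretend this is 30 seconds of work
    result["value"] = sum(range(1000))


def run_with_responsive_loop():
    """Run the calculation in the background while the main thread keeps ticking."""
    result = {}
    worker = threading.Thread(target=long_calculation, args=(result,))
    worker.start()
    ticks = 0
    while worker.is_alive():
        ticks += 1                       # stand-in for UI updates and input handling
        worker.join(timeout=0.01)        # wait briefly, then go round the loop again
    return result["value"], ticks
```

The main thread never blocks for more than about 10 ms at a time, so the application stays responsive while the worker finishes.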
On the other hand, creating threads to do something that can be done almost instantly is nearly pointless, since the overhead of creating a thread (or even just passing work to an existing thread using a thread pool) will be higher than just doing the job in the first place.
Sometimes you can break your app into a couple of separate parts which run in their own threads. For example, in games the updates/physics etc. may be one thread, while graphics are another, sound/music is a third, and networking is another. The problem here is you really have to think about how these parts will interact, or else you may have worse performance, bugs that happen seemingly "randomly", or even deadlocks.
I'll second Fire Lancer's answer - creating your own threads is an excellent way to process big tasks or to handle a task that would otherwise be "blocking" to the rest of a synchronous app, but you have to have a clear understanding of the problem you must solve, and develop in a way that clearly defines the task of a thread and limits the scope of what it does.
For an example I recently worked on - a Java console app runs periodically to capture data by essentially screen-scraping urls, parsing the document with DOM, extracting data and storing it in a database.
As a single-threaded application, it, as you would expect, took an age, averaging around 1 URL a second for a 50 KB page. Not too bad, but when you scale out to needing to process thousands of URLs in a batch, it's no good.
Profiling the app showed that most of the time the active thread was idle - it was waiting for I/O operations - opening of a socket to the remote URL, opening a connection to the database etc. It's this sort of situation that can easily be improved with multithreading. Rewriting to be multi-threaded and with just 5 threads instead of one, even on a single-core CPU, gave an increase in throughput of over 20 times.
In this example, each "worker" thread was explicitly limited in what it did - open the remote URL, parse the data, store it in the DB. All the "high level" processing - generating the list of URLs to parse, working out which one comes next, handling errors - remained under the control of the main thread.
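The original app was Java, but the division of labour can be sketched in Python under some assumptions: `fetch_parse_store` is a made-up stand-in that sleeps instead of touching the network or a database, and the `None` sentinel is just one common way to tell workers to stop.

```python
import queue
import threading
import time


def fetch_parse_store(url):
    """Hypothetical worker task: pretend to fetch, parse and store one page."""
    time.sleep(0.05)                     # the thread is idle here, waiting on "I/O"
    return len(url)


def scrape(urls, n_workers=5):
    """Main thread owns the URL list; workers only do fetch/parse/store."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            if url is None:              # sentinel: no more work for this worker
                return
            value = fetch_parse_store(url)
            with lock:                   # results dict is shared between workers
                results[url] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for url in urls:                     # high-level scheduling stays in the main thread
        tasks.put(url)
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return results
```

Because each task spends its time waiting rather than computing, several workers can overlap their waits even on one core; how much that gains in practice depends on the workload.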
The use of threads makes you think more about the way your application needs threading and can in the long run make it easier to improve / control your performance.
Async methods are quicker to use, but they are a bit magic: a lot happens behind the scenes to make them possible, so it's likely that at some point you will need something they can't give you. Then you can try rolling some custom threading code.
It all depends on your needs.
The answer is "it depends".
It depends on what you're trying to achieve. I'm going to assume that you're aiming for more performance.
The simplest solution is to find another way to improve your performance. Run a profiler. Look for hot spots. Reduce unnecessary IO.
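As a small illustration of "run a profiler, look for hot spots", here is a sketch using Python's built-in `cProfile` and `pstats`. The functions `slow_part`, `fast_part`, `program` and `profile_report` are invented for the example; your own profiler and language will differ.

```python
import cProfile
import io
import pstats


def slow_part():
    """Deliberately heavy function: this should show up as the hot spot."""
    return sum(i * i for i in range(200_000))


def fast_part():
    """Cheap function: not worth optimising."""
    return 1 + 1


def program():
    slow_part()
    fast_part()


def profile_report(func, top=5):
    """Profile func() and return a text report of the costliest calls."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)   # highest cumulative time first
    return out.getvalue()
```

Printing `profile_report(program)` lists where the time went, which tells you whether threads would even address the real bottleneck.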
The next solution is to break your program into multiple processes, each of which runs in its own address space. This is easiest because the individual processes cannot corrupt each other's memory.
The next solution is to use threads. At this point you're opening a major can of worms, so start small, and only multi-thread the critical path of the code.
The next solution is to use asynch IO. This is generally only recommended for people writing some sort of very heavily loaded server, and even then I would rather re-use one of the existing frameworks that abstract away the details, e.g. the C++ framework ICE, or an EJB server under Java.
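To show what leaning on an existing framework looks like, here is a minimal sketch using Python's standard `asyncio` (not ICE or EJB, which are mentioned above). `fetch`, `crawl` and `run` are hypothetical names, and `asyncio.sleep` stands in for a real non-blocking network wait.

```python
import asyncio


async def fetch(site, delay):
    """Hypothetical request: yields control while 'waiting on the network'."""
    await asyncio.sleep(delay)           # no thread is blocked during this wait
    return f"{site}: done"


async def crawl(sites):
    """Start all requests at once on a single thread and collect results in order."""
    return await asyncio.gather(*(fetch(site, 0.05) for site in sites))


def run(sites):
    """Synchronous entry point that starts and stops the event loop."""
    return asyncio.run(crawl(sites))
```

All of the waits overlap on one thread; the framework hides the readiness polling and callback bookkeeping that you would otherwise write by hand.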
