How to find out the optimal amount of threads? - multithreading

I'm planning to make a software with lot of peer to peer like network connections. Normally I would create an own thread for every connection to send and receive data, but in this case with 300-500+ connections it would mean continuously creating and destroying a lot of threads which would be a big overhead I guess. And making one thread that handles all the connections sequentially could probably slow down things a little. (I'm not really sure about this.)
The question is: how many threads would be optimal to handle this kind of problems? Would it be possible to calculate it in the software so it can decide itself to create less threads on an old computer with not as much resources and more on new ones?
It's a theoretical question, I wouldn't like to make it implementation or language dependant. However I think a lot of people would advice something like "Just use a ThreadPool, it will handle stuff like that" so let's say it will not be a .NET application. (I'll probably has to use some other parts of the code in an old Delphi project, so the language will be probably Delphi or maybe C++ but it's not decided yet.)

Understanding the performance of your application under load is key, as mentioned before profiling, measurements and re-testing is the way to go.
As a general guide Goetz talks about having
threads = number of CPUs + 1
for CPU bound applications, and
number of CPUs * (1 + wait time / service time)
for IO bound contexts

If this is Windows (you did mention .Net?), you should definitely implement this using I/O completion ports. This is the most efficient way to do Windows sockets I/O. There is an I/O-specific discussion of thread pool size at that documentation link.
The most important property of an I/O
completion port to consider carefully
is the concurrency value. The
concurrency value of a completion port
is specified when it is created with
CreateIoCompletionPort via the
NumberOfConcurrentThreads parameter.
This value limits the number of
runnable threads associated with the
completion port. When the total number
of runnable threads associated with
the completion port reaches the
concurrency value, the system blocks
the execution of any subsequent
threads associated with that
completion port until the number of
runnable threads drops below the
concurrency value.
Basically, your reads and writes are all asynchronous and are serviced by a thread pool whose size you can modify. But try it with the default first.
A good, free example of how to do this is at the Free Framework. There are some gotchas that looking at working code could help you short-circuit.

You could do a calculation based on cpu speed, cores, and memory space in your install and set a constant somewhere to tell your application how many threads to use. Semaphores and thread pools come to mind.
Personally I would separate the listening sockets from the sending ones and open sending sockets in runtime instead of running them as daemons; listening sockets can run as daemons.
Multithreading can be its own headache and introduce many bugs. The best thing to do is make a thread do one thing and block when processing to avoid undesired and unpredictable results.

Make the number of threads configurable.
Target a few specific configurations that are the most common ones that you expect to support.
Get a good performance profiler / instrument your code and then rigorously test with different values of 1. for all the different types of 2. till you find an optimal value that works for each configuration.
I know, this might seem like a not-so smart way to do things but i think when it comes to performance, benchmarking the results via testing is the only sure-fire way to really know how well / badly it will work.
Edit: +1 to the question whose link is posted by paxDiablo above as a comment. Its almost the same question and theres loads of information there including a very detailed reply by paxDiablo himself.

One thread per CPU, processing several (hundreds) connections.

Related

Would handling each TCP connection in a separate thread improve latency?

I have an FTP server, implemented on top of QTcpServer and QTcpSocket.
I take advantage of the signals and slots mechanism to support multiple TCP connections simultaneously, even though I have a single thread. My code returns as soon as possible to the event loop, it doesn't block (no wait functions), and it doesn't use nested event loops anywhere. That way I already have cooperative multitasking, like Win3.1 applications had.
But a lot of other FTP servers are multithreaded. Now I'm wondering if using a separate thread for handling each TCP connection would improve performance, and especially latency.
On one hand, threads add to latency because you need to start a new thread for each new connection, but on the other, with my cooperative multitasking, other TCP connections have to wait until I've returned to the main loop before their readyRead()/bytesWritten() signals can be handled.
In your current system and ignoring file I/O time one processor is always doing something useful if there's something useful to be done, and waiting ready-to-go if there's nothing useful to be done. If this were a single processor (single core) system you would have maximized throughput. This is often a very good design -- particularly for an FTP server where you don't usually have a human waiting on a packet-by-packet basis.
You have also minimized average latency (for a single processor system.) What you do not have is consistent latency. Measuring your system's performance is likely to show a lot of jitter -- a lot of variation in the time it takes to handle a packet. Again because this is FTP and not real-time process control or human interaction, jitter may not be a problem.
Now, however consider that there is probably more than one processor available on your system and that it may be possible to overlap I/O time and processing time.
To take full advantage of a multi-processor(core) system you need some concurrency.
This normally translates to using multiple threads, but it may be possible to achieve concurrency via asynchronous (non-blocking) file reads and writes.
However, adding multiple threads to a program opens up a huge can-of-worms.
If you do decide to go the MT route, I'd suggest that you consider depending on a thread-aware I/O library. QT may provide that for you (I'm not sure.) If not, take a look at boost::asio (or ACE for an older, but still solid solution). You'll discover that using the MT capabilities of such a library involves a considerable investment in learning time; however as it turns out the time to add on multithreading "by-hand" and get it right is even worse.
So I'd say stay with your existing solution unless you are worried about unused Processor cycles and/or jitter in which case start learning QT's multithreading support or boost::asio.
Do you need to start a new thread for each new connection? Could you not just have a pool of threads that acts on requests as and when they arrive. This should reduce some of the latency. I have to say that in general a multi-threaded FTP server should be more responsive that a single-threaded one. Is it possible to have an event based FTP server?

Instant Messaging Server Design

Let's suppose we have an instant messaging application, client-server based, not p2p. The actual protocol doesn't matter, what matters is the server architecture. The said server can be coded to operate in single-threaded, non-parallel mode using non-blocking sockets, which by definition allow us to perform operations like read-write effectively immediately (or instantly). This very feature of non-blocking sockets allows us to use some sort of select/poll function at the very core of the server and waste next to no time in the actual socket read/write operations, but rather to spend time processing all this information. Properly coded, this can be very fast, as far as I understand. But there is the second approach, and that is to multithread aggressively, creating a new thread (obviously using some sort of thread pool, because that very operation can be (very) slow on some platforms and under some circumstances), and letting those threads to work in parallel, while the main background thread handles accept() and stuff. I've seen this approach explained in various places over the Net, so it obviously does exist.
Now the question is, if we have non-blocking sockets, and immediate read/write operations, and a simple, easily coded design, why does the second variant even exist? What problems are we trying to overcome with the second design, i.e. threads? AFAIK those are usually used to work around some slow and possibly blocking operations, but no such operations seem to be present there!
I'm assuming you're not talking about having a thread per client as such a design is usually for completely diffreent reasons, but rather a pool of threads each handles several concurrent clients.
The reason for that arcitecture vs a single threaded server is simply to take advantage of multiple processors. You're doing more work than simply I/O. You have to parse the messages, do various work, maybe even run some more heavyweight crypto algorithms. All this takes CPU. If you want to scale, taking advantage of multiple processors will allow you to scale even more, and/or keep the latency even lower per client.
Some of the gain in such a design can be a bit offset by the fact you might need more locking in a multithreaded environment, but done right, and certainly depening on what you're doing, it can be a huge win - at the expense of more complexity.
Also, this might help overcome OS limitations . The I/O paths in the kernel might get more distributed among the procesors. Not all operating systems might fully be able to thread the IO from a single threaded applications. Back in the old days there were'nt all the great alternatives to the old *nix select(), which usually had a filedesciptor limit of 1024, and similar APIs severly started degrading once you told it to monitor too many socket. Spreading all those clients on multiple threads or processes helped overcome that limit.
As for a 1:1 mapping between threads, there's several reasons to implement that architecture:
Easier programming model, which might lead to less hard to find bugs, and faster to implement.
Support blocking APIs. These are all over the place. Having a thread handle many/all of the clients and then go on to do a blocking call to a database is going to stall everyone. Even reading files can block your application, and you usually can't monitor regular file handles/descriptors for IO events - or when you can, the programming model is often exceptionally complicated.
The drawback here is it won't scale, atleast not with the most widely used languages/framework. Having thousands of native threads will hurt performance. Though some languages provides a much more lightweight approach here, such as Erlang and to some extent Go.

Is firing off a Thread a valid answer to simplifying code?

As multi-processor and multi-core computers become more and more ubiquitous, is simply firing off a new thread a (relatively) simple and painless way of simplifying code? For instance, in a current personal project, I have a network server listening on a port. Since this is just a personal project, it's just a desktop app, with a GUI integrated into it for configuration. So, the app reads something like this:
Main()
Read configuration
Start listener thread
Run GUI
Listener Thread
While the app is running
Wait for a new connection
Run a client thread for the new connection
Client Thread
Write synchronously
Read synchronously
ad inifinitum, or till they disconnect
This approach means that while I have to worry about alot of locking, with the potential issues that involves, I avoid alot of spaghetti code from assynchronous calls, etc.
A slightly more insidious version of this came up today when I was working on the startup code. The startup was quick, but it was using lazy loading for alot of the configuration, which meant that while startup was quick, actually connecting to and using the service was difficult because of the lag while it loaded different sections (this was actually measurable in real time, up to 3-10 seconds sometimes). So I moved to a different strategy, on startup, loop through everything and force the lazy loading to kick in... but this made it start prohibitively slow; get up, go get a coffee slow. Final solution: throw the loop into a seperate thread with feedback in the system tray while it's still loading.
Is this "Meh, throw it in another thread, it'll be fine" attitude ok? At what point do you start getting diminishing returns and/or even reduced performance?
Multithreading does a lot of things, but I don't think "simplification" is ever one of them.
It's a great way to introduce bugs into code.
Using multiple threads properly is not easy. It should not be attempted by new developers.
In my opinion, multi-threaded programming is pretty high up on the difficulty (and complexity) scale, along with memory management. To me, the "Meh, throw it in another thread, it'll be fine" attitude is a bit too casual. Think long and hard you must, before forking threads you do.
No.
Plainly and simply, multithreading increases complexity and is a nearly trivial way to add bugs to code. There are concurrency issues such as synchronization, deadlock, race conditions, and priority inversion to name a few.
Secondly, the performance gains are not automatic. Recently, there was an excellent article in MSDN Magazine along these lines. The salient details are that a certain operation was taking 46 seconds per ten iterations coded as a single-threaded operation. The author parallelized the operation naively (one thread per four cores) and the operation dropped to 30 seconds per ten iterations. Sounds great until you take into consideration that the operation now eats 300% more processing power but only experienced a 34% gain in efficiency. It's not worth consuming all available processing power for a gain like that.
This gives you the extra job of debugging race conditions, and handling locks and sycronisation issues.
I would not use this unless there was a real need.
Read up on Amdahl's law, best summarized by "The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program."
As it turns out, if only a small part of your app can run in parallel you won't get much gains, but potentially many hard-to-debug bugs.
I don't mean to be flip but what's in that configuration file that it takes so long to load? That's the origin of your problem, right?
Before spawning another thread to handle it, perhaps it can be parred down? Reduced, perhaps put in another data format that would be quicker, etc?
How often does it change? Is it something you can parse once at the beginning of the day and put the variables in shared memory so subsequent runs of your main program can just attach and get the needed values from there?
While I agree with everyone else here in saying that multithreading does not simplify code, it can be used to greatly simplify the user experience of your application.
Consider an application that has a lot of interactive widgets (I am currently developing one where this helps) - in the workflow of my application, a user can "build" the current project they are working on. This requires disabling the interactive widgets my application presents to the user and presenting a dialog with a indeterminate progress bar and a friendly "please wait" message.
The "build" occurs on a background thread; if it were to happen on the UI thread it would make the user experience less enjoyable - after all, it's no fun not being able to tell whether or not you are able to click on a widget in an application while a background task is running (cough, Visual Studio). Not to say that VS doesn't use background threads, I'm just saying their user experience could use some improvement. But I digress.
The one thing I take issue with in the title of your post is that you think of firing off threads when you need to perform tasks - I generally prefer to reuse threads - in .NET, I generally favor using the system thread pool over creating a new thread each time I want to do something, for the sake of performance.
I'm going to provide some balance against the unanimous "no".
DISCLAIMER: Yes, threads are complicated and can cause a whole bunch of problems. Everyone else has pointed this out.
From experience, a sequence of blocking reads/writes to a socket (which requires a separate thead) is much simpler than non-blocking ones. With blocking calls, you can tell the state of the connection just by looking at where you are in the function. With non-blocking calls, you need a bunch of variables to record the state of the connection, and check and modify them every time you interact with the connection. With blocking calls, you can just say "read the next X bytes" or "read until you find X" and it will actually do it (or fail). With non-blocking calls, you have to deal with fragmented data which usually requires keeping temporary buffers and filling them as necessary. You also end up checking if you've received enough data every time you receive little more. Plus you have to keep a list of open connections and handle unexpected closes for all of them.
It doesn't get much simpler than this:
void WorkerThreadMain(Connection connection) {
Request request = ReadRequest(connection);
if(!request) return;
Reply reply = ProcessRequest(request);
if(!connection.isOpen) return;
SendReply(reply, connection);
connection.close();
}
I'd like to note that this "listener spawns off a worker thread per connection" pattern is how web servers are designed, and I assume it's how a lot of request/response soft of server applications are designed.
So in conclusion, I have experienced the asynchronous socket spaghetti code you mentioned, and spawning off worker threads for every connection ended up being a good solution. Having said all this, throwing threads at a problem should usually be your last resort.
I think your have no choice but to deal with threads especially with networking and concurrent connections. Do threads make code simpler? I don't think so. But without them how would you program a server that can handle more than 1 client at the same time?

When is multi-threading not a good idea? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I was recently working on an application that sent and received messages over Ethernet and Serial. I was then tasked to add the monitoring of DIO discretes. I throught,
"No reason to interrupt the main
thread which is involved in message
processing, I'll just create
another thread that monitors DIO."
This decision, however, proved to be poor. Sometimes the main thread would be interrupted between a Send and a Receive serial message. This interruption would disrupt the timing and alas, messages would be lost (forever).
I found another way to monitor the DIO without using another thread and Ethernet and Serial communication were restored to their correct functionality.
The whole fiasco, however, got me thinking. Are their any general guidelines about when not to use multiple-threads and/or does anyone have anymore examples of situations when using multiple-threads is not a good idea?
**EDIT:Based on your comments and after scowering the internet for information, I have composed a blog post entitled When is multi-threading not a good idea?
On a single processor machine and a desktop application, you use multi threads so you don't freeze the app but for nothing else really.
On a single processor server and a web based app, no need for multi threading because the web server handles most of it.
On a multi-processor machine and desktop app, you are suggested to use multi threads and parallel programming. Make as many threads as there are processors.
On a multi-processor server and a web based app, no need again for multi threads because the web server handles it.
In total, if you use multiple threads for other than un-freezing desktop apps and any other generic answer, you will make the app slower if you have a single core machine due to the threads interrupting each other.
Why? Because of the hardware switches. It takes time for the hardware to switch between threads in total. On a multi-core box, go ahead and use 1 thread for each core and you will greatly see a ramp up.
To paraphrase an old quote: A programmer had a problem. He thought, "I know, I'll use threads." Now the programmer has two problems. (Often attributed to JWZ, but it seems to predate his use of it talking about regexes.)
A good rule of thumb is "Don't use threads, unless there's a very compelling reason to use threads." Multiple threads are asking for trouble. Try to find a good way to solve the problem without using multiple threads, and only fall back to using threads if avoiding it is as much trouble as the extra effort to use threads. Also, consider switching to multiple threads if you're running on a multi-core/multi-CPU machine, and performance testing of the single threaded version shows that you need the performance of the extra cores.
Multi-threading is a bad idea if:
Several threads access and update the same resource (set a variable, write to a file), and you don't understand thread safety.
Several threads interact with each other and you don't understand mutexes and similar thread-management tools.
Your program uses static variables (threads typically share them by default).
You haven't debugged concurrency issues.
Actually, multi threading is not scalable and is hard to debug, so it should not be used in any case if you can avoid it. There is few cases where it is mandatory : when performance on a multi CPU matters, or when you deal whith a server that have a lot of clients taking a long time to answer.
In any other cases, you can use alternatives such as queue + cron jobs or else.
You might want to take a look at the Dan Kegel's "The C10K problem" web page about handling multiple data sources/sinks.
Basically it is best to use minimal threads, which in sockets can be done in most OS's w/ some event system (or asynchronously in Windows using IOCP).
When you run into the case where the OS and/or libraries do not offer a way to perform communication in a non-blocking manner, it is best to use a thread-pool to handle them while reporting back to the same event loop.
Example diagram of layout:
Per CPU [*] EVENTLOOP ------ Handles nonblocking I/O using OS/library utilities
| \___ Threadpool for various blocking events
Threadpool for handling the I/O messages that would take long
Multithreading is bad except in the single case where it is good. This case is
The work is CPU Bound, or parts of it is CPU Bound
The work is parallelisable.
If either or both of these conditions are missing, multithreading is not going to be a winning strategy.
If the work is not CPU bound, then you are waiting not on threads to finish work, but rather for some external event, such as network activity, for the process to complete its work. Using threads, there is the additional cost of context switches between threads, The cost of synchronization (mutexes, etc), and the irregularity of thread preemption. The alternative in most common use is asynchronous IO, in which a single thread listens to several io ports, and acts on whichever happens to be ready now, one at a time. If by some chance these slow channels all happen to become ready at the same time, It might seem like you will experience a slow-down, but in practice this is rarely true. The cost of handling each port individually is often comparable or better than the cost of synchronizing state on multiple threads as each channel is emptied.
Many tasks may be compute bound, but still not practical to use a multithreaded approach because the process must synchronise on the entire state. Such a program cannot benefit from multithreading because no work can be performed concurrently. Fortunately, most programs that require enormous amounts of CPU can be parallelized to some level.
Multi-threading is not a good idea if you need to guarantee precise physical timing (like in your example). Other cons include intensive data exchange between threads. I would say multi-threading is good for really parallel tasks if you don't care much about their relative speed/priority/timing.
A recent application I wrote that had to use multithreading (although not unbounded number of threads) was one where I had to communicate in several directions over two protocols, plus monitoring a third resource for changes. Both protocol libraries required a thread to run the respective event loop in, and when those were accounted for, it was easy to create a third loop for the resource monitoring. In addition to the event loop requirements, the messages going through the wires had strict timing requirements, and one loop couldn't be risked blocking the other, something that was further alleviated by using a multicore CPU (SPARC).
There were further discussions on whether each message processing should be considered a job that was given to a thread from a thread pool, but in the end that was an extension that wasn't worth the work.
All-in-all, threads should if possible only be considered when you can partition the work into well defined jobs (or series of jobs) such that the semantics are relatively easy to document and implement, and you can put an upper bound on the number of threads you use and that need to interact. Systems where this is best applied are almost message passing systems.
In priciple everytime there is no overhead for the caller to wait in a queue.
A couple more possible reasons to use threads:
Your platform lacks asynchronous I/O operations, e.g. Windows ME (No completion ports or overlapped I/O, a pain when porting XP applications that use them.) Java 1.3 and earlier.
A third-party library function that can hang, e.g. if a remote server is down, and the library provides no way to cancel the operation and you can't modify it.
Keeping a GUI responsive during intensive processing doesn't always require additional threads. A single callback function is usually sufficient.
If none of the above apply and I still want parallelism for some reason, I prefer to launch an independent process if possible.
I would say multi-threading is generally used to:
Allow data processing in the background while a GUI remains responsive
Split very big data analysis onto multiple processing units so that you can get your results quicker.
When you're receiving data from some hardware and need something to continuously add it to a buffer while some other element decides what to do with it (write to disk, display on a GUI etc.).
So if you're not solving one of those issues, it's unlikely that adding threads will make your life easier. In fact it'll almost certainly make it harder because as others have mentioned; debugging mutithreaded applications is considerably more work than a single threaded solution.
Security might be a reason to avoid using multiple threads (over multiple processes). See Google chrome for an example of multi-process safety features.
Multi-threading is scalable, and will allow your UI to maintain its responsivness while doing very complicated things in the background. I don't understand where other responses are acquiring their information on multi-threading.
When you shouldn't multi-thread is a mis-leading question to your problem. Your problem is this: Why did multi-threading my application cause serial / ethernet communications to fail?
The answer to that question will depend on the implementation, which should be discussed in another question. I know for a fact that you can have both ethernet and serial communications happening in a multi-threaded application at the same time as numerous other tasks without causing any data loss.
The one reason to not use multi-threading is:
There is one task, and no user interface with which the task will interfere.
The reasons to use mutli-threading are:
Provides superior responsiveness to the user
Performs multiple tasks at the same time to decrease overall execution time
Uses more of the current multi-core CPUs, and multi-multi-cores of the future.
There are three basic methods of multi-threaded programming that make thread safety implemented with ease - you only need to use one for success:
Thread Safe Data types passed between threads.
Thread Safe Methods in the threaded object to modify data passed between.
PostMessage capabilities to communicate between threads.
Are the processes parallel? Is performance a real concern? Are there multiple 'threads' of execution like on a web server? I don't think there is a finite answer.
A common source of threading issues is the usual approaches employed to synchronize data. Having threads share state and then implement locking at all the appropriate places is a major source of complexity for both design and debugging. Getting the locking right to balance stability, performance, and scalability is always a hard problem to solve. Even the most experienced experts get it wrong frequently. Alternative techniques to deal with threading can alleviate much of this complexity. The Clojure programming language implements several interesting techniques for dealing with concurrency.

Threads or asynch?

How do you make your application multithreaded ?
Do you use asynch functions ?
or do you spawn a new thread ?
I think that asynch functions are already spawning a thread so if your job is doing just some file reading, being lazy and just spawning your job on a thread would just "waste" ressources...
So is there some kind of design when using thread or asynch functions ?
If you are talking about .Net, then don't forget the ThreadPool. The thread pool is also what asynch functions often use. Spawning to much threads can actually hurt your performance. A thread pool is designed to spawn just enough threads to do the work the fastest. So do use a thread pool instead of spwaning your own threads, unless the thread pool doesn't meet your needs.
PS: And keep an eye out on the Parallel Extensions from Microsoft
Spawning threads is only going to waste resources if you start spawning tons of them, one or two extra threads isn't going to effect the platforms proformance, infact System currently has over 70 threads for me, and msn is using 32 (I really have no idea how a messenger can use that many threads, exspecialy when its minimised and not really doing anything...)
Useualy a good time to spawn a thread is when something will take a long time, but you need to keep doing something else.
eg say a calculation will take 30 seconds. The best thing to do is spawn a new thread for the calculation, so that you can continue to update the screen, and handle any user input because users will hate it if your app freezes untill its finished doing the calculation.
On the other hand, creating threads to do something that can be done almost instantly is nearly pointless, since the overhead of creating (or even just passing work to an existing thread using a thread pool) will be higher than just doing the job in the first place.
Sometimes you can break your app into a couple of seprate parts which run in their own threads. For example in games the updates/physics etc may be one thread, while grahpics are another, sound/music is a third, and networking is another. The problem here is you really have to think about how these parts will interact or else you may have worse proformance, bugs that happen seemingly "randomly", or it may even deadlock.
I'll second Fire Lancer's answer - creating your own threads is an excellent way to process big tasks or to handle a task that would otherwise be "blocking" to the rest of synchronous app, but you have to have a clear understanding of the problem that you must solve and develope in a way that clearly defines the task of a thread, and limits the scope of what it does.
For an example I recently worked on - a Java console app runs periodically to capture data by essentially screen-scraping urls, parsing the document with DOM, extracting data and storing it in a database.
As a single threaded application, it, as you would expect, took an age, averaging around 1 url a second for a 50kb page. Not too bad, but when you scale out to needing to processes thousands of urls in a batch, it's no good.
Profiling the app showed that most of the time the active thread was idle - it was waiting for I/O operations - opening of a socket to the remote URL, opening a connection to the database etc. It's this sort of situation that can easily be improved with multithreading. Rewriting to be multi-threaded and with just 5 threads instead of one, even on a single core cpu, gave an increase in throughput of over 20 times.
In this example, each "worker" thread was explicitly limited to what it did - open the remote a remote url, parse the data, store it in the db. All the "high level" processing - generating the list of urls to parse, working out which next, handling errors, all remained with the control of the main thread.
The use of threads makes you think more about the way your application needs threading and can in the long run make it easier to improve / control your performance.
Async methods are faster to use but they are a bit magic - a lot of things happen to make them possible - so it's probable that at some point you will need something that they can't give you. Then you can try and roll some custom threading code.
It all depends on your needs.
The answer is "it depends".
It depends on what you're trying to achieve. I'm going to assume that you're aiming for more performance.
The simplest solution is to find another way to improve your performance. Run a profiler. Look for hot spots. Reduce unnecessary IO.
The next solution is to break your program into multiple processes, each of which can run in their own address space. This is easiest because there is no chance of the individual processes messing each other up.
The next solution is to use threads. At this point you're opening a major can of worms, so start small, and only multi-thread the critical path of the code.
The next solution is to use asynch IO. Generally only recommended for people writing some of very heavily loaded server, and even then I would rather re-use one of the existing frameworks that abstract away the details e.g. the C++ framework ICE, or an EJB server under java.
Note that each of these solutions has multiple sub-solutions - there are different breeds of threads and different kinds of asynch IO, each with slightly different performance characteristics, but again, it's generally best to let the framework handle it for you.

Resources