Many threads or as few threads as possible? - multithreading

As a side project I'm currently writing a server for an age-old game I used to play. I'm trying to make the server as loosely coupled as possible, but I am wondering what would be a good design decision for multithreading. Currently I have the following sequence of actions:
Startup (creates) ->
Server (listens for clients, creates) ->
Client (listens for commands and sends periodic data)
I'm assuming an average of 100 clients, as that was the max at any given time for the game. What would be the right decision as for threading of the whole thing? My current setup is as follows:
1 thread on the server which listens for new connections, on new connection create a client object and start listening again.
Client object has one thread, listening for incoming commands and sending periodic data. This is done using a non-blocking socket, so it simply checks if there's data available, deals with that and then sends messages it has queued. Login is done before the send-receive cycle is started.
One thread (for now) for the game itself, as I consider that to be separate from the whole client-server part, architecturally speaking.
This would result in a total of 102 threads. I am even considering giving the client 2 threads, one for sending and one for receiving. If I do that, I can use blocking I/O on the receiver thread, which means that thread will be mostly idle in an average situation.
My main concern is that by using this many threads I'll be hogging resources. I'm not worried about race conditions or deadlocks, as that's something I'll have to deal with anyway.
My design is set up in such a way that I could use a single thread for all client communications, no matter if it's 1 or 100. I've separated the communications logic from the client object itself, so I could implement it without having to rewrite a lot of code.
The main question is: is it wrong to use over 200 threads in an application? Does it have advantages? I'm thinking about running this on a multi-core machine, would it take a lot of advantage of multiple cores like this?
Thanks!
Out of all these threads, most of them will be blocked usually. I don't expect connections to be over 5 per minute. Commands from the client will come in infrequently, I'd say 20 per minute on average.
Going by the answers I get here (the context switching was the performance hit I was thinking about, but I didn't know that until you pointed it out, thanks!) I think I'll go for the approach with one listener, one receiver, one sender, and some miscellaneous stuff ;-)

use an event stream/queue and a thread pool to maintain the balance; this will adapt better to other machines which may have more or fewer cores
in general, many more active threads than you have cores will waste time context-switching
if your game consists of a lot of short actions, a circular/recycling event queue will give better performance than a fixed number of threads

To answer the question simply, it is entirely wrong to use 200 threads on today's hardware.
By default each thread reserves about 1 MB for its stack (the Windows default), so 200 threads set aside roughly 200MB before you even start doing anything useful.
By all means break your operations up into little pieces that can be safely run on any thread, but put those operations on queues and have a fixed, limited number of worker threads servicing those queues.
Update: Does wasting 200MB matter? On a 32-bit machine, it's 10% of the entire theoretical address space for a process - no further questions. On a 64-bit machine, it sounds like a drop in the ocean of what could be theoretically available, but in practice it's still a very big chunk (or rather, a large number of pretty big chunks) of storage being pointlessly reserved by the application, and which then has to be managed by the OS. It has the effect of surrounding each client's valuable information with lots of worthless padding, which destroys locality, defeating the OS and CPU's attempts to keep frequently accessed stuff in the fastest layers of cache.
In any case, the memory wastage is just one part of the insanity. Unless you have 200 cores (and an OS capable of utilizing them) then you don't really have 200 parallel threads. You have (say) 8 cores, each frantically switching between 25 threads. Naively you might think that as a result of this, each thread experiences the equivalent of running on a core that is 25 times slower. But it can actually be much worse than that - with that many runnable threads the OS can end up spending more time taking one thread off a core and putting another one on it ("context switching") than it does actually allowing your code to run.
Just look at how any well-known successful design tackles this kind of problem. The CLR's thread pool (even if you're not using it) serves as a fine example. It starts off assuming just one thread per core will be sufficient. It allows more to be created, but only to ensure that badly designed parallel algorithms will eventually complete. It refuses to create more than 2 threads per second, so it effectively punishes thread-greedy algorithms by slowing them down.
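As a rough illustration of "operations on queues, serviced by a fixed, limited number of worker threads", here is a minimal Python sketch. The handle_command function and the (client_id, command) tuples are hypothetical stand-ins for real game logic, and sizing the pool from the core count is just one reasonable assumption, not a rule taken from the CLR.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def handle_command(client_id: int, command: str) -> str:
    """Hypothetical per-command handler; stands in for real game logic."""
    return f"client {client_id}: {command.upper()}"


def run_commands(commands, max_workers=None):
    """Run (client_id, command) pairs on a fixed-size pool; results keep input order."""
    # Size the pool from the core count, not from the number of clients.
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(handle_command, cid, cmd) for cid, cmd in commands]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # 100 clients each send one command, but only a handful of threads exist.
    replies = run_commands([(i, "ping") for i in range(100)])
    print(len(replies), replies[0])
```

The number of threads stays the same whether 1 or 100 clients are connected; only the queue length changes.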

I write in .NET and I'm not sure if the way I code is due to .NET limitations and their API design or if this is a standard way of doing things, but this is how I've done this kind of thing in the past:
A queue object that will be used for processing incoming data. This should be sync locked between the queuing thread and worker thread to avoid race conditions.
A worker thread for processing data in the queue. The thread that queues up the data uses a semaphore to notify this thread to process items in the queue. This thread will start itself before any of the other threads and contain a continuous loop that can run until it receives a shut down request. The first instruction in the loop is a flag to pause/continue/terminate processing. The flag will be initially set to pause so that the thread sits in an idle state (instead of looping continuously) while there is no processing to be done. The queuing thread will change the flag when there are items in the queue to be processed. This thread will then process a single item in the queue on each iteration of the loop. When the queue is empty it will set the flag back to pause so that on the next iteration of the loop it will wait until the queuing process notifies it that there is more work to be done.
One connection listener thread which listens for incoming connection requests and passes these off to...
A connection processing thread that creates the connection/session. Having a separate thread from your connection listener thread means that you're reducing the potential for missed connection requests due to reduced resources while that thread is processing requests.
An incoming data listener thread that listens for incoming data on the current connection. All data is passed off to a queuing thread to be queued up for processing. Your listener threads should do as little as possible outside of basic listening and passing the data off for processing.
A queuing thread that queues up the data in the right order so everything can be processed correctly. This thread raises the semaphore to the processing queue to let it know there's data to be processed. Having this thread separate from the incoming data listener means that you're less likely to miss incoming data.
Some session object which is passed between methods so that each user's session is self contained throughout the threading model.
This keeps threads down to as simple but as robust a model as I've figured out. I would love to find a simpler model than this, but I've found that if I try and reduce the threading model any further, that I start missing data on the network stream or miss connection requests.
It also assists with TDD (Test Driven Development) such that each thread is processing a single task and is much easier to code tests for. Having hundreds of threads can quickly become a resource allocation nightmare, while having a single thread becomes a maintenance nightmare.
It's far simpler to keep one thread per logical task the same way you would have one method per task in a TDD environment and you can logically separate what each should be doing. It's easier to spot potential problems and far easier to fix them.
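The queue plus worker-thread part of this model can be sketched in a few lines of Python. This is not the .NET code described above: it assumes Python's queue.Queue, which already does the locking and the "sit idle until notified" part that the semaphore and pause flag provide, and it uses a hypothetical sentinel object for the shut down request. The (session_id, payload) tuples are also made up for the example.

```python
import queue
import threading

_SHUTDOWN = object()  # hypothetical sentinel that tells the worker to exit


def worker(inbox: queue.Queue, processed: list) -> None:
    """Process one queued item per loop iteration; block while the queue is empty."""
    while True:
        item = inbox.get()  # blocks without spinning until data is queued
        if item is _SHUTDOWN:
            break
        session_id, payload = item
        processed.append((session_id, payload.strip()))


def run_pipeline(messages):
    """Start the worker first, queue the messages in order, then shut down."""
    inbox = queue.Queue()
    processed = []
    t = threading.Thread(target=worker, args=(inbox, processed))
    t.start()
    for msg in messages:  # this loop plays the role of the queuing thread
        inbox.put(msg)
    inbox.put(_SHUTDOWN)
    t.join()
    return processed
```

Because a single worker drains a FIFO queue, items are processed in the order they were queued, which is the ordering guarantee the queuing thread is there to provide.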

What's your platform? If Windows then I'd suggest looking at async operations and thread pools (or I/O Completion Ports directly if you're working at the Win32 API level in C/C++).
The idea is that you have a small number of threads that deal with your I/O and this makes your system capable of scaling to large numbers of concurrent connections because there's no relationship between the number of connections and the number of threads used by the process that is serving them. As expected, .NET insulates you from the details and Win32 doesn't.
The challenge of using async I/O and this style of server is that the processing of client requests becomes a state machine on the server and the data arriving triggers changes of state. Sometimes this takes some getting used to but once you do it's really rather marvellous;)
I've got some free code that demonstrates various server designs in C++ using IOCP here.
If you're using Unix or need to be cross platform and you're in C++ then you might want to look at Boost.Asio, which provides async I/O functionality.
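To show the shape of "no relationship between the number of connections and the number of threads" without IOCP or Boost.Asio, here is a small sketch using Python's asyncio instead (a different library, chosen only for brevity). The echo protocol, the demo helper and the loopback address are assumptions made up for the example. Each coroutine is effectively the per-client state machine: every await is a point where its state is parked and the single thread moves on to another connection.

```python
import asyncio


async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Per-connection coroutine; arriving data resumes it where it left off."""
    while True:
        line = await reader.readline()
        if not line:  # client closed the connection
            break
        writer.write(b"echo: " + line)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def demo(n_clients: int = 3) -> list:
    """Serve n_clients concurrent connections from one thread and collect the replies."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def client(i: int) -> bytes:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"hello {i}\n".encode())
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
        return reply

    async with server:
        replies = await asyncio.gather(*(client(i) for i in range(n_clients)))
    return list(replies)
```

Calling asyncio.run(demo(3)) runs the server and all three clients on a single thread.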

I think the question you should be asking is not if 200 as a general thread number is good or bad, but rather how many of those threads are going to be active.
If only several of them are active at any given moment, while all the others are sleeping or waiting or whatnot, then you're fine. Sleeping threads, in this context, cost you nothing.
However if all of those 200 threads are active, you're going to have your CPU wasting so much time doing thread context switches between all those ~200 threads.

Related

Relative merits between one thread per client and queuing thread models for a threaded server?

Let's say we're building a threaded server intended to run on a system with four cores. The two thread management schemes I can think of are one thread per client connection and a queuing system.
As the first system's name implies, we'll spawn one thread per client that connects to our server. Assuming one thread is always dedicated to our program's main thread of execution, we'll be able to handle up to three clients concurrently and for any more simultaneous clients than that we'll have to rely on the operating system's preemptive multitasking functionality to switch among them (or the VM's in the case of green threads).
For our second approach, we'll make two thread-safe queues. One is for incoming messages and one is for outgoing messages. In other words, requests and replies. That means we'll probably have one thread accepting incoming connections and placing their requests into the incoming queue. One or two threads will handle the processing of the incoming requests, resolving the appropriate replies, and placing those replies on the outgoing queue. Finally, we'll have one thread just taking replies off of that queue and sending them back out to the clients.
What are the pros and cons of these approaches? Notice that I didn't mention what kind of server this is. I'm assuming that which one has a better performance profile depends on whether the server handles short connections like web servers and POP3 servers, or longer connections like WebSocket servers, game servers, and messaging app servers.
Are there other thread management strategies besides these two?
I believe I've done both organizations at one time or another.
Method 1
Just so we're on the same page, the first has the main thread do a listen. Then, in a loop, it does accept. It then passes off the return value to a pthread_create and the client thread's loop does recv/send in loop processing all commands the remote client wants. When done, it cleans up and terminates.
For an example of this, see my recent answer: multi-threaded file transfer with socket
This has the virtues that the main thread and client threads are straightforward and independent. No thread waits on anything another thread is doing. No thread is waiting on anything that it doesn't have to. Thus, the client threads [plural] can all run at maximum line speed. Also, if a client thread is blocked on a recv or send, and another thread can go, it will. It is self balancing.
All thread loops are simple: wait for input, process, send output, repeat. Even the main thread is simple: sock = accept, pthread_create(sock), repeat
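For readers who want to run method 1, here is a minimal Python sketch of the same loop structure, with threading.Thread standing in for pthread_create. The upper-casing "protocol", the max_clients bound (there only so the demo terminates) and the demo helper are assumptions for illustration.

```python
import socket
import threading


def client_loop(conn: socket.socket) -> None:
    """Client thread: wait for input, process, send output, repeat."""
    with conn:
        while True:
            data = conn.recv(4096)  # blocking; only this client's thread waits
            if not data:  # remote side closed: clean up and terminate
                break
            conn.sendall(data.upper())  # placeholder "processing"


def serve(listener: socket.socket, max_clients: int) -> None:
    """Main thread: sock = accept, start a thread for it, repeat."""
    threads = []
    for _ in range(max_clients):
        conn, _addr = listener.accept()
        t = threading.Thread(target=client_loop, args=(conn,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()


def demo() -> bytes:
    """Start the server on a free loopback port and talk to it with one client."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    server = threading.Thread(target=serve, args=(listener, 1))
    server.start()
    with socket.create_connection(listener.getsockname()) as c:
        c.sendall(b"hello")
        reply = c.recv(4096)
    server.join()
    listener.close()
    return reply
```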
Another thing. The interaction between the client thread and its remote client can be anything they agree on. Any protocol or any type of data transfer.
Method 2
This is somewhat akin to an N worker model, where N is fixed.
Because the accept is [usually] blocking, we'll need a main thread that is similar to method 1. Except, that instead of firing up a new thread, it needs to malloc a control struct [or some other mgmt scheme] and put the socket in that. It then puts this on a list of client connections and then loops back to the accept
You are correct that, in addition to the N worker threads, more threads are needed: at least two control threads, one to do select/poll, recv, enqueue request and one to do wait for result, select/poll, send.
Two threads are needed to prevent one of these threads having to wait on two different things: the various sockets [as a group] and the request/result queues from the various worker threads. With a single control thread all actions would have to be non-blocking and the thread would spin like crazy.
Here is an [extremely] simplified version of what the threads look like:
// control thread for recv:
while (1) {
    // (1) do blocking poll on all client connection sockets for read
    poll(...)
    // (2) for all pending sockets do a recv for a request block and enqueue
    // it on the request queue
    for (all in read_mask) {
        request_buf = dequeue(control_free_list)
        recv(request_buf);
        enqueue(request_list,request_buf);
    }
}
// control thread for send:
while (1) {
    // (1) do blocking wait on result queue
    // (2) peek at all result queue elements and create aggregate write mask
    // for poll from the socket numbers
    // (3) do blocking poll on all client connection sockets for write
    poll(...)
    // (4) for all pending sockets that can be written to
    for (all in write_mask) {
        // find and dequeue first result buffer from result queue that
        // matches the given client
        result_buf = dequeue(result_list,client_id);
        send(result_buf);
        enqueue(control_free_list,result_buf);
    }
}
// worker thread:
while (1) {
    // (1) do blocking wait on request queue
    request_buf = dequeue(request_list);
    // (2) process request ...
    // (3) enqueue the result on the result queue
    enqueue(result_list,request_buf);
}
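The three loops above can be turned into something runnable with a few simplifying assumptions. This Python sketch uses socketpair() in place of accepted client connections, queue.Queue for the request and result lists, and byte reversal as placeholder processing. It also drops the free list and lets the send thread write without first polling for writability, so it is a simplification of the pseudocode rather than a faithful port.

```python
import queue
import selectors
import socket
import threading


def recv_control(sel, request_q, n_expected):
    """Blocking poll on all client sockets for read; enqueue each request."""
    seen = 0
    while seen < n_expected:  # bounded only so the demo terminates
        for key, _events in sel.select():
            data = key.fileobj.recv(4096)
            if data:
                request_q.put((key.fileobj, data))
                seen += 1


def worker(request_q, result_q):
    """Blocking wait on the single shared request queue, process, enqueue result."""
    while True:
        item = request_q.get()
        if item is None:  # hypothetical shutdown marker
            break
        sock, data = item
        result_q.put((sock, data[::-1]))  # placeholder processing: reverse bytes


def send_control(result_q, n_expected):
    """Blocking wait on the result queue, then send to the matching client."""
    for _ in range(n_expected):
        sock, reply = result_q.get()
        sock.sendall(reply)  # simplification: no poll for writability


def demo(n_clients=3, n_workers=2):
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n_clients)]
    for server_side, _client_side in pairs:
        sel.register(server_side, selectors.EVENT_READ)
    request_q, result_q = queue.Queue(), queue.Queue()
    control = [threading.Thread(target=recv_control, args=(sel, request_q, n_clients)),
               threading.Thread(target=send_control, args=(result_q, n_clients))]
    workers = [threading.Thread(target=worker, args=(request_q, result_q))
               for _ in range(n_workers)]
    for t in control + workers:
        t.start()
    for i, (_s, client_side) in enumerate(pairs):
        client_side.sendall(f"req{i}".encode())
    replies = [client_side.recv(4096) for _s, client_side in pairs]
    for t in control:
        t.join()
    for _ in workers:
        request_q.put(None)
    for t in workers:
        t.join()
    sel.close()
    for server_side, client_side in pairs:
        server_side.close()
        client_side.close()
    return replies
```

Note that the thread count here is 2 + n_workers regardless of how many client sockets are registered with the selector.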
Now, a few things to notice. Only one request queue was used for all worker threads. The recv control thread did not try to pick an idle [or under utilized] worker thread and enqueue to a thread specific queue [this is another option to consider].
The single request queue is probably the most efficient. But, maybe, not all worker threads are created equal. Some may end up on CPU cores [or cluster nodes] that have special acceleration H/W, so some requests may have to be sent to specific threads.
And, if that is done, can a thread do "work stealing"? That is, a thread completes all its work and notices that another thread has a request in its queue [that is compatible] but hasn't been started. The thread dequeues the request and starts working on it.
Here's a big drawback to this method. The request/result blocks are of [mostly] fixed size. I've done an implementation where the control could have a field for a "side/extra" payload pointer that could be an arbitrary size.
But, if doing a large file transfer, either upload or download, trying to pass this piecemeal through request blocks is not a good idea.
In the download case, the worker thread could usurp the socket temporarily and send the file data before enqueuing the result to the control thread.
But, for the upload case, if the worker tried to do the upload in a tight loop, it would conflict with recv control thread. The worker would have to [somehow] alert the control thread to not include the socket in its poll mask.
This is beginning to get complex.
And, there is overhead to all this request/result block enqueue/dequeue.
Also, the two control threads are a "hot spot". The entire throughput of the system depends on them.
And, there are interactions between the sockets. In the simple case, the recv thread can start a recv on one socket, but other clients wishing to send requests are delayed until that recv completes. It is a bottleneck.
This means that all recv syscalls have to be non-blocking [asynchronous]. The control thread has to manage these async requests (i.e. initiate one and wait for an async completion notification, and only then enqueue the request on the request queue).
This is beginning to get complicated.
The main benefit to wanting to do this is having a large number of simultaneous clients (e.g. 50,000) but keep the number of threads to a sane value (e.g. 100).
Another advantage to this method is that it is possible to assign priorities and use multiple priority queues
Comparison and hybrids
Meanwhile, method 1 does everything that method 2 does, but in a simpler, more robust [and, I suspect, higher throughput] way.
After a method 1 client thread is created, it might split the work up and create several sub-threads. It could then act like the control threads of method 2. In fact, it might draw on these threads from a fixed N pool just like method 2.
This would compensate for a weakness of method 1, where the thread is going to do heavy computation. With a large number of threads all doing computation, the system would get swamped. The queuing approach helps alleviate this. The client thread is still created/active, but it's sleeping on the result queue.
So, we've just muddied up the waters a bit more.
Either method could be the "front facing" method and have elements of the other underneath.
A given client thread [method 1] or worker thread [method 2] could farm out its work by opening [yet] another connection to a "back office" compute cluster. The cluster could be managed with either method.
So, method 1 is simpler and easier to implement and can easily accommodate most job mixes. Method 2 might be better for heavy compute servers to throttle the requests to limited resources. But, care must be taken with method 2 to avoid bottlenecks.
I don't think your "second approach" is well thought out, so I'll just see if I can tell you how I find it most useful to think about these things.
Rule 1) Your throughput is maximized if all your cores are busy doing useful work. Try to keep your cores busy doing useful work.
These are things that can keep you from keeping your cores busy doing useful work:
you are keeping them busy creating threads. If tasks are short-lived, then use a thread pool so you aren't spending all your time starting up and killing threads.
you are keeping them busy switching contexts. Modern OSes are pretty good at multithreading, but if you've gotta switch jobs 10000 times per second, that overhead is going to add up. If that's a problem for you, you'll have to consider an event-driven architecture or some other sort of more efficient explicit scheduling.
your jobs block or wait for a long time, and you don't have the resources to run enough threads to keep your cores busy. This can be a problem when you're serving protocols with persistent connections that hang around doing nothing most of the time, like websocket chat. You don't want to keep a whole thread hanging around doing nothing by tying it to a single client. You'll need to architect around that.
All your jobs need some other resource besides CPU, and you're bottlenecked on that -- that's a discussion for another day.
All that said... for most request/response kinds of protocols, passing each request or connection off to a thread pool that assigns it a thread for the duration of the request is easy to implement and performant in most cases.
Rule 2) Given maximized throughput (all your cores are usefully busy), getting jobs done on a first-come, first-served basis minimizes latency and maximizes responsiveness.
This is truth, but in most servers it is not considered at all. You can run into trouble here when your server is busy and jobs have to stop, even for short moments, to perform a lot of blocking operations.
The problem is that there is nothing to tell the OS thread scheduler which thread's job came in first. Every time your thread blocks and then becomes ready, it is scheduled on equal terms with all the other threads. If the server is busy, that means that the time it takes to process your request is roughly proportional to the number of times it blocks. That is generally no good.
If you have to block a lot in the process of processing a job, and you want to minimize the overall latency of each request, you'll have to do your own scheduling that keeps track of which jobs started first. In an event-driven architecture, for example, you can give priority to handling events for jobs that started earlier. In a pipelined architecture, you can give priority to later stages of the pipeline.
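One way to picture "give priority to handling events for jobs that started earlier" is a priority queue keyed on job arrival order. The class below is a minimal Python sketch under that assumption. OldestJobFirstScheduler and its method names are hypothetical, and a real server would also need locking and job cleanup.

```python
import heapq
import itertools


class OldestJobFirstScheduler:
    """Hands out pending events so that the earliest-started job is served first."""

    def __init__(self):
        self._heap = []
        self._arrivals = itertools.count()  # order in which jobs started
        self._sequence = itertools.count()  # keeps events of one job in FIFO order
        self._job_arrival = {}

    def start_job(self, job_id):
        self._job_arrival[job_id] = next(self._arrivals)

    def post_event(self, job_id, event):
        # Sort key: (when the job started, when this event was posted).
        key = (self._job_arrival[job_id], next(self._sequence))
        heapq.heappush(self._heap, (key, job_id, event))

    def next_event(self):
        _key, job_id, event = heapq.heappop(self._heap)
        return job_id, event


if __name__ == "__main__":
    s = OldestJobFirstScheduler()
    s.start_job("a")
    s.start_job("b")
    s.post_event("b", "b-io-done")  # job b becomes ready first...
    s.post_event("a", "a-io-done")
    print(s.next_event())  # ...but job a started first, so it is served first
```

With a plain FIFO event queue, job b would have been handled first here simply because its I/O happened to complete earlier.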
Remember these two rules, design your server to keep your cores busy with useful work, and do first things first. Then you can have a fast and responsive server.

How is Node.js inherently faster while it still uses Threads internally?

Referring to the following discussion:
How is Node.js inherently faster when it still relies on Threads internally?
After having gone through all responses, I still have basic questions: If a DB call is made, 'somebody' has to block for the call to return. It turns into a blocking call deep down. Somebody has to make a call to the DB. The 'somebody' has to be a thread. If there are 50 DB calls, though they appear to be non-blocking to the Javascript, deep down they have all blocked. If there are 50 calls, for them to be all fired together on the DB, they have to be each sent to the DB by a thread. This means there would be 50 threads that have sent the DB call and are waiting for their call to return. This is no different than having 50 threads like they do in Apache. Please rectify my understanding. What is Node.js doing cleverly and how to ensure that fewer threads than 50 run in this case?
You are... partially correct. If there are 50 concurrent DB calls, then that means 50 threads, each dedicated to a DB call (actually, the reality is that by default, node provides only 4 concurrent threads in its thread pool, if you want more you have to explicitly specify how many threads you're willing to allow node to spin up; see my answer here - any excess requests are queued).
What makes this more efficient than Apache is that each of those threads is dedicated to the smallest functional unit... it lives only for the life of that database call, and then it's relinquished (in this case, a new thread is created, up to the limit, and then that thread is put back into the pool). This is in dramatic opposition to Apache, which spins up a thread for each new request, and may have to service multiple database calls and other processing in between until that request is completed and can then be relinquished.
Ultimately, this results in each thread spending more of its time doing work or in the pool waiting for more work and less time being idle and unavailable.
Be aware that this is workload-dependent, there are workloads that work better in the Apache model, but in general, most web style workloads are more suited to the node model.
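The "50 calls, 4 threads, the rest are queued" behaviour described above can be imitated in Python with a bounded thread pool. This is only an analogy for what the answer describes, not Node's actual implementation: fake_db_call and the 10 ms sleep are stand-ins for a blocking database call, and the pool size of 4 mirrors the default mentioned in the answer.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_calls(n_calls=50, pool_size=4):
    """Submit n_calls blocking calls to a small pool; report results and peak concurrency."""
    active = 0
    peak = 0
    lock = threading.Lock()

    def fake_db_call(i):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stands in for blocking on the database
        with lock:
            active -= 1
        return i

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # All calls are accepted immediately; the excess waits in the pool's queue.
        results = list(pool.map(fake_db_call, range(n_calls)))
    return results, peak
```

All 50 calls complete, but the peak number of threads blocked at any one time never exceeds the pool size.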
The difference, I believe, is that in something like Apache those threads are going to be handled in the order they are received. So if thread one is waiting for 10TB of data and thread two is only waiting for 10KB of data, thread two has to wait until thread one is done even though its work could be done far faster and return quicker. With node the idea is that each thread waiting for I/O is returned as soon as it is done. So depending on the I/O the same situation could allow thread two to return before thread one is done with its work. This is older, but I believe it is still an excellent write-up of threading in node: http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/

What logically is an event loop in a thread?

I came across Node.js and Python's Tornado versus Apache.
They say:
Apache makes a thread for every connection.
Node.js and Tornado actually run an event loop on a thread, and a single thread can handle many connections.
I don't understand what, logically, can be a child of a thread.
In computer science terms:
Processes have isolated memory and share CPU with context switches.
Threads divide a process.
Therefore, a process with multiple control points is achieved by multiple threads.
Now,
How does an event loop work under a thread?
How can it handle different connections under one thread of control?
Update:
I mean if there is communication with 3 sockets under 1 thread, how can 1 thread communicate with 3 sockets without keeping anyone on wait ?
An event loop at its basic level is something like:
while (getNextEvent(&event)) {
    dispatchEvent(&event);
}
In other words, it's nothing more than a loop which continuously retrieves events from a queue of some description, then dispatches the event to an event handling procedure.
It's likely you know that already but I'm just explaining it for context.
In terms of how the different servers handle it, it appears that every new connection being made in Apache has a thread created for it, and that thread is responsible for that connection and nothing else.
For the other two, it's likely that there are a "set" number of threads running (though this may actually vary based on load) and a connection is handed off to one of those threads. That means any one thread may be handling multiple connections at any point in time.
So the event in that case would have to include some details as to what connection it applies to, so the thread can keep the different connections isolated from each other.
There are no doubt pros and cons to both options. A one-connection-per-thread option would have simplified code in the thread function since it didn't have to deal with multiple connections but it may end up with a lot of resource usage as the load got high.
In a multiple-connection-per-thread scenario, the code is a little more complex but you can generally minimise thread creation and destruction overhead by simply having the maximum number of threads running all the time. Outside of high-load periods, they'll just be sitting around doing nothing, waiting on a connection event to be given to them.
And, even under high load, it may be that each thread can quite easily process five concurrent connections without dropping behind which would mean the one-connection-per-thread option was a little wasteful.
Based on your update:
I mean if there is communication with 3 sockets under 1 thread, how can 1 thread communicate with 3 sockets without keeping anyone on wait ?
There are a great many ways to do this. For a start, it would generally all be abstracted behind the getNextEvent() call, which would probably be responsible for handling all connections and farming them out to the correct threads.
At the lowest levels, this could be done with something like a select call, a function that awaits activity on one of many file descriptors, and returns information relating to which file descriptor has something to say.
For example, you provide a file descriptor set of all currently open sockets and pass that to select. It will then give you back a modified set, containing only those that are of interest to you (such as ready-to-read-from).
You can then query that set and dispatch events to the corresponding thread.
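To make the 3-sockets-under-1-thread case concrete, here is a small Python sketch built on the standard selectors module, which wraps select and its relatives. The get_next_events name echoes the getNextEvent() call above. The socketpair() connections, the "connN" labels and the message text are assumptions for the demo, and dispatching is reduced to storing what was read.

```python
import selectors
import socket


def get_next_events(sel):
    """Block until at least one registered socket is readable; return (label, socket) pairs."""
    return [(key.data, key.fileobj) for key, _mask in sel.select()]


def run_event_loop(n_connections=3):
    """One thread, several sockets: only sockets that have data are ever read."""
    sel = selectors.DefaultSelector()
    server_sides, clients = [], []
    for i in range(n_connections):
        server_side, client_side = socket.socketpair()
        # The label travels with the event so the handler knows which connection it is.
        sel.register(server_side, selectors.EVENT_READ, data=f"conn{i}")
        server_sides.append(server_side)
        clients.append(client_side)

    # Clients speak in arbitrary order; the loop does not care.
    for i in reversed(range(n_connections)):
        clients[i].sendall(f"msg from {i}".encode())

    handled = {}
    while len(handled) < n_connections:
        for label, sock in get_next_events(sel):
            handled[label] = sock.recv(4096).decode()  # "dispatch" the event

    sel.close()
    for s in server_sides + clients:
        s.close()
    return handled
```

The thread never blocks on a socket that has nothing to say: it blocks only inside select(), which returns as soon as any one of the registered sockets is ready.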

Why is threading used for sockets?

Ever since I discovered sockets, I've been using the nonblocking variants, since I didn't want to bother with learning about threading. Since then I've gathered a lot more experience with threading, and I'm starting to ask myself.. Why would you ever use it for sockets?
A big premise of threading seems to be that they only make sense if they get to work on their own set of data. Once you have two threads working on the same set of data, you will have situations such as:
if(!hashmap.hasKey("bar"))
{
    dostuff(); // <-- meanwhile another thread inserts "bar" into hashmap
    hashmap["bar"] = "foo"; // <-- our premise that the key didn't exist
                            // (likely to avoid overwriting something) is now invalid
}
Now imagine hashmap to map remote IPs to passwords. You can see where I'm going. I mean, sure, the likelihood of such thread-interaction going wrong is pretty small, but it's still existent, and to keep one's program secure, you have to account for every eventuality. This will significantly increase the effort going into design, as compared to simple, single-threaded workflow.
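For comparison, the usual threaded fix for this check-then-act race is to make the check and the insert one atomic step under a lock. The following Python sketch assumes a hypothetical PasswordTable wrapper. It shows the extra design effort being discussed rather than arguing for either approach.

```python
import threading


class PasswordTable:
    """Dictionary wrapper whose check-and-insert is a single atomic operation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def insert_if_absent(self, key, value) -> bool:
        """Insert only if the key is missing; return True if this call inserted it."""
        with self._lock:  # no other thread can run between the check and the write
            if key in self._entries:
                return False
            self._entries[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._entries.get(key)


def race(n_threads=8):
    """Let several threads fight over the same key; exactly one should win."""
    table = PasswordTable()
    winners = []

    def attempt(i):
        if table.insert_if_absent("bar", f"foo{i}"):
            winners.append(i)

    threads = [threading.Thread(target=attempt, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table, winners
```

Any work that used to happen between the check and the insert now has to move outside the locked region or be redone, which is exactly the kind of redesign the paragraph above is worried about.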
I can completely see how threading is great for working on separate sets of data, or for programs that are explicitly optimized to use threading. But for the "general" case, where the programmer is only concerned with shipping a working and secure program, I can not find any reason to use threading over polling.
But seeing as the "separate thread" approach is extremely widespread, maybe I'm overlooking something. Enlighten me! :)
There are two common reasons for using threads with sockets, one good and one not-so-good:
The good reason: Because your computer has more than one CPU core, and you want to make use of the additional cores. A single-threaded program can only use a single core, so with a heavy workload you'd have one core pinned at 100%, and the other cores sitting unused and going to waste.
The not-so-good reason: You want to use blocking I/O to simplify your program's logic -- in particular, you want to avoid dealing with partial reads and partial writes, and keep each socket's context/state on the stack of the thread it's associated with. But you also want to be able to handle multiple clients at once, without slow client A causing an I/O call to block and hold off the handling of fast client B.
The reason the second reason is not-so-good is that while having one thread per socket seems to simplify the program's design, in practice it usually complicates it. It introduces the possibility of race conditions and deadlocks, and makes it difficult to safely access shared data (as you mentioned). Worse, if you stick with blocking I/O, it becomes very difficult to shut the program down cleanly (or in any other way affect a thread's behavior from anywhere other than the thread's socket), because the thread is typically blocked in an I/O call (possibly indefinitely) with no reliable way to wake it up. (Signals don't work reliably in multithreaded programs, and going back to non-blocking I/O means you lose the simplified program structure you were hoping for.)
In short, I agree with cib -- multithreaded servers can be problematic and therefore should generally be avoided unless you absolutely need to make use of multiple cores -- and even then it might be better to use multiple processes rather than multiple threads, for safety's sake.
The biggest advantage of threads is to prevent the accumulated lag time from processing requests. When polling you use a loop to service every socket with a state change. For a handful of clients, this is not very noticeable, however it could lead to significant delays when dealing with a large number of clients.
Assuming that each transaction requires some pre-processing and post processing (depending on the protocol this may be trivial amount of processing, or it could be relatively significant as is the case with BEEP or SOAP). The combined time to pre-process/post-process requests could lead to a backlog of pending requests.
For illustration purposes, imagine that the pre-processing, processing, and post-processing stages of a request each consume 1 millisecond, so that the total request takes 3 milliseconds to complete. In a single-threaded environment the system would become overwhelmed once incoming requests reach 334 per second (since it would take 1.002 seconds to service all requests received within a 1-second period), leading to a time deficit of 0.002 seconds each second. However, if the system were using threads, it would be theoretically possible to need only 0.336 seconds of processing time (0.334 for shared data access + 0.001 pre-processing + 0.001 post-processing) to complete all of the requests received in a 1-second period.
Although it is theoretically possible to process all requests in 0.336 seconds, this would require each request to have its own thread. A more reasonable estimate is to take the combined pre/post-processing time for all requests (0.668 seconds), divide it by the number of configured threads, and add the 0.334 seconds of shared data access. For example, using the same 334 incoming requests and processing times, 2 threads would theoretically complete all requests in 0.668 seconds (0.668 / 2 + 0.334), 4 threads in 0.501 seconds, and 8 threads in 0.418 seconds.
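The arithmetic above can be checked with a few lines of Python. This is only a sketch of the idealized model used in this answer: it assumes the shared-data stage is fully serialized, that pre- and post-processing spread evenly across threads, and that each stage costs 1 ms. The function name `completion_ms` is made up for the example.

```python
def completion_ms(requests, threads, pre_ms=1, shared_ms=1, post_ms=1):
    """Estimated time in milliseconds to finish `requests` requests.

    Assumes shared-data access is serialized across all threads, while
    pre- and post-processing are divided evenly among `threads` threads.
    """
    serialized = requests * shared_ms
    parallel = requests * (pre_ms + post_ms) / threads
    return serialized + parallel

# Same 334 requests, varying thread counts (334 = one thread per request).
for n in (1, 2, 4, 8, 334):
    print(n, "thread(s):", completion_ms(334, n), "ms")
```

Real threads add scheduling and locking overhead that this model ignores, so treat the numbers as a best case.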
If the highest request volume your daemon receives is relatively low, then a single-threaded implementation with non-blocking I/O is sufficient. However, if you expect occasional bursts of high request volume, then it is worth considering a multi-threaded model.
I've written more than a handful of UNIX daemons which have relatively low throughput, and I've used a single-threaded design for simplicity. However, when I wrote a custom netflow receiver for an ISP, I used a threaded model for the daemon and it was able to handle peak times of Internet usage with minimal bumps in system load average.

Why are message queues used instead of multithreading?

I have the following query which I need someone to please help me with. I'm new to message queues and have recently started looking at the Kestrel message queue.
As I understand it, both threads and message queues are used for concurrency in applications, so what is the advantage of using message queues over multithreading?
Please help
Thank you.
Message queues allow you to communicate outside your program.
This allows you to decouple your producer from your consumer. You can spread the work to be done over several processes and machines, and you can manage/upgrade/move around those programs independently of each other.
A message queue also typically consists of one or more brokers that take care of distributing your messages and making sure the messages are not lost in case something bad happens (e.g. your program crashes, you upgrade one of your programs, etc.).
Message queues might also be used internally in a program, in which case it's often just a facility to exchange/queue data from a producer thread to a consumer thread to do async processing.
Actually, one facilitates the other. A message queue is a nice and simple multithreading pattern: when you have a control thread (usually, but not necessarily, the application's main thread) and a pool of (usually looping) worker threads, message queues are the easiest way to control the thread pool.
For example, to start processing a relatively heavy task, you submit a corresponding message to the queue. If you have more messages than you can currently process, your queue grows; if you have fewer, it shrinks. When your message queue is empty, your threads sleep (usually by blocking on a mutex).
So, there is nothing to compare: message queues are part of multithreading and hence they're used in some more complicated cases of multithreading.
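A minimal sketch of that pattern, using Python's standard `queue` and `threading` modules, might look like the following. The names `worker`, `run_pool` and `STOP` are invented for the example, and squaring a number stands in for the "relatively heavy task".

```python
import queue
import threading

STOP = object()  # sentinel message telling a worker to exit

def worker(inbox, results):
    while True:
        msg = inbox.get()        # the worker sleeps here while the queue is empty
        if msg is STOP:
            break
        results.put(msg * msg)   # stand-in for a relatively heavy task

def run_pool(tasks, num_workers=4):
    inbox, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:           # the control thread submits messages
        inbox.put(task)
    for _ in threads:            # one STOP message per worker
        inbox.put(STOP)
    for t in threads:
        t.join()
    # Workers finish in no particular order, so sort for a stable result.
    return sorted(results.get() for _ in range(len(tasks)))
```

Here the control thread never touches a worker directly: starting work and shutting down are both done by putting messages on the queue.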
Creating threads is expensive, and every thread that is simultaneously "live" will add a certain amount of overhead, even if the thread is blocked waiting for something to happen. If program Foo has 1,000 tasks to be performed and doesn't really care in what order they get done, it might be possible to create 1,000 threads and have each thread perform one task, but such an approach would not be terribly efficient. A second alternative would be to have one thread perform all 1,000 tasks in sequence. If there were other processes in the system that could employ any CPU time that Foo didn't use, this latter approach would be efficient (and quite possibly optimal), but if there isn't enough work to keep all CPUs busy, CPUs would waste some time sitting idle. In most cases, leaving a CPU idle for a second is just as expensive as spending a second of CPU time (the main exception is when one is trying to minimize electrical energy consumption, since an idling CPU may consume far less power than a busy one).
In most cases, the best strategy is a compromise between those two approaches: have some number of threads (say 10) that start performing the first ten tasks. Each time a thread finishes a task, have it start work on another until all tasks have been completed. Using this approach, the overhead related to threading will be cut by 99%, and the only extra cost will be the queue of tasks that haven't yet been started. Since a queue entry is apt to be much cheaper than a thread (likely less than 1% of the cost, and perhaps less than 0.01%), this can represent a really huge savings.
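As a rough illustration of that compromise, the sketch below runs 1,000 small tasks on a pool of 10 threads using Python's standard `concurrent.futures.ThreadPoolExecutor`. The function name `run_tasks` and the doubling "task" are placeholders. Recording thread identifiers is only there to show that no more than 10 threads ever do the work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_tasks(num_tasks=1000, num_threads=10):
    threads_seen = set()

    def task(n):
        threads_seen.add(threading.get_ident())  # which pool thread ran this task
        return n * 2                              # placeholder for real work

    # The executor keeps pending tasks in an internal queue and hands each
    # one to whichever pool thread becomes free next.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(task, range(num_tasks)))
    return results, len(threads_seen)
```

`pool.map` returns results in submission order even though the tasks themselves may finish in any order.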
The one major problem with using a job queue rather than threading is that if some jobs cannot complete until jobs later in the list have run, it's possible for the system to become deadlocked since the later tasks won't run until the earlier tasks have completed. If each task had been given a separate thread, that problem would not occur since the threads associated with the later tasks would eventually manage to complete and thus let the earlier ones proceed. Indeed, the more earlier tasks were blocked, the more CPU time would be available to run the later ones.
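The following sketch reproduces that situation on a small scale, again with `ThreadPoolExecutor`. The names are invented for the example, and the timeout is an assumption added purely so the demonstration terminates: without it, the single-worker case would wait forever.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def early_job_finishes(num_workers, wait_s=0.5):
    """Return True if the early job saw the later job complete in time."""
    later_done = threading.Event()

    def early():
        # Cannot finish until the later job has run. The timeout exists only
        # so this demo ends; a real stuck job would block indefinitely.
        return later_done.wait(timeout=wait_s)

    def later():
        later_done.set()

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        first = pool.submit(early)   # queued first
        pool.submit(later)           # queued second
        return first.result()
```

With one worker, `early` occupies the only thread, so `later` never gets to run while it waits and the call returns `False`. With two workers, `later` runs on its own thread and `early` completes normally.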
It makes more sense to contrast message queues and other concurrency primitives, such as semaphores, mutexes, condition variables, etc. They can all be used in the presence of threads, though message-passing is also commonly used in non-threaded contexts, such as inter-process communication, whereas the others tend to be confined to inter-thread communication and synchronisation.
The short answer is that message-passing is easier on the brain. In detail...
Message-passing works by sending stuff from one agent to another. There is generally no need to coordinate access to the data. Once an agent receives a message it can usually assume that it has unqualified access to that data.
The "threading" style works by giving all agents unrestricted access to shared data but requiring them to carefully coordinate their access via primitives. If one agent misbehaves, the process becomes corrupted and all hell breaks loose. Message passing tends to confine problems to the misbehaving agent and its cohort, and since agents are generally self-contained and often programmed in a sequential or state-machine style, they tend not to misbehave as often (or as mysteriously) as conventional threaded code.
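To illustrate the message-passing style, here is a small sketch in which a single agent owns a running total and every other thread can only change it by sending a message. All names (`counter_agent`, `run`) and the use of `None` as a shutdown message are assumptions made for this example.

```python
import queue
import threading

def counter_agent(inbox, replies):
    total = 0                      # private state: only this agent touches it
    while True:
        msg = inbox.get()
        if msg is None:            # None is this sketch's "shut down" message
            replies.put(total)
            return
        total += msg               # plain sequential code, no lock needed here

def run(num_senders=4, per_sender=1000):
    inbox, replies = queue.Queue(), queue.Queue()
    agent = threading.Thread(target=counter_agent, args=(inbox, replies))
    agent.start()

    def sender():
        for _ in range(per_sender):
            inbox.put(1)           # hand the data over instead of sharing it

    senders = [threading.Thread(target=sender) for _ in range(num_senders)]
    for s in senders:
        s.start()
    for s in senders:
        s.join()
    inbox.put(None)                # all senders are done; ask for the total
    agent.join()
    return replies.get()
```

The queue still synchronizes internally, but that coordination is hidden in one well-tested place. The agent's own logic stays sequential, which is what makes this style easier to reason about.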

Resources