I recently tried to work out how the ThreadPool class works in .NET 4.0. I tried to read through the reflected code, but it is a bit too extensive for me.
Could someone explain in simple terms how this class works, i.e.
How does it store the methods that are queued?
Is it thread safe, supposing multiple threads try to enqueue their methods in the thread pool at once?
When it reaches the limit of available threads, how does it return to execute the remaining batch waiting in the queue when one of the threads becomes free? Is there some callback mechanism for it?

Of course, in the absence of the actual implementation (or in the absence of Eric Lippert :) ) what I'm saying is only common sense:
The thread pool holds an internal (circular?) queue where the tasks are kept (hence QueueUserWorkItem).
Putting tasks in the queue is thread-safe (this is for sure, as I've used it myself in this scenario several times).
I think that each thread loops indefinitely and keeps taking tasks from the queue (in a thread-safe manner of course) automatically when it's done with the current task. If the queue is empty it will just block.

In a queue of delegates
TBH, I don't know for sure but, if it's not, it's dangerous, nearly useless and probably the worst code ever emitted by M$, (even including Windows ME). Just assume it's thread safe.
The work threads are while loops, waiting on the work request queue for a delegate, invoking one when it becomes available, then looping back round again when the delegate returns to wait on the queue for another delegate. There is no need for any callback.
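The worker loop described in these answers can be sketched in a few lines. This is only an illustration of the general idea, not the actual .NET implementation; the TinyPool class and its method names are hypothetical, and Python's queue.Queue stands in for the pool's internal thread-safe queue.

```python
import queue
import threading


class TinyPool:
    """Hypothetical minimal pool: a thread-safe queue of callables plus looping workers."""

    def __init__(self, workers):
        self._tasks = queue.Queue()  # queue.Queue does its own locking
        self._threads = [threading.Thread(target=self._loop) for _ in range(workers)]
        for t in self._threads:
            t.start()

    def queue_user_work_item(self, fn, *args):
        # Safe to call from any number of threads at once.
        self._tasks.put((fn, args))

    def _loop(self):
        while True:
            item = self._tasks.get()  # blocks while the queue is empty
            if item is None:          # sentinel: stop this worker
                return
            fn, args = item
            fn(*args)                 # run the task, then loop back for the next one

    def shutdown(self):
        # One sentinel per worker, queued behind any pending tasks.
        for _ in self._threads:
            self._tasks.put(None)
        for t in self._threads:
            t.join()


results = []
results_lock = threading.Lock()


def record_square(n):
    with results_lock:
        results.append(n * n)


pool = TinyPool(workers=2)
for n in range(5):
    pool.queue_user_work_item(record_square, n)
pool.shutdown()
print(sorted(results))  # [0, 1, 4, 9, 16]
```

Five tasks are shared between two workers here, and no callback is involved: a worker that finishes a task simply goes back to the queue for the next one.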

I don't know exactly, but to my mind it stores it in a collection of
MSDN says yes.
GetMaxThreads() returns the number of requests that can be active at the same time; once
you reach this limit, all further requests are queued. As I understand it, you
need a mechanism for knowing when a thread has finished executing. There is
RegisterWaitForSingleObject(WaitHandle, WaitOrTimerCallback, Object, Int32, Boolean)


How Do Callbacks work in Non-blocking Design?

I looked at a few other questions but didn't quite find what I was looking for. I'm using Scala, but my question is very high level and so is hopefully language-agnostic.
A regular scenario:
Thread A runs a function and there is some blocking work to be done (say a DB call).
The function has some non-blocking code (e.g. an Async block in Scala) that causes some sort of 'worker' Thread B (in a different pool) to pick up the I/O task.
The method in Thread A completes returning a Future which will eventually contain the result and Thread A is returned to its pool to quickly pick up another request to process.
Q1. Some thread somewhere usually has to wait?
My understanding of non-blocking architectures is that the common approach is to still have some Thread waiting/blocking on the I/O work somewhere - it's just a case of having different pools which have access to different cores, so that a small number of request-processing threads can manage a large number of concurrent requests without ever waiting on a CPU core.
Is this a correct general understanding?
Q2. How does the callback work?
In the above scenario - Thread B that is doing the I/O work will run the callback function (provided by Thread A) if/when the I/O work has completed - which completes the Future with some Result.
Thread A is now off doing something else and has no association any more with the original request. How does the Result in the Future get sent back to the client socket? I understand that different languages have different implementations of such a mechanism but at a high level my current assumption is that (regardless of the language/framework) some framework/container objects must always be doing some sort of orchestration so that when a Future task is completed the Result gets sent back to the original socket handling the request.
I have spent hours trying to find articles which explain this, but every article seems to deal only with low-level details. I know I'm missing some details, but I am having difficulty asking my question because I'm not quite sure which parts I'm missing :)
My understanding of non-blocking architectures is that the common approach is to still have some Thread waiting/blocking on the I/O work somewhere
If a thread is getting blocked somewhere, it is not really a non-blocking architecture. So no, that's not really a correct understanding of it. That doesn't mean that this is necessarily bad. Sometimes you just have to deal with blocking (using JDBC, for example). It would be better to push it off into a fixed thread pool designated for blocking, rather than allowing the entire application to suffer thread starvation.
Thread A is now off doing something else and has no association any more with the original request. How does the Result in the Future get sent back to the client socket?
Using Futures, it really depends on the ExecutionContext. When you create a Future, where the work is done depends on the ExecutionContext.
val f: Future[?] = ???
val g: Future[?] = ???
f and g are created immediately, and the work is submitted to a task queue in the ExecutionContext. We cannot guarantee which will actually execute or complete first in most cases. What you do with the values matters as well. Obviously if you use an Await to wait for the completion of the Futures, then we block the current thread. If we map them and do something with the values, then we again need another ExecutionContext to submit the task to. This gives us a chain of tasks that are asynchronously getting submitted and re-submitted to the executor for execution every time we manipulate the Future.
Eventually there needs to be some onComplete at the end of that chain to pass that value along to something, whether it's writing to a stream or something else, i.e. it is probably out of the hands of the original thread.
Q1: No, at least not at the user code level. Hopefully your async I/O ultimately comes down to an async kernel API (e.g. select()). Which in turn will be using DMA to do the I/O and trigger an interrupt when it's done. So it's async at least down to the hardware level.
Q2: Thread B completes the Future. If you're using something like onComplete, then thread B will trigger that (probably by creating a new task and handing that task off to a thread pool to pick it up later) as part of the completing call. If a different thread has called Await to block on the Future, it will trigger that thread to resume. If nothing has accessed the Future yet, nothing in particular happens - the value sits there in the Future until something uses it. (See PromiseCompletingRunnable for the gritty details - it's surprisingly readable).
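The hand-off described for Q2 is not specific to Scala. As a rough analogue, here is the same shape using Python's concurrent.futures instead of Scala Futures: the submitting thread registers a callback and moves on, and the callback later runs on the worker thread that completes the future. The names fake_db_call and send_to_client are hypothetical stand-ins, and the Event exists only to make sure the callback is registered before the "I/O" finishes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

io_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="io")
release = threading.Event()
log = []


def fake_db_call():
    release.wait()  # stand-in for slow I/O
    return 42


def send_to_client(fut):
    # Runs on the thread that completed the future, not on the submitter.
    log.append((threading.current_thread().name, fut.result()))


future = io_pool.submit(fake_db_call)     # the submitting thread returns at once
future.add_done_callback(send_to_client)  # registered while the task is still pending
release.set()                             # let the "I/O" finish
io_pool.shutdown(wait=True)               # only so this script can inspect the log
print(log)
```

The submitting thread never touches the result; whatever has to happen with it (here, appending to a log) is carried by the callback, which is the orchestration the question asks about.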

Win32: Understanding semaphores

I'm new to multithreading in Win32, and I have an assignment on semaphores that I cannot understand.
Assume that we have 20 tasks (all tasks are identical). If we use a semaphore, there are two possible setups:
First, there could be 20 child threads, so that each thread handles one task.
Second, there could be n child threads. When a thread finishes a task, does it pick up another task?
The second problem I have run into is that I cannot find any semaphore samples for a Win32 (API) application, only the console one I found on MSDN.
Can you help me with the "20 tasks" case and tell me how to write a semaphore in a WinAPI application (where should I place the CreateSemaphore() call ...)?
Your suggestions will be appreciated.
You can start a thread for every task, which is a common approach, or you can use a "threadpool" where threads are reused. This is up to you. In both scenarios, you may or may not use a semaphore, the difference is only how you start the multiple threads.
Now, concerning your question where to place the CreateSemaphore() function, you should call that before starting any further threads. The reason is that these threads need to access the semaphore, but they can't do that if it doesn't exist yet. You could of course pass it to the other threads, but that again would give you the problem how to pass it safely without any race conditions, which is something that semaphores and other synchronization primitives are there to avoid. In other words, you would only complicate things by creating a chicken-and-egg problem.
Note that if this doesn't help you any further, you should perhaps provide more info. What are the goals? What have you done yourself so far? Any related questions here that you read but that didn't fully present answers to your problem?
Well, if you are constrained to using semaphores only, you could use two semaphores to create an unbounded producer-consumer queue class that you could use to implement a thread pool.
You need a 'SimpleQueue' class for task objects. I assume you either have one already, can easily build one or whatever.
In the ctor of your 'ProducerConsumerQueue' class (or in main(), or in some factory function that returns a *ProducerConsumerQueue struct, whatever your language has), create a SimpleQueue and two semaphores: a 'QueueCount' semaphore, initialized with a count of 0, and a 'QueueAccess' semaphore, initialized with a count of 1.
Add 'push(*task)' and '*task pop()' methods/member functions to the ProducerConsumerQueue:
In push(), first call the WaitForSingleObject() API on QueueAccess, then push the *task onto the SimpleQueue, then call the ReleaseSemaphore() API on QueueAccess. This pushes the *task in a thread-safe manner. Then call ReleaseSemaphore() on QueueCount - this will signal any waiting threads.
In pop(), first call the WaitForSingleObject() API on QueueCount - this ensures that any calling consumer thread has to wait until there is a *task in the queue. Then call WaitForSingleObject() on QueueAccess, pop the task from the SimpleQueue, call ReleaseSemaphore() on QueueAccess and return the task - this dequeues the *task in a thread-safe manner.
Once you have created your ProducerConsumerQueue, create some threads to run the tasks. In CreateThread(), pass the same *ProducerConsumerQueue as the 'auxiliary' *void parameter.
In the thread function, cast the *void back to *ProducerConsumerQueue and then just loop around for ever, calling pop() and then running the returned task.
OK, your pool of threads is now ready to do stuff. If you want to run 20 tasks, create them in a loop and push them onto the ProducerConsumerQueue. The threads will then run them all.
You can create as many threads as you want to in the pool, (within reason). As many threads as cores is reasonable for tasks that are CPU-intensive. If the tasks make blocking calls, you may want to create many more threads for quickest overall throughput.
A useful enhancement is to check for 'null' in the thread function loop after each task is received and, if it is null, clean up and exit the thread, so terminating it. This allows the threads to be easily terminated by queueing up nulls, making it easier to shut down your thread pool (should you need to), and also to control the number of threads in the pool at runtime.
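The two-semaphore queue and the null-terminated worker loop described above can be sketched as follows. This is a portable illustration rather than Win32 code: Python's threading.Semaphore stands in for CreateSemaphore(), acquire() for WaitForSingleObject() and release() for ReleaseSemaphore(), and None plays the role of the null task. The worker function and the choice of four threads are assumptions made for the example.

```python
import threading
from collections import deque


class ProducerConsumerQueue:
    def __init__(self):
        self._items = deque()                  # the plain, unsynchronized 'SimpleQueue'
        self._count = threading.Semaphore(0)   # 'QueueCount': number of queued tasks
        self._access = threading.Semaphore(1)  # 'QueueAccess': guards the deque

    def push(self, task):
        self._access.acquire()
        self._items.append(task)
        self._access.release()
        self._count.release()   # signal one waiting consumer

    def pop(self):
        self._count.acquire()   # wait until at least one task is queued
        self._access.acquire()
        task = self._items.popleft()
        self._access.release()
        return task


def worker(q, done):
    while True:
        task = q.pop()
        if task is None:        # null task: terminate this thread
            return
        done.append(task())


q = ProducerConsumerQueue()     # created before any thread starts
done = []
threads = [threading.Thread(target=worker, args=(q, done)) for _ in range(4)]
for t in threads:
    t.start()
for i in range(20):             # the "20 tasks"
    q.push(lambda i=i: i)
for _ in threads:               # one null per thread shuts the pool down
    q.push(None)
for t in threads:
    t.join()
print(len(done))  # 20
```

Because the nulls are queued behind the 20 tasks, every task is dequeued before any worker sees a null, so the pool drains completely before it stops.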

Alternative to Observer Pattern

Does anyone know of an alternative to the Observer a.k.a. Listener pattern?
I'm interested in something that would work well in an asynchronous
The problem I'm facing is that I have an application which uses this
pattern a lot, which is not a bad thing per se, but it becomes a bottleneck as the number of listeners increases. Combined with threading primitives (mutexes, critical sections - of course in my specific environment) the hit on performance is really bad.
How about Message Queue?
If there are too many observers, so the thread being observed is not making any progress, then it might be wise to reverse the relationship. Rather than have the observed thread call out to each and every observer, it may be better to have the observers wait on something like a condition variable or event associated with the observed thread. The observer code can then block, waiting for the condition variable to be signalled. The observed thread can then just signal the condition variable rather than calling into the observers; the observers can notice the signal and process the consequences in their own time.
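A minimal sketch of that reversed relationship is shown below, assuming a simple version counter as the observed state. The Observed class and its method names are hypothetical. The observed thread only bumps the counter and signals; each observer blocks on the condition variable and reacts on its own thread.

```python
import threading


class Observed:
    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0

    def publish(self):
        # The observed thread does no per-observer work: it just signals.
        with self._cond:
            self._version += 1
            self._cond.notify_all()

    def wait_for_change(self, last_seen):
        # Called by observers; blocks until the state moves past last_seen.
        with self._cond:
            self._cond.wait_for(lambda: self._version > last_seen)
            return self._version


seen = []


def observer(subject, name):
    version = subject.wait_for_change(0)
    seen.append((name, version))  # process the change in the observer's own time


subject = Observed()
observers = [threading.Thread(target=observer, args=(subject, n)) for n in ("a", "b", "c")]
for t in observers:
    t.start()
subject.publish()
for t in observers:
    t.join()
print(sorted(seen))  # [('a', 1), ('b', 1), ('c', 1)]
```

Checking a version number rather than just waiting for a signal means an observer that starts waiting after publish() has already run still sees the change instead of blocking forever.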
If reducing listeners in your code is your primary objective, please take a look at Jeffrey Richter and his AsyncEnumerator. This technique makes asynchronous programming look more like synchronous programming.
With this technique a single method can issue an async call and the rest of the method acts as the event handler, so the whole of the invocation and event listener code can be combined into one function.
Difficult to say without a more concrete description, but the Mediator pattern is related and can be used when the number of communicating objects starts to proliferate. You could implement some policy to coordinate the activities in a more structured way within it.
Two alternatives from me: use the actor model (e.g. the Akka framework), or use an executor to limit the parallelism. An executor is basically just a thread pool that limits the number of threads and reuses threads that have finished their task.
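To illustrate the executor suggestion, here is a small sketch using Python's ThreadPoolExecutor rather than any particular Java or Akka API. The notify_listener function and the limit of three workers are assumptions for the example; the counters exist only to show that no more than three notifications ever run at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

active = 0
peak = 0
lock = threading.Lock()


def notify_listener(listener_id):
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # stand-in for the listener's work
    with lock:
        active -= 1
    return listener_id


# Ten notifications, but never more than three threads working on them.
with ThreadPoolExecutor(max_workers=3) as executor:
    handled = list(executor.map(notify_listener, range(10)))

print(handled)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(peak <= 3)  # True
```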

Why was the method java.lang.Thread.join() named like that?

Does anybody know why the method join() member of a java.lang.Thread was named like that? Its javadoc is:
Waits for this thread to die.
When join is called on some thread, the calling thread waits for the other thread to die and then continues execution. Supposedly the calling thread will die as well, but it's still not clear why the author used this name.
It's a common name in threading - it's not like Java was the first to use it. (For example, that's what pthreads uses too.)
I guess you could imagine it like two people taking a walk - you join the other one and walk with them until you've finished, before going back to what you were doing. That sort of analogy may have been the original reason, although I agree it's not exactly intuitive.
It's named this way because you're basically stating that the calling thread of execution is going to wait to join the given state of execution. It's also named join in posix and many other threading packages.
After that call to join returns (unless it was interrupted), the two threads of execution are basically running together from that point (with that thread getting the return value of the now-terminated thread).
This stems from concurrent software modeling, where the flow of control splits into two concurrent threads. Later, the two threads of execution will join again.
Also waitToDie() was probably a) too long and b) too morbid.
Well... this isn't really correct, but I thought of a "waiting room" (it isn't actually a queue with a particular scheduling policy such as FIFO or HRRN).
When a thread cannot go on and needs to wait for some other thread to finish, it just joins the guys (aka threads) in the waiting room, to become active next...
Because you are waiting for another thread of execution (i.e. the one you're calling join on) to join (i.e. die) to the current (i.e. the calling) thread.
The calling thread does not die: it simply waits for the other thread to do so.
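The fork-then-join picture is easy to see in a tiny example. Python's threading.Thread.join follows the same naming convention and is used here only as an illustration; the events list is just a trace for the example.

```python
import threading

events = []


def child():
    events.append("child finished")


t = threading.Thread(target=child)
t.start()  # the flow of control forks here
t.join()   # the caller waits here until the child has terminated
events.append("caller continues")

print(events)        # ['child finished', 'caller continues']
print(t.is_alive())  # False
```

After join() returns, only the calling thread is left: the child has died, and the caller carries on.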
This terminology is widely used (outside Java as well). I take it as a sort of associating one thread with another in some way. I think Thread.Associate() could have been a better option, but Join() isn't bad either.

How to use queue with two threads-- one for consumer and one for producer

I am using an application where a lower level application always invokes a callback RecData(char *buf) when it receives data.
In the callback I am creating two threads and pass the consumer and producer function to these created threads respectively.
My code:
void RecData(char *buf)
{
    CreateThread(NULL, 0, producer_queue, (void *)buf, 0, NULL);
}
The above works when I receive one piece of data at a time. If I receive, say, 5 pieces of data at almost the same time, then producer_queue should first put all the data in the queue and only then should consumer_queue start retrieving it; but here, as soon as producer_queue puts the first piece of data in the queue, consumer_queue retrieves it.
What you want to do, I believe, is control access to the queue. You'll want to look at using a mutex to control reading from the queue.
When you receive data, you lock the mutex, then enqueue the data. When you are done queuing the data, release the lock.
When reading from the queue, you try to lock the same mutex. If you are writing data to the queue, you won't be able to start reading until your producer thread has finished writing all of its data and released the lock. Once you actually hold the mutex, you prevent your writer thread from writing while you are reading data.
This approach could introduce potential deadlocks. If your writer thread dies prior to releasing the lock, then your reader thread will not be able to continue (then again your thread dying may just trigger an error state).
I hope this makes sense.
Use condition variables. The problem you have is one of the most common in multi-threaded programming. Just using mutexes doesn't help the situation. Always remember that mutexes are for locking and condition variables are for waiting. The latter is always the safer choice when a thread needs to know when it should start consuming from a shared queue.
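As a rough sketch of "mutex for locking, condition variable for waiting", the example below uses Python's threading.Condition instead of the Win32 or POSIX APIs. The buffer, the five sample chunks and the function names are assumptions for illustration. The producer queues the whole batch while holding the lock, so the consumer cannot start reading part-way through, and the consumer sleeps on the condition until there is something to read.

```python
import threading
from collections import deque

buffer = deque()
cond = threading.Condition()  # bundles a mutex with a wait/notify channel
received = []


def producer(chunks):
    with cond:                # lock: the consumer cannot read mid-batch
        buffer.extend(chunks)
        cond.notify()         # wake the consumer once the whole batch is queued


def consumer(expected):
    while len(received) < expected:
        with cond:
            while not buffer:  # wait: sleep until there is something to read
                cond.wait()
            while buffer:
                received.append(buffer.popleft())


chunks = ["d1", "d2", "d3", "d4", "d5"]
c = threading.Thread(target=consumer, args=(len(chunks),))
p = threading.Thread(target=producer, args=(chunks,))
c.start()
p.start()
p.join()
c.join()
print(received)  # ['d1', 'd2', 'd3', 'd4', 'd5']
```

The wait sits inside a loop that re-checks the buffer, so the consumer behaves correctly whether it starts before or after the producer and is unaffected by spurious wake-ups.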
Check out the below link on how you can create a condition variable on your own on windows:
If you are using windows vista, the below msdn example may help you:
In all cases, use the logic shown on Schmidt's website, as it looks more portable (portable across different versions of Windows, at least). Schmidt's implementation gives you the standard POSIX API feel, which is the widely used standard on most modern UNIX/Linux systems.
