How to program number of your threads in Delphi - multithreading

I found this on the Dr Dobbs site today at
http://www.ddj.com/hpc-high-performance-computing/220300055?pgno=3
It's a nice suggestion regarding thread implementation.
What is the best way of achieving this with TThread in Delphi, I wonder?
Thanks
Brian
=== From Dr Dobbs ==============
Make multithreading configurable! The number of threads used in a program should always be configurable from 0 (no additional threads at all) to an arbitrary number. This not only allows a customization for optimal performance, but it also proves to be a good debugging tool and sometimes a lifesaver when unknown race conditions occur on client systems. I remember more than one situation where customers were able to overcome fatal bugs by switching off multithreading. This of course does not only apply to multithreaded file I/O.
Consider the following pseudocode:
int CMyThreadManger::AddThread(CThreadObj theTask)
{
    if(mUsedThreadCount >= gConfiguration.MaxThreadCount())
        return theTask.Execute(); // execute task in main thread
    // add task to thread pool and start the thread
    ...
}
Such a mechanism is not very complicated (though a little bit more work will probably be needed than shown here), but it sometimes is very effective. It also may be used with prebuilt threading libraries such as OpenMP or Intel's Threaded Building Blocks. Considering the measurements shown here, it's a good idea to include more than one configurable thread count (for example, one for file I/O and one for core CPU tasks). The default might probably be 0 for file I/O and <number of cores found> for CPU tasks. But all multithreading should be detachable. A more sophisticated approach might even include some code to test multithreaded performance and set the number of threads used automatically, maybe even individually for different tasks.
===================

I would create an abstract class TTask that is meant to execute the task through its method Execute:
type
  TTask = class abstract
  protected
    procedure DoExecute; virtual; abstract;
  public
    procedure Execute;
  end;
  TTaskThread = class(TThread)
  private
    FTask: TTask;
  public
    // Assigns FTask and enables the thread, free on terminate.
    constructor Create(const ATask: TTask);
    procedure Execute; override; // Calls FTask.DoExecute.
  end;
The method Execute checks the number of threads. If the maximum has not been reached, it starts a TTaskThread that calls DoExecute and so executes the task in a thread. If the maximum has been reached, DoExecute is called directly.
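One possible shape for that Execute method is sketched below. It is a minimal sketch under assumptions, not the answerer's actual implementation: the globals MaxThreadCount and UsedThreadCount, the TDemoTask class and the TCountdownEvent used to wait for completion are hypothetical names added for illustration, and Sleep merely stands in for real work.

```delphi
program ConfigurableThreadCount;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

type
  TTask = class abstract
  protected
    procedure DoExecute; virtual; abstract;
  public
    procedure Execute;
  end;

  TTaskThread = class(TThread)
  private
    FTask: TTask;
  protected
    procedure Execute; override;
  public
    constructor Create(const ATask: TTask);
  end;

  // Hypothetical demo task: remembers which thread ran it.
  TDemoTask = class(TTask)
  protected
    procedure DoExecute; override;
  public
    RanInMainThread: Boolean;
  end;

var
  MaxThreadCount: Integer = 2;  // assumed config value; 0 = no extra threads
  UsedThreadCount: Integer = 0; // threads currently running a task
  Done: TCountdownEvent;        // lets the demo wait for all tasks

constructor TTaskThread.Create(const ATask: TTask);
begin
  inherited Create(False); // the thread only starts after the constructor returns
  FTask := ATask;
  FreeOnTerminate := True;
end;

procedure TTaskThread.Execute;
begin
  try
    FTask.DoExecute;
  finally
    TInterlocked.Decrement(UsedThreadCount); // give the slot back
  end;
end;

procedure TTask.Execute;
begin
  if TInterlocked.Increment(UsedThreadCount) <= MaxThreadCount then
    TTaskThread.Create(Self) // a slot was free: run in a new thread
  else
  begin
    TInterlocked.Decrement(UsedThreadCount);
    DoExecute;               // limit reached: run in the calling thread
  end;
end;

procedure TDemoTask.DoExecute;
begin
  RanInMainThread := TThread.CurrentThread.ThreadID = MainThreadID;
  Sleep(100);  // stands in for real work
  Done.Signal; // last statement: the demo frees the task after the wait
end;

var
  Tasks: array[0..3] of TDemoTask;
  I: Integer;
begin
  Done := TCountdownEvent.Create(Length(Tasks));
  try
    for I := Low(Tasks) to High(Tasks) do
    begin
      Tasks[I] := TDemoTask.Create;
      Tasks[I].Execute;
    end;
    Done.WaitFor; // block until every task has signalled
    for I := Low(Tasks) to High(Tasks) do
    begin
      Writeln(Format('Task %d ran in main thread: %s',
        [I, BoolToStr(Tasks[I].RanInMainThread, True)]));
      Tasks[I].Free;
    end;
  finally
    Done.Free;
  end;
end.
```

Setting MaxThreadCount to 0 makes every task run in the calling thread, which is the "switch off multithreading" case from the article. A production version would also need a proper shutdown strategy for threads that are still running when the program exits.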

The answer by Gamecat is good as far as the abstract task class is concerned, but I think calling DoExecute() for a task in the calling thread (as the article itself does too) is a bad idea. I would always queue the tasks to be executed by background threads, unless threading was disabled completely, and here's why.
Consider the following (contrived) case, where you need to execute three independent CPU-bound procedures:
Procedure1_WhichTakes200ms;
Procedure2_WhichTakes400ms;
Procedure3_WhichTakes200ms;
For better utilisation of your dual core system you want to execute them in two threads. You would limit the number of background threads to one, so with the main thread you have as many threads as cores.
Now the first procedure will be executed in a worker thread, and it will finish after 200 milliseconds. The second procedure will start immediately and be executed in the main thread, as the single configured worker thread is already occupied, and it will finish after 400 milliseconds. Then the last procedure will be executed in the worker thread, which by now has been idle for 200 milliseconds, and it will finish after another 200 milliseconds. Total execution time is 600 milliseconds, and for 2/3 of that time only one of the two threads was actually doing meaningful work.
You could reorder the procedures (tasks), but in real life it's probably impossible to know in advance how long each task will take.
Now consider the common way of employing a thread pool. As per configuration you would limit the number of threads in the pool to 2 (the number of cores), use the main thread only to queue the tasks for the pool, and then wait for all tasks to complete. With the above sequence of queued tasks, thread 1 would take the first task and thread 2 would take the second task. After 200 milliseconds the first task would complete, and the first worker thread would take the third task from the queue, which is empty afterwards. After 400 milliseconds both the second and the third task would complete, and the main thread would be unblocked. Total execution time is 400 milliseconds, with 100% load on both cores during that time.
At least for CPU-bound threads it's of vital importance to always have work queued for the OS scheduler. Calling DoExecute() in the main thread interferes with that, and shouldn't be done.

I generally have only one class inheriting from TThread, one that takes 'worker items' from a queue or stack and suspends itself when no more items are available. The main program can then decide, using the config value, how many instances of this thread to instantiate and start.
This 'worker items queue' should also be smart enough to resume suspended threads or create a new thread when required (and when the limit permits it), whenever a worker item is queued or a thread has finished processing a worker item.
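A minimal sketch of such a queue-driven worker class follows, assuming a reasonably recent Delphi version. It swaps in a different technique from the one described: instead of suspending and resuming threads, the workers block inside TThreadedQueue (from System.Generics.Collections) until an item arrives. The names TWorkerThread, WorkerCount and SimulatedWork are hypothetical, a nil item is used as an assumed stop marker, and Sleep only simulates the 200/400/200 ms procedures from the earlier answer rather than doing CPU-bound work.

```delphi
program WorkerQueueSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs,
  System.Generics.Collections, System.Diagnostics;

const
  WorkerCount = 2; // hypothetical configuration value, must be at least 1 here

type
  TWorkerThread = class(TThread)
  private
    FQueue: TThreadedQueue<TProc>;
  protected
    procedure Execute; override;
  public
    constructor Create(AQueue: TThreadedQueue<TProc>);
  end;

constructor TWorkerThread.Create(AQueue: TThreadedQueue<TProc>);
begin
  inherited Create(False); // the thread only starts after the constructor returns
  FQueue := AQueue;
end;

procedure TWorkerThread.Execute;
var
  Work: TProc;
begin
  while True do
  begin
    Work := FQueue.PopItem; // blocks while the queue is empty
    if not Assigned(Work) then
      Break;                // nil is the stop marker
    Work();
  end;
end;

// Builds a work item that pretends to be busy for the given time.
function SimulatedWork(Milliseconds: Integer): TProc;
begin
  Result :=
    procedure
    begin
      Sleep(Milliseconds);
    end;
end;

var
  Queue: TThreadedQueue<TProc>;
  Workers: array of TWorkerThread;
  Timer: TStopwatch;
  I: Integer;
begin
  Queue := TThreadedQueue<TProc>.Create(16);
  try
    SetLength(Workers, WorkerCount);
    for I := 0 to High(Workers) do
      Workers[I] := TWorkerThread.Create(Queue);

    Timer := TStopwatch.StartNew;
    Queue.PushItem(SimulatedWork(200));
    Queue.PushItem(SimulatedWork(400));
    Queue.PushItem(SimulatedWork(200));

    // One stop marker per worker, queued behind the real work.
    for I := 0 to High(Workers) do
      Queue.PushItem(nil);

    for I := 0 to High(Workers) do
    begin
      Workers[I].WaitFor;
      Workers[I].Free;
    end;
    Writeln('Elapsed ms: ', Timer.ElapsedMilliseconds);
  finally
    Queue.Free;
  end;
end.
```

With two workers the elapsed time should land near the 400 ms figure worked out above rather than 600 ms. The sketch does not cover the "threading disabled" case: with no workers at all, the caller would have to run the items itself.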

My framework allows for a thread pool count for any of the threads in a configuration file, if you wish to have a look (http://www.csinnovations.com/framework_overview.htm).

Since XE7, Delphi has included the Parallel Programming Library:
https://docwiki.embarcadero.com/RADStudio/Sydney/en/Using_the_Parallel_Programming_Library
It has TTask to schedule work, as well as several configuration options and the possibility to create your own thread pool(s).
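As a rough illustration of the "own thread pool" option, the sketch below creates a TThreadPool, tries to cap it at a configured size and runs a few tasks on it. MaxWorkers is a hypothetical configuration value. The pool may reject a limit (the setters return a Boolean), so the sketch prints the results instead of assuming the values were accepted; the exact acceptance rules are best checked in the documentation for your Delphi version.

```delphi
program PplPoolSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Threading;

const
  MaxWorkers = 2; // hypothetical configuration value

var
  Pool: TThreadPool;
  Tasks: array[0..3] of ITask;
  I: Integer;
begin
  Pool := TThreadPool.Create;
  try
    // Lower the minimum first so that a small maximum has a chance
    // of being accepted; both calls report whether they succeeded.
    Writeln('Min accepted: ', Pool.SetMinWorkerThreads(1));
    Writeln('Max accepted: ', Pool.SetMaxWorkerThreads(MaxWorkers));

    for I := Low(Tasks) to High(Tasks) do
      Tasks[I] := TTask.Run(
        procedure
        begin
          Sleep(100); // stands in for real work
        end,
        Pool);

    TTask.WaitForAll(Tasks);
    Writeln('All tasks finished');

    // Release the task references before the pool goes away.
    for I := Low(Tasks) to High(Tasks) do
      Tasks[I] := nil;
  finally
    Pool.Free;
  end;
end.
```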

Related

OmniThreadLibrary memory leak (consumption) on pipeline running from another thread

I'm running a pipeline (a thread pipeline from OmniThreadLibrary) from another thread and I see a memory leak, or rather growing memory consumption. But when the application closes everything is fine and no memory leak is reported (ReportMemoryLeaksOnShutdown := True;).
Here is an example: click the button 10 times and the test app will use ~600 MB of memory. Windows 7 x64, Delphi XE6, latest Omni source.
Is it a bug? Or do I need to use different code?
uses
  OtlParallel,
  OtlCommon;
procedure TForm75.Button1Click(Sender: TObject);
begin
  // run empty pipeline from another threads
  Parallel.&For(1, 100).Execute(
    procedure(value: integer)
    var
      pipe: IOmniPipeline;
    begin
      pipe := Parallel.Pipeline
        .Stage(procedure(const input: TOmniValue; var output: TOmniValue) begin end)
        .Run;
      pipe.Cancel;
      pipe.WaitFor(100000);
      pipe := nil;
    end
  );
end;
Edit 1:
I tested that code with Process Explorer and found that the thread count at runtime is constant, but the handle count grows. If I insert Application.ProcessMessages; at the end of the "for loop" (after the pipe's code), the test app runs fine: handles are closed and memory consumption stays constant. I don't know why.
How many threads does it create?
Check it in SysInternals Process Explorer, for example.
Or in the Delphi IDE (View -> Debug Windows -> Threads).
I think that, because you block each For worker in quite a long WaitFor, your application creates many worker threads for every button click, and when you click 10 times it consequently creates 10 times as many threads.
And yes, in general-purpose operating systems like Windows, threads are expensive! Google for "windows thread memory footprint" and multiply it by the number of threads created by the 10 parallel For loops you spawn.
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686774.aspx
https://blogs.technet.microsoft.com/markrussinovich/2009/07/05/pushing-the-limits-of-windows-processes-and-threads/
This is why highly parallel server applications use special approaches that create lightweight application-level threads and bypass OS threads. To name a few:
Make a special language that spawns tens of thousands of cooperative threads and enforces cross-thread memory safety through strict language rules: https://www.erlang.org/docs
Make a library, which cannot enforce those rules but can at least ask the programmer to follow them voluntarily: https://en.wikipedia.org/wiki/Actor_model
Fibers: no-protection threads within threads: What is the difference between a thread and a fiber?
However, OTL, being a generic library for generic threads, imposes few restrictions but relies on OS-provided native threads. These are expensive both in the CPU time needed to create and release Windows threads (mitigated by the thread pool concept) and in the memory footprint the OS needs to maintain each Windows thread (which is unavoidable, and you are seeing its manifestation).
Of course, later, when all those loops have been worked through, their threads are closed and released together with the memory used to maintain them. So there is indeed no memory leak: once you wait long enough for all your threads to terminate, they are gone, along with all the temporarily allocated memory they used as their workspace.
UPD. How to check that hypothesis? The easiest way would be to change how many threads are spawned by every instance of the For loop (by every button click).
See the .NumTasks option of your Parallel.For object:
http://otl.17slon.com/book/chap04.html#leanpub-auto-iomniparallelsimpleloop-interface
By default every button click should spawn one thread for every CPU core, but you can enforce your own thread pool size. Add a .NumTasks(1) call and check the memory consumption, then change it to .NumTasks(10) and check again. If the memory consumption grows approximately tenfold, then that is it.

Are thread pools safe and is use of them recommended?

I was researching the answer to this question and ran across this post. Is ThreadPool safe? How does ThreadPool compare with the OmniThreadLibrary? What are the pluses and minuses of using each?
Here is an example of what I am doing:
procedure DoWork(nameList: TList<Integer>);
var
  i: Integer;
  oneThread: PerNameThread;
begin
  for i := 0 to nameList.Count - 1 do
  begin
    oneThread := PerNameThread.Create(Self);
    oneThread.nameID := nameList[i];
    oneThread.Start();
  end;
end;
I am creating a thread for each nameList item, and this could be up to 500 names. All these threads are too much, and slowing down the process, to the point where this process would be faster with just one thread.
First, you need to understand what a thread pool is.
A thread pool is a concept where you have a list of multiple threads that are suspended when they are not performing any tasks.
These threads are defined a bit differently than you are probably used to. Instead of them having all the necessary code inside their Execute() method, their Execute() method only contains a few lines of code to execute external code (giving the threads the ability to perform practically any processing that you require), take care of synchronizing the result back to the caller/UI, and returning the thread to the pool and putting it into a suspended state. When a new task is needed at a later time, a thread is given the task and resumed.
So by providing each thread with a method pointer for a task, you actually define what kind of job each thread will be processing each time it is run.
The main advantage of using a thread pool is that doing so avoids the overhead of creating and destroying a thread for each specific task.
As for OmniThreadLibrary, it is a full blown task management library. It uses its own thread pool and a pretty advanced task managing system that allows you to easily define which tasks can be executed in parallel, which tasks need to be executed in sequence, and which tasks have higher priority than others.
The only drawback of OmniThreadLibrary is that it is still limited to Windows only, so if you are thinking of providing multiplatform support for your application then you will have to find another solution.
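Applied to the DoWork loop from the question, a pooled version might look like the sketch below. It uses TParallel.For from Delphi's own Parallel Programming Library (XE7 and later), which is a swapped-in technique rather than anything this answer proposes, and it runs the iterations on the library's default thread pool instead of creating one thread per name. ProcessName and the Processed counter are hypothetical stand-ins for whatever PerNameThread does.

```delphi
program PerNamePool;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs,
  System.Generics.Collections, System.Threading;

var
  Processed: Integer = 0; // demo counter, updated from several threads

// Hypothetical stand-in for the work done for one name.
procedure ProcessName(NameID: Integer);
begin
  TInterlocked.Increment(Processed);
end;

procedure DoWork(NameList: TList<Integer>);
begin
  // The iterations are spread over a limited set of pool threads.
  TParallel.&For(0, NameList.Count - 1,
    procedure(I: Integer)
    begin
      ProcessName(NameList[I]); // the list is only read here, never modified
    end);
end;

var
  Names: TList<Integer>;
  I: Integer;
begin
  Names := TList<Integer>.Create;
  try
    for I := 1 to 500 do
      Names.Add(I);
    DoWork(Names);
    Writeln('Names processed: ', Processed);
  finally
    Names.Free;
  end;
end.
```

TParallel.For returns only after all iterations have finished, so the caller needs no extra waiting code. Anything the per-name work shares between iterations still has to be thread-safe, as the interlocked counter illustrates.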

.net 4.0 c# : Pausing/Resuming parallel running threads from threadpool temporarily?

I was able to set up a multi-threaded environment using the .NET ThreadPool and I do get a significant performance benefit. This runs in the background of my application.
Now when a new task is requested by the user, I want it to get maximum CPU resources to maximize performance. Hence I would like to temporarily pause all the threads that I began (via the ThreadPool.QueueUserWorkItem method) and then resume them once the new task, requested by the user in the foreground, is completed.
There could be several solutions to my problem:
a. Starting fewer background threads so that any new user request gets some share of the CPU resources. (But I lose the performance gain I had :( )
b. Setting a higher priority for the thread that runs a new user-requested task. (Not sure if this works?)
c. Suspending/resuming the ThreadPool threads I began. But suspending / resuming / interrupting threads is highly discouraged. Moreover, this could get tricky and error prone.
Any other ideas?
Note: when the user makes a request, performing the task would normally not take more than 300ms. However, when I start ThreadPool threads in the background, it now takes about 3 seconds to complete (10 times worse)! I am OK if it takes 500-800ms though. All background threads complete in about 8 seconds (and I am OK if they take 1-2 seconds more). Hence, I am trying out option ( a ) for now.
Thanks in advance!
Note that thread scheduling is done by the operating system and hence cannot be directed from within a program. The only thing that can be done is setting ThreadPriority (and that only on new Threads, not on ThreadPool threads). See the section "Limitations of Using the Thread Pool".
As your requirement is to suspend all background threads while executing a new task, what you can do is create a class-level flag.
Now you can put checkpoints in the methods to be executed as background tasks. At each checkpoint, check the class-level flag; if it is set, call Thread.Sleep, which should (NOT MUST) trigger a thread context switch by the OS thread scheduler.
Putting checkpoints in methods (to be executed by the ThreadPool) is analogous to putting checkpoints for cancellation support in a background worker.

How to get the lowest cpu consumption when having an infinite loop in a thread

1. I have some infinite loops. How can I get the lowest CPU consumption? Should I use a delay?
2. If I have multiple threads running in my application and one of them is THREAD_PRIORITY_IDLE, does it affect other threads?
My code is as this for every thread
procedure TMatchLanLon.Execute;
begin
  while not Terminated do
  begin
    //some code
    Sleep(1000);
  end;
end;
Typically a thread should sleep until signalled, but not by using Sleep or SleepEx.
You create an event and wait for it to be signalled, either using TEvent or going directly to the Win32 API with WaitForSingleObject.
Sleep causes so many problems, including what I call "Sleeping Beauty" disease. The whole rest of your application has terminated and shut down a few hundred microseconds ago, and your thread has slept for a "million years" in relative computer timing terms, and when it wakes up the rest of your application has long since terminated. The next thing your background thread is likely to do is access some object which it has a reference to, which has been freed, and then (if you're lucky) it will crash. Don't use Sleep in threads. Wait for events, or use some pre-built worker thread (like the OmniThreadLibrary one).
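A minimal sketch of the TEvent approach, applied to a loop like the one in the question, could look as follows. The class name TPeriodicThread and the field FWakeUp are hypothetical. The thread still does its work about once a second, but it waits on an event that Terminate signals (through the TerminatedSet hook that TThread offers in XE2 and later), so shutdown does not have to sit out the remaining wait.

```delphi
program EventWaitSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs, System.Diagnostics;

type
  TPeriodicThread = class(TThread)
  private
    FWakeUp: TEvent;
  protected
    procedure Execute; override;
    procedure TerminatedSet; override;
  public
    constructor Create;
    destructor Destroy; override;
  end;

constructor TPeriodicThread.Create;
begin
  // Manual-reset event, initially not signalled, created before the thread runs.
  FWakeUp := TEvent.Create(nil, True, False, '');
  inherited Create(False);
end;

destructor TPeriodicThread.Destroy;
begin
  inherited;    // terminates the thread and waits for it first
  FWakeUp.Free; // only then is it safe to release the event
end;

procedure TPeriodicThread.TerminatedSet;
begin
  inherited;
  FWakeUp.SetEvent; // wake the thread as soon as Terminate is called
end;

procedure TPeriodicThread.Execute;
begin
  while not Terminated do
  begin
    // some code
    FWakeUp.WaitFor(1000); // waits up to one second, returns early on Terminate
  end;
end;

var
  Worker: TPeriodicThread;
  Timer: TStopwatch;
begin
  Worker := TPeriodicThread.Create;
  try
    Timer := TStopwatch.StartNew;
    Worker.Terminate; // signals the event through TerminatedSet
    Worker.WaitFor;
    Writeln('Thread stopped after ms: ', Timer.ElapsedMilliseconds);
  finally
    Worker.Free;
  end;
end.
```

The same event could also be signalled whenever new work arrives, in which case the timeout can be dropped and the thread blocks until there really is something to do.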
I have some infinite loops. How can I get the lowest CPU consumption?
By blocking the loop until there is something to do.
If I have multiple threads running in my application and one of them is THREAD_PRIORITY_IDLE, does it affect other threads?
It depends. Probably not, but if any other threads are waiting on output from this thread, or on the release of a lock it holds, then those threads are effectively 'dragged down' to THREAD_PRIORITY_IDLE as well.
Apart from this priority inversion (which can cause deadlocks when threads have several priority levels), spinlocks, a synchronization construct that is normally merely bad, can become disastrous.

Thread Pool vs Thread Spawning

Can someone list some comparison points between Thread Spawning vs Thread Pooling, which one is better? Please consider the .NET framework as a reference implementation that supports both.
Thread pool threads are much cheaper than a regular Thread, because they pool the system resources required for threads. But they have a number of limitations that may make them unfit:
You cannot abort a threadpool thread
There is no easy way to detect that a threadpool thread completed, no Thread.Join()
There is no easy way to marshal exceptions from a threadpool thread
You cannot display any kind of UI on a threadpool thread beyond a message box
A threadpool thread should not run longer than a few seconds
A threadpool thread should not block for a long time
The latter two constraints are a side effect of the threadpool scheduler: it tries to limit the number of active threads to the number of cores your CPU has available. This can cause long delays if you schedule many long-running threads that block often.
Many other threadpool implementations have similar constraints, give or take.
A "pool" contains a list of available "threads" ready to be used whereas "spawning" refers to actually creating a new thread.
The usefulness of "Thread Pooling" lies in "lower time-to-use": creation time overhead is avoided.
In terms of "which one is better": it depends. If the creation-time overhead is a problem use Thread-pooling. This is a common problem in environments where lots of "short-lived tasks" need to be performed.
As pointed out by other folks, there is a "management overhead" for Thread-Pooling: this is minimal if properly implemented. E.g. limiting the number of threads in the pool is trivial.
For some definition of "better", you generally want to go with a thread pool. Without knowing what your use case is, consider that with a thread pool, you have a fixed number of threads which can all be created at startup or can be created on demand (but the number of threads cannot exceed the size of the pool). If a task is submitted and no thread is available, it is put into a queue until there is a thread free to handle it.
If you are spawning threads in response to requests or some other kind of trigger, you run the risk of depleting all your resources as there is nothing to cap the amount of threads created.
Another benefit to thread pooling is reuse - the same threads are used over and over to handle different tasks, rather than having to create a new thread each time.
As pointed out by others, if you have a small number of tasks that will run for a long time, this would negate the benefits gained by avoiding frequent thread creation (since you would not need to create a ton of threads anyway).
My feeling is that you should start just by creating a thread as needed... If the performance of this is OK, then you're done. If at some point, you detect that you need lower latency around thread creation you can generally drop in a thread pool without breaking anything...
All depends on your scenario. Creating new threads is resource intensive and an expensive operation. Most very short asynchronous operations (less than a few seconds max) could make use of the thread pool.
For longer running operations that you want to run in the background, you'd typically create (spawn) your own thread. (Ab)using a platform/runtime built-in threadpool for long running operations could lead to nasty forms of deadlocks etc.
Thread pooling is usually considered better, because the threads are created up front, and used as required. Therefore, if you are using a lot of threads for relatively short tasks, it can be a lot faster. This is because they are saved for future use and are not destroyed and later re-created.
In contrast, if you only need 2-3 threads and they will only be created once, then spawning them yourself will be better. This is because you do not gain from caching existing threads for future use, and you are not creating extra threads which might not be used.
It depends on what you want to execute on the other thread.
For short tasks it is better to use a thread pool; for long tasks it may be better to spawn a new thread, as a long task could starve the thread pool for other tasks.
The main difference is that a ThreadPool maintains a set of threads that are already spun-up and available for use, because starting a new thread can be expensive processor-wise.
Note however that even a ThreadPool needs to "spawn" threads... it usually depends on workload - if there is a lot of work to be done, a good threadpool will spin up new threads to handle the load based on configuration and system resources.
There is a little extra time required for creating/spawning a thread, whereas a thread pool already contains created threads which are ready to be used.
This answer is a good summary but just in case, here is the link to Wikipedia:
http://en.wikipedia.org/wiki/Thread_pool_pattern
For multi-threaded execution combined with getting return values from the execution, or an easy way to detect that a threadpool has completed, Java Callables could be used.
See https://blogs.oracle.com/CoreJavaTechTips/entry/get_netbeans_6 for more info.
Assuming C# and Windows 7 and up...
When you create a thread using new Thread(), you create a managed thread that becomes backed by a native OS thread when you call Start – a one to one relationship. It is important to know only one thread runs on a CPU core at any given time.
An easier way is to call ThreadPool.QueueUserWorkItem, which queues a work item instead of creating a thread for it. Pool threads are background threads that are backed by native threads too, but they are not tied to a single task: each one takes work items from the queue and runs them one after another. With, say, 4 cores, the pool starts out with about 4 worker threads that between them run however many work items you queue, and .NET decides when to add more. This is lighter-weight because no thread has to be created and destroyed per work item, which avoids much of the overhead associated with crossing from user mode to kernel mode.
It may be important to note that heavy multitasking might benefit from pure native OS threads in a well-designed multithreading framework. However, the performance benefits aren’t that much.
When using the ThreadPool, just make sure the minimum worker thread count is high enough, or ThreadPool.QueueUserWorkItem will be slower than new Thread(). In a benchmark test looping 512 times, calling new Thread() left ThreadPool.QueueUserWorkItem in the dust with default minimums. However, first setting the minimum worker thread count to 512 made new Thread() and ThreadPool.QueueUserWorkItem perform similarly in this test.
A side effect of setting a high worker thread count is that new Task() (or Task.Factory.StartNew) also performed similarly to new Thread() and ThreadPool.QueueUserWorkItem.

Resources