I have a string list containing file paths; the list has 80 elements. I want to keep 8 threads running continuously until all the files in the list have been moved: whenever a thread finishes its work, I want to create a new one so that the thread count stays at 8.
Can anybody help me?
Unless each thread is writing to a different drive, having multiple threads copying files is slower than doing it with a single thread. The disk drive can only do one thing at a time. If you have eight threads all trying to write to the same disk drive, then it takes extra time to do disk head seeks and such.
Also, if you don't have at least eight CPU cores, then trying to run eight concurrent threads is going to require extra thread context switches. If you're doing this on a four-core machine, then you shouldn't have more than four threads working on it.
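For sizing, a one-line sketch (8 here is just your requested cap, not a magic number):
// Use at most 8 workers, but never more than the machine has CPU cores.
int threadCount = Math.Min(8, Environment.ProcessorCount);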
If you really need to have eight threads doing this, then put all of the file paths into a BlockingCollection, start eight threads, and have them go to work. So you have eight persistent threads rather than starting and stopping threads all the time. Something like this:
BlockingCollection<string> filePaths = new BlockingCollection<string>();
List<Thread> threads = new List<Thread>();

// add paths to the queue
foreach (var path in ListOfFilePaths)
    filePaths.Add(path);
filePaths.CompleteAdding();

// start threads to process the paths
for (int i = 0; i < 8; ++i)
{
    Thread t = new Thread(CopyFiles);
    threads.Add(t);
    t.Start();
}

// threads are working. At some point you'll need to clean up:
foreach (var t in threads)
{
    t.Join();
}
Your CopyFiles method looks like this:
void CopyFiles()
{
    foreach (var path in filePaths.GetConsumingEnumerable())
    {
        CopyTheFile(path);
    }
}
Since you're working with .NET 4.0, you could use Task instead of Thread. The code would be substantially similar.
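For example, a rough sketch of the Task version (using Task.Factory.StartNew rather than Task.Run, which only arrived in .NET 4.5; this reuses the filePaths collection and CopyFiles method from above):
List<Task> tasks = new List<Task>();
for (int i = 0; i < 8; ++i)
{
    // LongRunning hints to the scheduler that each task deserves its own thread.
    tasks.Add(Task.Factory.StartNew(CopyFiles, TaskCreationOptions.LongRunning));
}
Task.WaitAll(tasks.ToArray());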
Recently, I was interviewed at a couple of companies, and was asked the same question:
"You've got N worker threads that can communicate only via shared memory, any other synchronization primitives are not available. The shared memory contains a counter which is initially 0, and each thread must increment it once. Another thread may be added, and there is more space on the shared memory in addition to the counter"
In other words, there are multiple threads, and their access to a shared resource (in this case a counter, but it could be anything else) must be synchronized using shared memory only.
So my solution was as follows:
Define 3 more integer variables in the shared memory: REQUEST, GRANTED, and FINISHED, and initialize them all to -1.
Before starting the worker threads, start another manager thread that will coordinate between the worker threads.
Manager thread pseudocode:
while (true) {
    if (GRANTED equals FINISHED) {
        GRANTED = REQUEST;
    }
}
Worker thread pseudocode:
incremented = false;
while (incremented equals false) {
    REQUEST = this thread ID;
    if (GRANTED equals this thread ID) {
        increment the counter;
        incremented = true;
        FINISHED = this thread ID;
    }
}
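To make the proposal concrete, here is a direct C# transcription of the pseudocode (a sketch of my own; the class name and the use of volatile fields are assumptions, since the original is language-agnostic, and it inherits exactly the correctness questions below):
class SharedCounterDemo
{
    // The shared-memory integers from the description above.
    static volatile int REQUEST = -1, GRANTED = -1, FINISHED = -1;
    static int counter = 0;

    // Manager thread: grant the latest request once the previous
    // grantee has reported FINISHED.
    static void ManagerLoop()
    {
        while (true)
        {
            if (GRANTED == FINISHED)
                GRANTED = REQUEST;
        }
    }

    // Worker thread: keep requesting until granted, then increment once.
    static void Worker(int threadId)
    {
        bool incremented = false;
        while (!incremented)
        {
            REQUEST = threadId;
            if (GRANTED == threadId)
            {
                counter++;
                incremented = true;
                FINISHED = threadId;
            }
        }
    }
}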
The question is whether this solution is OK?
Are there other solutions?
Also, this solution is not fair, because a worker may retry many times before it gets a chance to actually increment the counter. How can it be made fair?
I am having a hard time wrapping my head around the concept of multithreaded rendering in DX12.
According to MSDN one must write draw commands into direct command lists (preferably using bundles) and then submit those lists to a command queue.
It is also said that one can have more than one command queue for direct command lists, but it is unclear to me what the purpose of doing so is.
I already get the full benefit of multithreading by building command lists in parallel threads, don't I? If so, why would I want to have more than one command queue associated with the device?
I suspect that improper management of command queues can lead to serious performance problems in later stages of developing a rendering library.
The main benefit of DirectX 12 is that execution of commands is almost purely asynchronous: when you call ID3D12CommandQueue::ExecuteCommandLists, it kicks off the work in the command lists you pass in. This raises another point, however. A common misconception is that rendering itself is somehow multithreaded now, and that is simply not true; all of that work is still executed on the GPU. What is done on several threads is command list recording, since you create an ID3D12GraphicsCommandList object for each thread that needs one.
An example:
DrawObject DrawObjects[10];
ID3D12CommandQueue* GCommandQueue = ...;

// One command list per recording thread, created up front so that
// ExecuteCommands can see both of them.
ID3D12GraphicsCommandList* clForThread1 = ...;
ID3D12GraphicsCommandList* clForThread2 = ...;

void RenderThread1()
{
    // RecordDraw stands in for the actual recording calls
    // (set pipeline state, root signature, draw, etc.).
    for (int i = 0; i < 5; i++)
        clForThread1->RecordDraw(DrawObjects[i]);
}

void RenderThread2()
{
    for (int i = 5; i < 10; i++)
        clForThread2->RecordDraw(DrawObjects[i]);
}

void ExecuteCommands()
{
    ID3D12CommandList* cl[2] = { clForThread1, clForThread2 };
    GCommandQueue->ExecuteCommandLists(2, cl);
    GCommandQueue->Signal(...);
}
This example is a very rough use case, but that is the general idea: you record the objects of your scene on different threads to remove the CPU overhead of recording the commands.
Another useful consequence of this setup is that you can kick off rendering work on the GPU and immediately start recording the next batch of commands.
Another example:
void Render()
{
    ID3D12GraphicsCommandList* cl = ...;
    cl->DrawObjectsInTheScene(...); // placeholder for recording the scene draws

    // Send the scene commands to the GPU so it starts rendering right away.
    ID3D12CommandList* lists[] = { cl };
    CommandQueue->ExecuteCommandLists(1, lists);

    // While the GPU renders the scene, record the post-processing commands.
    ID3D12GraphicsCommandList* cl2 = ...;
    cl2->SetBloomPipelineState(...);
    cl2->SetResources(...);
    cl2->DrawOnScreenQuad();
    // (cl2 is then executed the same way once recording is finished.)
}
The advantage here over DirectX 11 or OpenGL is that those APIs may just sit there recording and recording, and possibly not send their commands to the GPU until Present() is called, which forces the CPU to wait and incurs an overhead.
The following code occupies ~410 MB of memory and never releases it again. (The version using dispatch_sync instead of dispatch_async requires only ~8 MB.)
I would expect a spike of high memory usage, but I would also expect it to go back down again... Where is the leak?
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        for (int i = 0; i < 100000; i++) {
            dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
                NSLog(@"test");
            });
        }
        NSLog(@"Waiting.");
        [[NSRunLoop mainRunLoop] runUntilDate:[NSDate dateWithTimeIntervalSinceNow:60]];
    }
    return 0;
}
I tried:
Adding @autoreleasepool around and inside the loop
Adding an NSRunLoop run to the loop
I tried several combinations and never saw memory usage decrease (even after waiting for minutes).
I'm aware of the GCD reference guide which contains the following statement:
Although GCD dispatch queues have their own autorelease pools, they make no guarantees as to when those pools are drained.
Is there a memory leak in this code? If not, is there a way to force the queue to release/drain the finished blocks?
An Objective-C block is a C structure. I think you are creating 100,000 block objects to be executed on background threads, and they all wait until the system can run them. Your device can execute only a limited number of threads at once, which means many blocks sit waiting before the OS starts them.
If you change "async" to "sync", the next block object is only created after the previous block has finished and been destroyed.
UPD
About the GCD thread pool:
GCD executes tasks on a thread pool whose threads are created and managed by the system. The system caches threads to save CPU time, and every dispatched task executes on a free thread.
From the documentation:
Blocks submitted to dispatch queues are executed on a pool of threads fully managed by the system. No guarantee is made as to the thread on which a task executes.
If you run the tasks synchronously, there is always a free thread (from the GCD pool) to execute the next task once the current one finishes, because the main thread waits while each task executes and does not add new tasks to the queue; the system does not allocate new NSThreads (on my Mac I saw 2 threads). If you run the tasks asynchronously, the system can allocate many NSThreads to achieve maximum performance (on my Mac it was close to 67 threads), because the global queue contains many tasks.
Here you can read about the maximum thread count of the GCD thread pool.
In the Allocations profiler I saw many NSThreads allocated and never destroyed. I think this is a system pool that will be freed if necessary.
Always put an @autoreleasepool inside every GCD block and you will have no problems. I had the same problem, and this is the only workaround I found.
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        for (int i = 0; i < 100000; i++) {
            dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
                // everything INSIDE is in an @autoreleasepool
                @autoreleasepool {
                    NSLog(@"test");
                }
            });
        }
        NSLog(@"Waiting.");
        [[NSRunLoop mainRunLoop] runUntilDate:[NSDate dateWithTimeIntervalSinceNow:60]];
    }
    return 0;
}
In my code I have a loop, and inside this loop I send several requests to a remote webservice. The WS provider said: "The webservice can host at most n threads," so I need to cap my code, since I can't send n+1 concurrent requests.
If I have to send m requests, I would like the first n threads to execute immediately; as soon as one of them completes, a new thread (one of the remaining m-n) should start, and so on until all m have executed.
I have thought of a thread pool with the maximum thread count explicitly set to n. Is this enough?
For this I would avoid using multiple threads at all; instead, wrap the entire loop up so that it can run on a single thread. However, if you do want to launch multiple threads using a thread pool, then I would use the Semaphore class to enforce the required thread limit. Here's how...
A semaphore is like a mean nightclub bouncer: it has been given the club's capacity and is not allowed to exceed it. Once the club is full, no one else can enter and a queue builds up outside. Then, as one person leaves, another can enter (analogy thanks to J. Albahari).
A semaphore with a value of one is equivalent to a Mutex or lock, except that the semaphore has no owner, so it is thread-agnostic: any thread can call Release on a semaphore, whereas with a Mutex or lock only the thread that acquired it can release it.
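A tiny sketch of that ownership difference (illustrative only):
using System.Threading;

class OwnershipDemo
{
    static SemaphoreSlim sem = new SemaphoreSlim(1);

    static void Main()
    {
        sem.Wait();                               // acquired on the main thread
        new Thread(() => sem.Release()).Start();  // released by a different thread: legal for a semaphore
        // Releasing a Mutex (or exiting a lock) from a non-owning thread would throw instead.
    }
}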
Now, in your case we can use semaphores to limit concurrency and prevent too many threads from executing a particular piece of code at once. In the following example, five threads try to enter a nightclub that only allows entry to three...
class BadAssClub
{
    static SemaphoreSlim sem = new SemaphoreSlim(3);

    static void Main()
    {
        for (int i = 1; i <= 5; i++)
            new Thread(Enter).Start(i);
    }

    // Enforce that only three threads run this method at once.
    // (Takes object because Thread.Start(object) passes a boxed int.)
    static void Enter(object id)
    {
        Console.WriteLine(id + " wants to enter.");
        sem.Wait();
        try
        {
            Console.WriteLine(id + " is in!");
            Thread.Sleep(1000 * (int)id);
            Console.WriteLine(id + " is leaving...");
        }
        finally
        {
            sem.Release();
        }
    }
}
I hope this helps.
Edit: You can also use the ThreadPool.SetMaxThreads method. It restricts the number of threads allowed to run in the thread pool, but it does this 'globally' for the pool itself. That means that if your application also runs SQL queries or calls library methods that use pool threads, those will be throttled by the same limit. If that doesn't matter to you, use SetMaxThreads; if you want to limit a particular method, however, it is safer to use semaphores.
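For completeness, a small sketch of that approach (note that SetMaxThreads returns false and changes nothing if you ask for fewer worker threads than the machine has processors):
int worker, io;
ThreadPool.GetMaxThreads(out worker, out io);

// Cap worker threads at n (3 here as an example); leave the I/O completion limit alone.
bool applied = ThreadPool.SetMaxThreads(3, io);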
Heads up: I am not very familiar with working with the thread pool, which might be obvious from the following code. I am under the impression that I can push many values into this queue, that it will wait for one thread to complete before moving on to the next, and that the system will handle how many threads run at once.
I am trying to use ThreadPool::QueueUserWorkItem(waitcallback, num), where the value of num is iterated up to a dynamic value determined by some prior algorithm. The problem I am running into is that the program crashes when numBlocks gets too high.
WaitCallback^ wcb = gcnew WaitCallback(this, &createImage);
for (int i = 0; i < numBlocks; i++)
{
    ThreadPool::QueueUserWorkItem(wcb, i);
}
I get the message "Runtime Error! This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information."
My most recent run had numBlocks = 644.
It's hard to say what caused the program to crash. Most likely, an exception was thrown in one of the threads, and that brought the program down. You'll have to determine where in your code the exception was thrown.
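One way to narrow it down (a sketch in C#; createImage stands in for your actual method): wrap the callback body so that an exception on a pool thread gets reported instead of tearing the process down.
void CreateImageSafe(object state)
{
    try
    {
        createImage((int)state);
    }
    catch (Exception ex)
    {
        // Log the failing item instead of letting the exception escape the pool thread.
        Console.Error.WriteLine("Item {0} failed: {1}", state, ex);
    }
}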
As you know, ThreadPool::QueueUserWorkItem queues an item to be processed by the threadpool. But there can be multiple threads processing items from that queue. For example, you could have 20 pool threads, with 15 of them processing the work items that you queued.
If you really have that many items to process and you want them done one at a time, why not just queue a single work item that does them one at a time? I've never done managed C++, so I won't try to write an example in it, but perhaps you can translate this C# code:
void ProcessInBackground(object state)
{
    int numBlocks = (int)state;
    for (int i = 0; i < numBlocks; ++i)
    {
        createImage(i);
    }
}
And then you can call it with:
ThreadPool.QueueUserWorkItem(ProcessInBackground, numBlocks);
That queues a single work item, so one pool thread processes the images in order.
I suspect you can convert that to managed C++ fairly easily.