I have a ShortTask and a LongTask. I dont want threads that running LongTask affect execution speed of threads that running ShortTash. My PC has 8 threads, if I create 2 threadpools each with 4 max num of threads, will those threads become isolated from each other automatically?
QThreadPool* shortPool = new QThreadPool(this);
shortPool->setMaxThreadCount(QThread::idealThreadCount() / 2);
QThreadPool* longPool = new QThreadPool(this);
longPool->setMaxThreadCount(QThread::idealThreadCount() / 2);
Related
Let's say we have two vectors A=(ai) and B=(bi), each of size n and we have to compute a new vector C=(ci) as 𝑐𝑖 = √(𝑎𝑖 × 𝑏𝑖) for(i=1,...,n)
Main question: What would be the best way to compute the ci in parallel (using nested parallelism, i.e. using sync and spawn).
I think the below understanding is correct about the computation
for (i = 1 to n) {
C[i] = Math.sqrt(A[i] * B[i]);
}
And is there any way to use parallel for loops to compute C in parallel ?
If so, I think the approach will be the following:
parallel for (i = 1 to n) {
C[i] = Math.sqrt(A[i] * B[i]);
}
Is it correct ?
Assuming that by best you mean fastest, the usual approach would be to divide A and B into chunks, spawn a separate thread for handling each of these chunks in parallel, and wait for all the threads to finish their tasks.
The optimal number of chunks for such computation, most likely, will be the number of CPU cores you have on your computer. So, the pseudocode would look like:
chunkSize = ceiling(n / numberOfCPUs)
for (t = 1 to numberOfCPUs) {
startIndex = (t - 1) * chunkSize + 1
size = min(chunkSize, C.size - startIndex + 1)
threads.add(Thread.spawn(startIndex, size))
}
threads.join()
Where each thread, provided with the startIndex and size, computes:
for (i = startIndex to startIndex + size) {
C[i] = Math.sqrt(A[i] * B[i])
}
Another approach would be to have a pool of threads and give those threads a single shared queue of indices 1, 2, ... n. Each thread on each iteration polls the top index (let it be i) and calculates C[i]. As soon as the queue is empty, the work is done. The problem here is that you need additional synchronization mechanism that would guarantee that every index is processed by exactly one thread. For some simple tasks (like yours) such mechanism might consume more resources than actual calculation, but for relatively long-running tasks it works pretty well.
There's a mutual approach when you break the initial set of tasks into chunks, provide each thread in the pool with its own chunk, but when a thread is done with its chunk, it starts 'stealing' tasks from other threads in order not to sit idle. On many real tasks it gives better results than either of previous approaches.
I am trying to run some experiments to study the scaling properties of an Akka application that I have written. As a baseline I would like to force the application to run using only a single thread on a single core.
I am currently running the simulation on my quad-core laptop with the following in my application.conf file...
akka {
actor {
default-dispatcher {
fork-join-executor {
parallelism-min = 1
parallelism-factor = 0.25
parallelism-max = 1
}
}
}
}
Is this the correct best way to force my application to run as a single threaded application on one core? The idea is that once I have this baseline, I will then increase the number of available cores (and threads).
Yes that should work. I would just add that the declaration of parallelism-max would be already enough in your case. The parallelism-factor is just used in the following formula: available processors * factor. Akka first uses the formula to determine the number of threads that should be used. Next it makes sure you are within the min & max. So a number below 1 for the factor makes no sense. I think the best thing for you should be the following:
akka {
actor {
default-dispatcher {
fork-join-executor {
parallelism-max = X // set it to the number of cores you want to allow
parallelism-factor = 1
}
}
}
}
You can read more about it here.
I have a Dell Inspiron 620. The System control panel says 1 Intel i3-2100. Intel says (http://ark.intel.com/products/53422/) it has 2 cores and four threads. Three questions: In system environment variables the no_of_processors=4; so is that 4 threads?
Regarding the cores, when I run this code:
// get a list of all processor devices
deviceList = SetupDiGetClassDevs(ref processorGuid, "ACPI", IntPtr.Zero, (int)DIGCF.PRESENT);
// attempt to process each item in the list
for (int deviceNumber = 0; ; deviceNumber++)
{
SP_DEVINFO_DATA deviceInfo = new SP_DEVINFO_DATA();
deviceInfo.cbSize = Marshal.SizeOf(deviceInfo);
// attempt to read the device info from the list, if this fails, we're at the end of the list
if (!SetupDiEnumDeviceInfo(deviceList, deviceNumber, ref deviceInfo))
{
deviceCount = deviceNumber - 1;
break;
}
}
I get 3 as the number of cores, not 2.
Also, in terms of the number of threads this system will support adequately, is that cores x processors?
Thanks.
This is pure supposition but could it be:
2 Core with 4 threads implies hyper threading
System properties are referring 4 processors => its reading the number of hyper threads available as 'logical single thread processors'
Your loop starts at 0 so you're still seeing 4.
How can I limit number of threads that are being executed at the same time?
Here is sample of my algorithm:
for(i = 0; i < 100000; i++) {
Thread.start {
// Do some work
}
}
I would like to make sure that once number of threads in my application hits 100, algorithm will pause/wait until number of threads in the app goes below 100.
Currently "some work" takes some time to do and I end up with few thousands of threads in my app. Eventually it runs out of threads and "some work" crashes. I would like to fix it by limiting number of pools that it can use at one time.
Please let me know how to solve my issue.
I believe you are looking for a ThreadPoolExecutor in the Java Concurrency API. The idea here is that you can define a maximum number of threads in a pool and then instead of starting new Threads with a Runnable, just let the ThreadPoolExecutor take care of managing the upper limit for Threads.
Start here: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
import java.util.concurrent.*;
import java.util.*;
def queue = new ArrayBlockingQueue<Runnable>( 50000 )
def tPool = new ThreadPoolExecutor(5, 500, 20, TimeUnit.SECONDS, queue);
for(i = 0; i < 5000; i++) {
tPool.execute {
println "Blah"
}
}
Parameters for the ThreadBlockingQueue constructor: corePoolSize (5), this is the # of threads to create and to maintain if the system is idle, maxPoolSize (500) max number of threads to create, 3rd and 4th argument states that the pool should keep idle threads around for at least 20 seconds, and the queue argument is a blocking queue that stores queued tasks.
What you'll want to play around with is the queue sizes and also how to handle rejected tasks. If you need to execute 100k tasks, you'll either have to have a queue that can hold 100k tasks, or you'll have to have a strategy for handling a rejected tasks.
I have a queue of 1000 work items and a n-proc machine (assume n =
4).The main thread spawns n (=4) worker threads at a time ( 25 outer
iterations) and waits for all threads to complete before processing
the next n (=4) items until the entire queue is processed
for(i= 0 to queue.Length / numprocs)
for(j= 0 to numprocs)
{
CreateThread(WorkerThread,WorkItem)
}
WaitForMultipleObjects(threadHandle[])
The work done by each (worker) thread is not homogeneous.Therefore in
1 batch (of n) if thread 1 spends 1000 s doing work and rest of the 3
threads only 1 s , above design is inefficient,becaue after 1 sec
other 3 processors are idling. Besides there is no pooling - 1000
distinct threads are being created
How do I use the NT thread pool (I am not familiar enough- hence the
long winded question) and QueueUserWorkitem to achieve the above. The
following constraints should hold
The main thread requires that all worker items are processed before
it can proceed.So I would think that a waitall like construct above
is required
I want to create as many threads as processors (ie not 1000 threads
at a time)
Also I dont want to create 1000 distinct events, pass to the worker
thread, and wait on all events using the QueueUserWorkitem API or
otherwise
Exisitng code is in C++.Prefer C++ because I dont know c#
I suspect that the above is a very common pattern and was looking for
input from you folks.
I'm not a C++ programmer, so I'll give you some half-way pseudo code for it
tcount = 0
maxproc = 4
while queue_item = queue.get_next() # depends on implementation of queue
# may well be:
# for i=0; i<queue.length; i++
while tcount == maxproc
wait 0.1 seconds # or some other interval that isn't as cpu intensive
# as continously running the loop
tcount += 1 # must be atomic (reading the value and writing the new
# one must happen consecutively without interruption from
# other threads). I think ++tcount would handle that in cpp.
new thread(worker, queue_item)
function worker(item)
# ...do stuff with item here...
tcount -= 1 # must be atomic