When I run simply "matlab", maxNumCompThreads returns 4.
When I run "matlab -singleCompThread", maxNumCompThreads returns 1.
However, in both instances, ps uH p <PID> | wc -l (which I picked up from another SO question as a way to determine the number of threads a process is using) returns 35.
What gives? Can somebody explain to me what the 35 represents, and whether or not I can trust maxNumCompThreads as indicating that MATLAB is only using one thread?
The number of threads used by MATLAB for computation (maxNumCompThreads) is different from the number of threads MATLAB.exe uses to manage its internal functions: the interpreter, memory manager, command line, who knows what else. If you were writing MATLAB, imagine the number of threads required to manage the various ongoing, independent tasks. Perhaps have a look at the Octave or FreeMat code to get an idea.
Many of the threads you see are used by the JVM that MATLAB launches. You could try the flag "-nojvm" to cut things down further; obviously, without the JVM, functionality is very limited. "-singleCompThread" limits only the threads used for numeric computation: MATLAB's intrinsic multithreading as well as threads used by external libraries such as MKL and FFTW.
Related
When building gem5, the user can use the -j flag to set the number of threads to use.
In the documentation on building gem5, this note is included:
Note: I’m using -j9 here to execute the build on 9 of my 8 cores on my machine. You should choose an appropriate number for your machine, usually cores+1
Why is using cores+1 threads "appropriate"?
I'm assuming that it has something to do with the fact that if one thread is blocked, there should be another thread to switch to. But in that case, why not use cores+2 threads (assuming the payoff is greater than the cost of creating the threads)?
In short,
Is there any sort of research on the "optimal" number of threads to use when running multi-threaded processes?
scons build/{ISA}/gem5.{variant} -j {cpus}
just sets how many threads to use when you build gem5; it has nothing to do with running the simulations. The more threads you use, the quicker your build will be.
As for the optimal number of threads, a common rule of thumb is to run two threads per processor for the best performance. However, that doesn't mean you can't run more than 2*NUM_CPU.
Why is using cores+1 threads "appropriate"?
Well ... the theory is that if you have N cores, then N+1 threads will keep all of your cores busy without wasting too much extra memory (etc.) on threads that can't run because all of the cores are busy.
However, the "theory" (i.e. modelling) on which this is based makes assumptions that are not necessarily realistic. Plug in different assumptions and you will get a different formula.
In practice ... cores+1 should be treated as just a "rule of thumb". The real optimal thread count for a particular build can only be determined experimentally on your build platform, and it will vary depending on what else the build platform is doing at the same time. Finding the truly optimal setting is probably more effort than it is worth.
My advice:
Don't over-think it.
If cores+1 gives you good results, use it. If not, try a larger or smaller number.
If someone tells you categorically that cores+1 or cores*2 or something else is the "optimal" thread count, they probably don't have a scientific basis for that. Treat what they say with healthy skepticism! They are entitled to their opinion ... but that is all that it is.
Is there any sort of research on the "optimal" number of threads to use when running multi-threaded processes?
The problem is too (mathematically) complicated to come up with a model that can give you good a priori predictions for a previously unseen multi-threaded application.
I am new to Clojure and am performing multi-threading over a large dataset. I need to look at the number of threads being run by the program so that I can control the processing of the file. Is there a way to get a graphical representation of the threads and sub-threads that a Clojure program is running? If not a graphical representation, then at least the number of threads being run by the program?
Never mind, I used Thread/activeCount.
I have been asked to write test cases that practically demonstrate the performance of a semaphore versus a read-write semaphore, in the case of more readers and fewer writers and vice versa.
I have implemented the semaphore (in kernel space, as we were actually asked to), but I am not sure how to write the use cases and carry out a live practical evaluation of each variant.
Why don't you just write your two versions of the code (Semaphore / R/W Semaphore) to start. The use cases will depend on the actual feature being tested. Is it a device driver? Is it IO related at all? Is it networking related? It's hard to come up with use cases without knowing this.
Generally what I would do for something like an IO benchmark would be running multiple simulations over an increasing memory footprint for a set of runs. Another set of runs might be over an increasing process load. Another may be over different block sizes. I would compare each one of those against something like aggregate bandwidth and see how performance (aggregate bandwidth in this case) changed over those tests.
Again, your use cases might be completely different if you are testing something like a USB driver.
Using your custom semaphores, write the following two C programs and compile them:
reader.c
writer.c
As a simple rudimentary test, write a shell script test.sh and add your commands to load the test binaries as follows.
#!/bin/sh
./reader &
./reader &
./reader &
./reader &
./writer &
Launching the above shell script as ./test.sh will launch 4 readers and 1 writer. Customise this to your test scenario.
Ensure that your programs are operating properly i.e. verify data is being exchanged properly first before trying to profile the performance.
Once you are sure that IPC is working as expected, profile the cpu usage. Prior to launching test.sh, run the top command in another terminal. Observe the cpu usage patterns for varying number of readers/writers during the run-time of test script.
You can also launch the individual binaries (or the commands in the test script) with:
time <binary>
to print the total run time and the time spent waiting on the kernel driver.
perf record <binary>
and after completion, run perf annotate main
to obtain the relative amount of time spent in various sections of the code.
I have a strange question. I have to count the number of pthread_mutex objects in a running system, for example Debian, Ubuntu, a system on a microcontroller, etc. I have to do it without LD_PRELOAD, interrupting, overloading of functions, and so on. I have to count them at an arbitrary point in time.
Does somebody have an idea how I can do this? Can you show me a way?
To count the threads:
ps -eLf will give you a list of all the threads and processes currently running on the system.
However, you ask for a list of all threads that HAVE executed on the system, presumably since some arbitrary point in the past - are you sure that is what you mean? You could run ps as a cron job and poll the system every X minutes, but you would miss threads that were born and died between jobs. You would also have a huge amount of data to deal with.
As for counting the mutexes: it's impossible. A pthread_mutex_t is just a structure in a process's own memory, and the kernel typically only learns about one when a thread blocks on it, so there is nothing system-wide to enumerate.
I'm writing a parallel Haskell program using Strategies. It's not doing what it's supposed to do, and I would like to inspect which Haskell Execution Context (HEC) a function is executed in.
Is there a getHEC call or something similar which I could use in my debug output?
You can find out which capability (i.e. CPU core) a Haskell thread is running on by calling threadCapability from Control.Concurrent.
If you're running your program with +RTS -N, there will be one OS-level thread (HEC) spawned per core, so the capability number returned by threadCapability will tell you which OS thread your forkIO green thread is running on. If, however, you are explicitly specifying the number of OS threads with +RTS -Nn, where n is some integer other than the number of cores on your system, this will probably be less useful to you.
You might also find ThreadScope to be useful for debugging and visualizing the execution of parallel programs.