OpenMP environment settings - Windows 10

I have a perhaps simple question, but have not found an answer.
Question 1: Is it possible to set OpenMP environment settings after compiling? They are not set in the source code.
Question 2: Is there a difference if I use /libs:static or the DLL?
I want to use OpenMP with the following hardware:
2 CPUs (2 sockets)
each CPU has 12 cores and 24 threads
My solution is:
set OMP_NUM_THREADS=12
set OMP_PLACES="{0,1,2,3,4,5}{6,7,8,9,10,11}"
set OMP_PROC_BIND=spread
The performance is not good, and both CPUs are not being used. I am using Windows 10.
Thanks for a short answer.
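For what it's worth, here is a minimal sketch of how this is usually checked (illustrative only; it assumes a runtime that implements the OpenMP 4.0 affinity features, e.g. the Intel or LLVM runtimes - Microsoft's vcomp only implements OpenMP 2.0 and ignores OMP_PLACES/OMP_PROC_BIND). The environment variables are read when the program starts, so they can be changed after compiling as long as the source does not override them with omp_set_num_threads() or similar.

// check_omp_env.cpp - print what the OpenMP runtime picked up from the environment
#include <cstdio>
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp single
        std::printf("threads=%d places=%d proc_bind=%d\n",
                    omp_get_num_threads(),      // reflects OMP_NUM_THREADS
                    omp_get_num_places(),       // reflects OMP_PLACES
                    (int)omp_get_proc_bind());  // reflects OMP_PROC_BIND
    }
    return 0;
}

Compiled once, running this under different values of OMP_NUM_THREADS, OMP_PLACES and OMP_PROC_BIND should change the reported numbers without recompiling. With 2 x 12 physical cores, OMP_NUM_THREADS=24 together with OMP_PLACES=cores and OMP_PROC_BIND=spread would typically place threads on both sockets.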

Related

CPU/Threads usage on M1 Pro (Apple Silicon) using openMP

I hope someone knows the answer to this...
I have a code that compiles perfectly well with openMP (it uses libsharp). However, I am finding it impossible to make the M1 Pro chip use all the 8 or 10 cores I have.
I am setting the threads variable correctly as export OMP_NUM_THREADS=10, so the code correctly identifies that it's supposed to be running with 10 threads (see the Activity Monitor screenshot below):
[Activity Monitor screenshot]
The screenshot shows that the code is compiled for Apple Silicon and uses 10 threads, but not much of the available CPU.
Does anyone know how to properly compile/set the number of threads such that all the cores will be used?
This is trivial in x86 architectures.
Not really an answer, but too long for a comment...
If both LLVM and GCC behave the same then it's not an OpenMP runtime issue. (And your monitor output shows that the correct number of threads have been created). I'm also not certain that it's really an Arm issue.
Are you comparing with an Apple x86 machine (so running the same operating system), or with a Linux x86 system?
The scheduling decisions of the two OSes are likely different, and (for instance) macOS has no interface to bind threads to logical CPUs.
As well as that, there's the issue of having some fast and some slow cores. That could mean that statically scheduled loops are inefficient.
I'm also confused by the fact that you seem to show multiple instances of your code running at the same time, so you are explicitly causing over-subscription of the logical CPUs...
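To illustrate the static-scheduling point (a sketch only, not the asker's code): with schedule(static), every thread gets the same share of iterations, so threads that end up on slow cores finish last while the fast cores sit idle; schedule(dynamic) lets the faster threads pick up more chunks.

// heterogeneous_cores.cpp - illustrative scheduling choice for asymmetric cores
#include <cmath>

double work(int i) { return std::sqrt(static_cast<double>(i)); }

int main() {
    const int n = 1 << 24;
    double sum = 0.0;
    // schedule(static) would hand each thread n/num_threads iterations up front;
    // schedule(dynamic, 4096) hands out 4096-iteration chunks on demand, which
    // usually balances better when some cores are slower than others.
    #pragma omp parallel for schedule(dynamic, 4096) reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += work(i);
    return sum > 0.0 ? 0 : 1;
}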

How do I isolate 3 cores of a quadcore from Linux and use them for Halcon, exclusively?

Here is what I've tried so far:
I configured Linux to only use core 0 of the quadcore CPU by boot option isolcpu=1,2,3
I started my multi-threaded C++ program and let one thread configure Halcon with a few HSystem::SetSystem() calls; this is the Halcon main thread. By default, the "thread_pool" option is set to "true" (but I also tried "false"). And, importantly, this Halcon main thread first calls pthread_setaffinity(getpid(), sizeof(set), &set); with a cpu_set_t set to which I added cores 1, 2 and 3 using CPU_SET(index, &set).
Now reading a QR matrix code in "Maximum" mode should start several threads on cores 1, 2 and 3, but it doesn't work: it only runs on core 1 with almost 90% CPU load, while cores 2 and 3 stay at 0% CPU load (seen with top -H). This looks to me as if Halcon is missing some magic option to use all 3 cores.
Are you 100% sure this should run in parallel?
Could you try it with a different code type (ECC200)? According to https://www.mvtec.com/products/halcon/documentation/release-notes-1911-0/ (Speedup section), we know for sure that the ECC200 reader is parallelized internally by HALCON. If that reader runs in parallel on your system and the QR code reader doesn't, I would assume the QR code reader simply isn't parallelized by HALCON.
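As an aside on the affinity step described in the question (a minimal Linux sketch, illustrative and not the asker's actual code): sched_setaffinity() takes a pid (0 means the calling process), whereas pthread_setaffinity_np() takes a pthread_t, and threads created after the call inherit the mask.

// affinity_sketch.cpp - restrict the calling process to cores 1-3 on Linux
#define _GNU_SOURCE 1   // for CPU_SET / sched_setaffinity on glibc
#include <sched.h>
#include <cstdio>

int main() {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int core = 1; core <= 3; ++core)   // cores 1, 2 and 3, matching isolcpu=1,2,3
        CPU_SET(core, &set);

    // pid 0 = the calling process; threads started afterwards inherit this mask.
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        std::perror("sched_setaffinity");
        return 1;
    }
    // ... configure Halcon and start the worker threads here ...
    return 0;
}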

Will an 8-CPU cloud machine run 8x faster than a 1-CPU cloud machine without changes to the code?

I am a beginner and have no clue yet about cloud computing, multithreading, or multiprocessing.
I have a desktop PC with an i7 (4 cores), and I was wondering whether a multi-CPU cloud machine or an 8+ core machine would run any code faster than my PC without any changes to the code.
Does the machine handle distributing the tasks across the several CPUs (or the 8+ cores) by itself, or is it necessary to adapt the code (multithreading or multiprocessing)?
For the sake of argument, let's say I run a simple loop like the one below:
results = {}
for i in range(10**8):
    results[i] = i**2
This takes about 67 sec on my PC (I was running something else at the same time so I'm not sure this is accurate but my timing is irrelevant anyway).
Would the exact same code be faster on a multi-CPU machine or an 8+ core machine compared to a single-CPU, 4-core machine?
If it is, in fact, required to make changes, I would appreciate any beginner links to learn about multiprocess or multithread.
Thank you for your help.
I'm no expert, but I think it really depends on the platform you're using to write and run your code. Some languages may support multithreading/multiprocessing natively, and as such the code will run faster, but others might not.
One thing is certain: you can't say that in 100% of cases a machine with more cores/CPUs will run a given piece of code faster than a machine with fewer cores/CPUs.
Hope I helped clear things up.
Edit:
This Medium post regarding multiprocessing/multithreading in Python looks good - Multithreading vs Multiprocessing in Python 🐍
Python multiprocessing for dummies
Code will run faster only when it is written in a parallel fashion. The snippet in your question is not written to run in parallel, so it won't run any faster.
When a parallel program is written, the programmer keeps in mind the target level of parallelization. A sequential program has a parallelization level of 1. A program with N CPU-intensive threads runs most effectively on N processors (cores). A program with a high parallelization level may execute more slowly on a 2-4 core machine than the sequential variant.
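As a rough illustration of what "written in a parallel fashion" means (a sketch in C++ rather than the question's Python, using a smaller range to keep memory modest): the sequential loop keeps one core busy, while the version below explicitly splits the index range across hardware threads.

// parallel_squares.cpp - split the squaring loop across hardware threads
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

int main() {
    const std::int64_t n = 10'000'000;   // smaller than the question's 10**8
    std::vector<std::int64_t> results(n);

    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&results, n, workers, w] {
            // Each worker fills its own contiguous slice of the range.
            const std::int64_t begin = n * w / workers;
            const std::int64_t end   = n * (w + 1) / workers;
            for (std::int64_t i = begin; i < end; ++i)
                results[i] = i * i;
        });
    }
    for (auto& t : pool) t.join();
    return results[n - 1] == (n - 1) * (n - 1) ? 0 : 1;
}

Without a change of this kind (threads, multiprocessing, or a library that does it for you), extra cores or CPUs will not make the loop itself finish any sooner.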

Cassandra and Java 9 - ThreadPriorityPolicy=42 is outside the allowed range

Very recently I installed JDK 9 and Apache Cassandra from the official site. But now when I start Cassandra in the foreground, I get this message:
apache-cassandra-3.11.1/bin$ ./cassandra -f
[0.000s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:/home/mmatak/monero/apache-cassandra-3.11.1/logs/gc.log instead.
intx ThreadPriorityPolicy=42 is outside the allowed range [ 0 ... 1 ]
Improperly specified VM option 'ThreadPriorityPolicy=42'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
So far I haven't found any solution for this. Is it possible that Java 9 and Cassandra are not yet compatible? The problem is also mentioned here - #CASSANDRA-13107
But I am not sure how to just "remove the flag". Where is it possible to override or remove this flag?
I had exactly the same issue:
Can't start Cassandra (Single-Node Cluster on CentOS7)
If it is an option for you, using Java 8, instead of 9, is the simplest way to solve the issue.
Setting the following environment variables solved the problem on macOS:
export JAVA8_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home
@Martin Matak Just comment out that line in the conf/jvm.options file:
########################
# GENERAL JVM SETTINGS #
########################
# allows lowering thread priority without being root on linux - probably
# not necessary on Windows but doesn't harm anything.
# see http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workaround.html
#-XX:ThreadPriorityPolicy=42
Some background on -XX:ThreadPriorityPolicy.
These were the values, as documented in the source code.
0 : Normal.
VM chooses priorities that are appropriate for normal
applications. On Solaris NORM_PRIORITY and above are mapped
to normal native priority. Java priorities below
NORM_PRIORITY map to lower native priority values. On
Windows applications are allowed to use higher native
priorities. However, with ThreadPriorityPolicy=0, VM will
not use the highest possible native priority,
THREAD_PRIORITY_TIME_CRITICAL, as it may interfere with
system threads. On Linux thread priorities are ignored
because the OS does not support static priority in
SCHED_OTHER scheduling class which is the only choice for
non-root, non-realtime applications.
1 : Aggressive.
Java thread priorities map over to the entire range of
native thread priorities. Higher Java thread priorities map
to higher native thread priorities. This policy should be
used with care, as sometimes it can cause performance
degradation in the application and/or the entire system. On
Linux this policy requires root privilege.
In other words: The default Normal setting causes thread priorities to be ignored on Linux.
Now someone found a bug in the code: the "is root?" check was only applied for the value 1, but the VM would still try to set the thread priority for every value other than 0. So a value like 42 set the priorities without the root check.
Unless running as root, it would only be possible to lower the thread priority. So although not perfect, this was quite an improvement, compared to not being able to control the priorities at all.
Starting with Java 9, command line arguments like this one started to get checked, and this hack stopped working.
Fwiw, on Java 11/Linux, I can set the parameter to 1 without being root, and setting thread priorities does have an effect. So something has changed in the meantime, and at least with recent JVMs this hack does not seem necessary any more.
Solution to your question
Reason for this exception: multiple JDK versions (probably JDK 9 or JDK 10) are installed, which causes this exception.
1. Set the path to point to the JDK 8 version only.
2. Currently Cassandra 3.11 is meant to run on JDK 8 only (not a higher JDK).
3. Change it in the Cassandra conf file (/opt/apache-cassandra-3.11.2/conf/cassandra-env.sh).
4. If you want to use a higher JDK version, update the system path variables based on your OS.
There's a jvm.options file in your conf directory which sets it:
https://github.com/apache/cassandra/blob/12d4e2f189fb228250edc876963d0c74b5ab0d4f/conf/jvm.options#L96
Following on from Jay's answer, if you're on macOS and installed via Homebrew, the file is located at local/etc/cassandra/jvm.options.

Matlab 2011a Use all Cores Available on 64 bit Linux?

Hi, I've looked online but I can't seem to find the answer: do I need to do anything to make Matlab use all cores? From what I understand, multi-threading has been supported since 2007. On my machine Matlab only uses one core at 100% and the rest hang at ~2%. I'm using 64-bit Linux (Mint 12). On my other computer, which has only 2 cores and is 32-bit, Matlab seems to utilize both cores at 100% - not all of the time, but in a sufficient number of cases. On the 64-bit, 4-core PC this never happens.
Do I have to do anything on 64-bit to get Matlab to use all the cores whenever possible? I had to do some custom linking after install, as Matlab wasn't finding the libraries (e.g. libc.so.6) because it wasn't looking in the correct places.
As standard, since the latest release, you can use 12 cores using the Parallel Computing Toolbox. Without this toolbox, I guess you're out of luck. Any additional cores can be accessed with the MATLAB Distributed Computing Server, where you actually pay per number of worker threads.
To make Matlab use your multiple cores you have to run
matlabpool open
And it of course works better if you actually have multithreaded code (like using the spmd function or parfor loops)
More info at the Matlab homepage
MATLAB has only a single thread for computation.
That said, multiple threads are created for certain functions which use the multithreaded features of the BLAS libraries it uses underneath.
Thus, you would only gain a 'multi-threaded' advantage if you are calling functions which use these multi-threaded BLAS libraries.
This link has information on the list of functions which are multithreaded.
Now, for the use of your cores, that would depend on your OS. I believe the OS has to load-balance your threads across all cores. One CANNOT set thread affinities from within MATLAB. One can, however, set worker MATLAB processes to have affinities to cores from within the Parallel Computing Toolbox.
However, you could always try setting the affinity of the MATLAB process to all your processors manually, using the details available at the following link for Linux.
Windows users can simply right-click on the process in Task Manager and set the affinity.
My understanding is that this is only a request to the OS and is not a hard binding rule that the OS must adhere to.
