What is the best Python-based MultiNest package that optimizes for multiprocessing with concurrent.futures?
I've had issues getting all of my CPUs used with anything but multiprocessing.Pool, but the Python MultiNest packages don't seem to be able to work with it.
On the GitHub issues page for dynesty (one of the two most common pure-Python MultiNest implementations), we discussed this as well:
https://github.com/joshspeagle/dynesty/issues/100
There was not a very settled, final explanation, but the thinking was that
(1) The cost function is not large enough to require all of the cores at once
(2) The bootstrap flag should be set to 0 to avoid bootstrapping; it's a trick implemented for speed that seems to be interfering.
I've used Nestle (github.com/kbarbary/nestle) and Dynesty (github.com/joshspeagle/dynesty); they both seem to have this problem no matter the complexity of the cost function.
I have had great success using PyMultiNest (github.com/JohannesBuchner/PyMultiNest), but it requires the Fortran version of MultiNest (github.com/JohannesBuchner/MultiNest), which is very difficult to install correctly -- you need to manually install OpenMPI, and both MultiNest and OpenMPI can have compiler issues depending on the OS, system, and configuration thereof.
I would suggest using PyMultiNest, except that it is so hard to install; installing Dynesty and Nestle is trivial, but they have had this issue with full parallelization.
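For what it's worth, dynesty only needs an object with a Pool-style map method, passed via its pool and queue_size arguments, so a concurrent.futures.ProcessPoolExecutor may work in place of multiprocessing.Pool (which is what the documentation examples use). Below is a minimal, untested sketch of that pattern; the toy loglike and prior_transform are placeholders, and bootstrap=0 is set as suggested above.

# Minimal sketch (untested): drive dynesty with a concurrent.futures pool.
# loglike and prior_transform are toy placeholders; they must be defined at
# module level so that worker processes can unpickle them.
import numpy as np
import dynesty
from concurrent.futures import ProcessPoolExecutor

ndim = 3

def loglike(x):
    # toy Gaussian log-likelihood
    return -0.5 * np.sum(x**2)

def prior_transform(u):
    # map the unit cube to a uniform prior on [-10, 10)
    return 20.0 * u - 10.0

if __name__ == "__main__":
    ncpu = 4
    with ProcessPoolExecutor(max_workers=ncpu) as executor:
        sampler = dynesty.NestedSampler(
            loglike, prior_transform, ndim,
            pool=executor,    # anything exposing .map(func, iterable)
            queue_size=ncpu,  # number of likelihood calls farmed out at once
            bootstrap=0,      # disable bootstrapping, per the issue discussion
        )
        sampler.run_nested()
    results = sampler.results  # posterior samples, evidence, etc.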
I will be using an HPC cluster for my research, and I don't know a lot about parallel or distributed computing.
I really don't understand DistributedDataParallel() in PyTorch, especially init_process_group().
What does it mean to initialize a process group? And what is
init_method: "URL specifying how to initialize the package"?
For example (I found these in the documentation):
'tcp://10.1.1.20:23456' or 'file:///mnt/nfs/sharedfile'
What are those URLs?
What is the Rank of the current process?
Is world_size the number of GPUs?
I would really appreciate it if someone could explain what DistributedDataParallel() and init_process_group() are and how to use them, because I don't know parallel or distributed computing.
I will be using things like Slurm (sbatch) on the HPC cluster.
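For reference, here is a rough, untested sketch of how these pieces are often wired together under Slurm. The Slurm environment variables, the address/port, and the toy model are illustrative assumptions, not a canonical recipe; rank is the global index of the current process (0 to world_size-1), and world_size is the total number of processes, which is usually one per GPU but not literally the number of GPUs.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Illustrative assumption: these Slurm variables are set when launched via srun.
rank = int(os.environ.get("SLURM_PROCID", 0))        # global index of this process
world_size = int(os.environ.get("SLURM_NTASKS", 1))  # total number of processes
local_rank = int(os.environ.get("SLURM_LOCALID", 0)) # index of this process on its node

# Every process calls this with the same init_method; the URL is just a
# rendezvous point (a TCP address, or a file on a shared filesystem) that the
# processes use to find each other and form the "process group".
dist.init_process_group(
    backend="nccl",                       # "gloo" if running on CPUs
    init_method="tcp://10.1.1.20:23456",  # must be reachable by every rank
    rank=rank,
    world_size=world_size,
)

torch.cuda.set_device(local_rank)
model = torch.nn.Linear(10, 1).to(f"cuda:{local_rank}")  # toy model
ddp_model = DDP(model, device_ids=[local_rank])  # synchronizes gradients across ranks

# ...ordinary training loop using ddp_model...

dist.destroy_process_group()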
I have the "standard" version of Matlab without any additional toolboxes installed.
Is it somehow possible to make use of multithreading (use all cores of a quad-core instead of only one) without installing the Parallel Computing Toolbox?
I guess it is not, but maybe someone figured out a workaround?
Thank you very much!
There are several functions that are implemented using multithreading. If you use these functions, all cores will be used: http://www.mathworks.com/matlabcentral/answers/95958
You can use threads/parallelism in C, C++ or Java, all of which can be called from Matlab (Java being probably the fastest/simplest way?).
A couple of observations:
a) Matlab's parallel constructs are quite heavyweight and will not give you a super-speedup. I personally prefer calling C/C++ code with OpenMP if I want fast-to-write parallelism.
b) Matlab's functions are, in general, not thread-safe, so calling them from multithreaded non-Matlab code is dangerous.
c) In image processing, some of the functions in Matlab are GPU-accelerated, therefore they are quite fast on their own.
I currently have a parallel for loop similar to this:
int testValues[16]={5,2,2,10,4,4,2,100,5,2,4,3,29,4,1,52};
parallel_for (1, 100, 1, [&](int i){
    int var4;
    int values[16]={-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};
    /* ...nested for loops that fill values... */
    for (var4=0; var4<16; var4++) {
        if (values[var4] != testValues[var4]) break;
    }
    /* ...end nested loops */
});
I have optimised as much as I can, to the point where the only thing left to do is add more resources.
I am interested in utilising the GPU to help process the task in parallel. I have read that embarrassingly parallel tasks like this can make use of a modern GPU quite effectively.
Using any language, what is the easiest way to use the GPU for a simple parallel for loop like this?
I know nothing about GPU architectures or native GPU code.
As Li-aung Yip said in the comments, the simplest way to use a GPU is with something like Matlab that supports array operations and automatically (more or less) moves those to the GPU. But for that to work you need to rewrite your code as pure matrix-based operations.
Otherwise, most GPU use still requires coding in CUDA or OpenCL (you would need to use OpenCL with an AMD card). Even if you use a wrapper for your favourite language, the actual code that runs on the GPU is still usually written in OpenCL (which looks vaguely like C), so this requires a fair amount of learning/effort. You can start by downloading OpenCL from AMD and reading through the docs...
Both those options require learning new ideas, I suspect. What you really want, I think, is a high-level but still traditional-looking language targeted at the GPU. Unfortunately, they don't seem to exist much yet. The only example I can think of is Theano - you might try that. Even there, you still need to learn Python/NumPy, and I am not sure how solid the Theano implementation is, but it may be the least painful way forwards (in that it allows a "traditional" approach - using matrices is in many ways easier, but some people seem to find that very hard to grasp, conceptually).
PS: it's not clear to me that a GPU will help your problem, btw.
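To make the "rewrite it as matrix operations" idea concrete, here is a small, untested sketch in Theano; the array names and shapes are made up, and it only covers the comparison against testValues, not the nested loops that generate the candidate rows.

# Sketch only: compare a whole batch of candidate rows against testValues at
# once instead of element by element; with a GPU-enabled Theano configuration
# this expression can be evaluated on the GPU.
import numpy as np
import theano
import theano.tensor as T

candidates = T.imatrix("candidates")  # shape (n_trials, 16), made-up name
target = T.ivector("target")          # shape (16,)

# True where every element of a candidate row equals the target row
row_matches = T.all(T.eq(candidates, target), axis=1)
check = theano.function([candidates, target], row_matches)

test_values = np.array([5,2,2,10,4,4,2,100,5,2,4,3,29,4,1,52], dtype=np.int32)
trials = np.full((100, 16), -1, dtype=np.int32)  # stand-in for the nested loops
print(check(trials, test_values))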
You might want to check out ArrayFire.
http://www.accelereyes.com/products/arrayfire
If you use OpenCL, you need to download separate implementations for the different device vendors: Intel, AMD, and NVIDIA.
You might want to look into OpenACC, which enables parallelism via directives. You can port your code (C/C++/Fortran) to heterogeneous systems while maintaining a single source that still runs well on a homogeneous system. Take a look at this introductory video. OpenACC is not GPU programming as such, but a way of expressing parallelism in your code, which may help you achieve performance improvements without much knowledge of low-level languages such as CUDA or OpenCL. OpenACC is available in commercial compilers from PGI, Cray, and CAPS (PGI offers new users a free 30-day trial).
Does anyone know if it's possible to use OpenMP with OCaml source code?
Or another application/ambient of work, compatible with OCaml, that allows me to run parallel programs that exploit multiple cores?
If yes, how? Have you got an easy example?
Currently there is OC4MC (OCaml 4 Multicore) for shared-memory multiprocessing. I have not used the project, but there are fairly recent updates, so I can only assume the project is still moving forward.
JoCaml is another concurrent extension to OCaml, implementing the join calculus. I have also not used this project, but their site has been updated to mention OCaml 3.12, which came out fairly recently. Disregard; see comment.
If you can pry yourself away from the OpenMP paradigm, there are OCaml bindings for MPI. I use this project and have not had problems with it, and it's pretty easy to use if you are familiar with MPI.
Lastly, some (possibly unmaintained) packages pertaining to multicore / parallel processing can be found on the OCaml Hump.