Compile R with MKL (with multithreading support)

I compiled R following these guides:
http://www.r-bloggers.com/compiling-64-bit-r-2-10-1-with-mkl-in-linux/
http://cran.r-project.org/doc/manuals/R-admin.html#MKL
But for matrix algebra R does not use all available CPUs.
I tried both of the following MKL settings:
MKL="-L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_gnu_thread \
-lmkl_core -fopenmp -lpthread"
and
MKL=" -L${MKL_LIB_PATH} \
-Wl,--start-group \
${MKL_LIB_PATH}/libmkl_gf_lp64.a \
${MKL_LIB_PATH}/libmkl_gnu_thread.a \
${MKL_LIB_PATH}/libmkl_core.a \
-Wl,--end-group \
-lgomp -lpthread"
How can I force R to use all available CPUs?
How can I check whether R uses MKL or not?

I would like to add my procedure for compiling R 3.0.1 with the MKL libraries. I am using Debian 7.0 on an Intel Core i7 processor with 8 GB RAM. First I installed the MKL libraries, then I set the MKL-related environment variables (MKLROOT and LD_LIBRARY_PATH) with this command:
source /opt/intel/mkl/bin/mklvars.sh intel64
Then I used the following parameters for ./configure:
./configure --enable-R-shlib --enable-threads=posix --with-lapack --with-blas="-fopenmp -m64 -I$MKLROOT/include -L$MKLROOT/lib/intel64 -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lpthread -lm"
and finished the installation with make and make install.
As a benchmark, I timed a product of two 5000 x 5000 matrices without MKL and got:
user system elapsed
57.455 0.104 29.033
and after compiling with MKL:
user system elapsed
15.993 0.176 4.333
a real gain!
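For anyone who wants to reproduce this, here is a minimal sketch of the timing run from the shell (random input matrices are my assumption; any dense matrices will do):
Rscript -e '
n <- 5000
a <- matrix(rnorm(n * n), n, n)   # two random 5000 x 5000 matrices
b <- matrix(rnorm(n * n), n, n)
print(system.time(a %*% b))       # elapsed time drops sharply with a threaded BLAS
'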

All this is now a lot easier; a short blog post discussing the steps below in detail is available.
But in short, all you need is this:
## get archive key
cd /tmp
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
## add MKL to apt's repo list
sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list'
## update and install (500+ MB download, 1.9 GB installed)
apt-get update
apt-get install intel-mkl-64bit-2018.2-046
## make it system default via update alternatives
update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so libblas.so-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so.3 libblas.so.3-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so liblapack.so-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so.3 liblapack.so.3-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
## tell ldconfig
echo "/opt/intel/lib/intel64" > /etc/ld.so.conf.d/mkl.conf
echo "/opt/intel/mkl/lib/intel64" >> /etc/ld.so.conf.d/mkl.conf
ldconfig
That's it. Nothing else. No recompiling or relinking. And, for example, R now shows in sessionInfo():
Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so
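A hedged way to double-check that the switch took effect (the alternative names match the update-alternatives calls above; sessionInfo() reports the BLAS/LAPACK path in recent R versions):
update-alternatives --display libblas.so.3-x86_64-linux-gnu   # should list libmkl_rt.so as current
Rscript -e 'sessionInfo()' | grep -i mkl                      # BLAS/LAPACK lines should mention MKL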

(Not a real answer: I don't use MKL; I use OpenBLAS as a shared BLAS, as described in the R-admin manual.)
As a quick check of whether the optimized BLAS is used, I do a matrix multiplication. Even if only one core is used, this should be faster with the optimized BLAS than with the standard BLAS that R comes with.
To check how many cores are in use, I look at top (or a CPU usage graph/monitor) during the matrix multiplication.
There has been trouble in the past with CPU affinity, where a BLAS would start n threads but they all ran on the same core; see Parallel processing in R limited.
r-devel (3.0.0-to-be) has a function to set the CPU affinity.
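A minimal shell sketch of both checks (the matrix size is arbitrary): run a large matrix product in the background and watch its CPU usage:
Rscript -e 'n <- 4000; m <- matrix(rnorm(n * n), n, n); invisible(m %*% m)' &  # heavy BLAS workload
top -p $!   # with a threaded BLAS, CPU usage should climb well above 100%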

A complete tutorial is available here:
https://software.intel.com/en-us/articles/build-r-301-with-intel-c-compiler-and-intel-mkl-on-linux
or simply using:
http://mran.revolutionanalytics.com/download/

Related

Thrust has no member device_malloc error when compiling gDel3D (3D Delaunay Triangulation with GPU)

I tried several different methods of building this library, and all have resulted in the error: namespace "thrust" has no member "device_malloc".
The following is a link to the git repo for gDel3D: https://github.com/ashwin/gDel3D
The following is the error I receive when running make:
[ 7%] Building NVCC (Device) object CMakeFiles/gflip3d.dir/GDelFlipping/src/gDel3D/GPU/gflip3d_generated_ThrustWrapper.cu.o
/home/gDel3D/GDelFlipping/src/gDel3D/GPU/ThrustWrapper.cu(121): error: namespace "thrust" has no member "device_malloc"
/home/gDel3D/GDelFlipping/src/gDel3D/GPU/ThrustWrapper.cu(121): error: type name is not allowed
/home/gDel3D/GDelFlipping/src/gDel3D/GPU/ThrustWrapper.cu(121): error: expression must have class type
/home/gDel3D/GDelFlipping/src/gDel3D/GPU/GPUDecl.h(280): error: namespace "thrust" has no member "device_malloc"
/home/gDel3D/GDelFlipping/src/gDel3D/GPU/GPUDecl.h(280): error: type name is not allowed
5 errors detected in the compilation of "/tmp/tmpxft_00000a74_00000000-7_ThrustWrapper.compute_30.cpp1.ii".
CMake Error at gflip3d_generated_ThrustWrapper.cu.o.Release.cmake:279 (message):
Error generating file
/home/gDel3D/build/CMakeFiles/gflip3d.dir/GDelFlipping/src/gDel3D/GPU/./gflip3d_generated_ThrustWrapper.cu.o
CMakeFiles/gflip3d.dir/build.make:84: recipe for target 'CMakeFiles/gflip3d.dir/GDelFlipping/src/gDel3D/GPU/gflip3d_generated_ThrustWrapper.cu.o' failed
make[2]: *** [CMakeFiles/gflip3d.dir/GDelFlipping/src/gDel3D/GPU/gflip3d_generated_ThrustWrapper.cu.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/gflip3d.dir/all' failed
make[1]: *** [CMakeFiles/gflip3d.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
I tried asking my question on the issues tab of the gDel3D forum, but thought it may be appropriate here as it seems to be a problem with configuring the thrust library. Thanks for the help!
Edit:
I tried the first suggestion below and received more errors. I also tried switching OS; I am now running Ubuntu 18.04.
The following is the new error:
user@user-Oryx-Pro:~/Documents/gFlip3D-Release_271/build$ cmake ..
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/Documents/gFlip3D-Release_271/build
user@user-Oryx-Pro:~/Documents/gFlip3D-Release_271/build$ make
[ 7%] Building NVCC (Device) object CMakeFiles/gflip3d.dir/GDelFlipping/src/gDel3D/gflip3d_generated_GpuDelaunay.cu.o
/home/user/Documents/gFlip3D-Release_271/GDelFlipping/src/gDel3D/GpuDelaunay.cu(839): error: namespace "thrust" has no member "gather"
1 error detected in the compilation of "/tmp/tmpxft_00002cd3_00000000-8_GpuDelaunay.compute_50.cpp1.ii".
CMake Error at gflip3d_generated_GpuDelaunay.cu.o.Release.cmake:279 (message):
Error generating file
/home/Documents/gFlip3D-Release_271/build/CMakeFiles/gflip3d.dir/GDelFlipping/src/gDel3D/./gflip3d_generated_GpuDelaunay.cu.o
CMakeFiles/gflip3d.dir/build.make:924: recipe for target 'CMakeFiles/gflip3d.dir/GDelFlipping/src/gDel3D/gflip3d_generated_GpuDelaunay.cu.o' failed
make[2]: *** [CMakeFiles/gflip3d.dir/GDelFlipping/src/gDel3D/gflip3d_generated_GpuDelaunay.cu.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/gflip3d.dir/all' failed
make[1]: *** [CMakeFiles/gflip3d.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
If it helps, I also ran cmake -LA .:
cmake -LA .
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/Documents/gFlip3D-Release_271/build
-- Cache values
CMAKE_AR:FILEPATH=/usr/bin/ar
CMAKE_BUILD_TYPE:STRING=
CMAKE_COLOR_MAKEFILE:BOOL=ON
CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/c++
CMAKE_CXX_COMPILER_AR:FILEPATH=/usr/bin/gcc-ar-7
CMAKE_CXX_COMPILER_RANLIB:FILEPATH=/usr/bin/gcc-ranlib-7
CMAKE_CXX_FLAGS:STRING=
CMAKE_CXX_FLAGS_DEBUG:STRING=-g
CMAKE_CXX_FLAGS_MINSIZEREL:STRING=-Os -DNDEBUG
CMAKE_CXX_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=-O2 -g -DNDEBUG
CMAKE_C_COMPILER:FILEPATH=/usr/bin/cc
CMAKE_C_COMPILER_AR:FILEPATH=/usr/bin/gcc-ar-7
CMAKE_C_COMPILER_RANLIB:FILEPATH=/usr/bin/gcc-ranlib-7
CMAKE_C_FLAGS:STRING=
CMAKE_C_FLAGS_DEBUG:STRING=-g
CMAKE_C_FLAGS_MINSIZEREL:STRING=-Os -DNDEBUG
CMAKE_C_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=-O2 -g -DNDEBUG
CMAKE_EXE_LINKER_FLAGS:STRING=
CMAKE_EXE_LINKER_FLAGS_DEBUG:STRING=
CMAKE_EXE_LINKER_FLAGS_MINSIZEREL:STRING=
CMAKE_EXE_LINKER_FLAGS_RELEASE:STRING=
CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO:STRING=
CMAKE_EXPORT_COMPILE_COMMANDS:BOOL=OFF
CMAKE_INSTALL_PREFIX:PATH=/usr/local
CMAKE_LINKER:FILEPATH=/usr/bin/ld
CMAKE_MAKE_PROGRAM:FILEPATH=/usr/bin/make
CMAKE_MODULE_LINKER_FLAGS:STRING=
CMAKE_MODULE_LINKER_FLAGS_DEBUG:STRING=
CMAKE_MODULE_LINKER_FLAGS_MINSIZEREL:STRING=
CMAKE_MODULE_LINKER_FLAGS_RELEASE:STRING=
CMAKE_MODULE_LINKER_FLAGS_RELWITHDEBINFO:STRING=
CMAKE_NM:FILEPATH=/usr/bin/nm
CMAKE_OBJCOPY:FILEPATH=/usr/bin/objcopy
CMAKE_OBJDUMP:FILEPATH=/usr/bin/objdump
CMAKE_RANLIB:FILEPATH=/usr/bin/ranlib
CMAKE_SHARED_LINKER_FLAGS:STRING=
CMAKE_SHARED_LINKER_FLAGS_DEBUG:STRING=
CMAKE_SHARED_LINKER_FLAGS_MINSIZEREL:STRING=
CMAKE_SHARED_LINKER_FLAGS_RELEASE:STRING=
CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO:STRING=
CMAKE_SKIP_INSTALL_RPATH:BOOL=NO
CMAKE_SKIP_RPATH:BOOL=NO
CMAKE_STATIC_LINKER_FLAGS:STRING=
CMAKE_STATIC_LINKER_FLAGS_DEBUG:STRING=
CMAKE_STATIC_LINKER_FLAGS_MINSIZEREL:STRING=
CMAKE_STATIC_LINKER_FLAGS_RELEASE:STRING=
CMAKE_STATIC_LINKER_FLAGS_RELWITHDEBINFO:STRING=
CMAKE_STRIP:FILEPATH=/usr/bin/strip
CMAKE_VERBOSE_MAKEFILE:BOOL=FALSE
CUDA_64_BIT_DEVICE_CODE:BOOL=ON
CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE:BOOL=ON
CUDA_BUILD_CUBIN:BOOL=OFF
CUDA_BUILD_EMULATION:BOOL=OFF
CUDA_CUDART_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libcudart.so
CUDA_CUDA_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/libcuda.so
CUDA_GENERATED_OUTPUT_DIR:PATH=
CUDA_HOST_COMPILATION_CPP:BOOL=ON
CUDA_HOST_COMPILER:FILEPATH=/usr/bin/cc
CUDA_NVCC_EXECUTABLE:FILEPATH=/usr/local/cuda/bin/nvcc
CUDA_NVCC_FLAGS:STRING=
CUDA_NVCC_FLAGS_DEBUG:STRING=
CUDA_NVCC_FLAGS_MINSIZEREL:STRING=
CUDA_NVCC_FLAGS_RELEASE:STRING=
CUDA_NVCC_FLAGS_RELWITHDEBINFO:STRING=
CUDA_PROPAGATE_HOST_FLAGS:BOOL=ON
CUDA_SDK_ROOT_DIR:PATH=CUDA_SDK_ROOT_DIR-NOTFOUND
CUDA_SEPARABLE_COMPILATION:BOOL=OFF
CUDA_TOOLKIT_INCLUDE:PATH=/usr/local/cuda/include
CUDA_TOOLKIT_ROOT_DIR:PATH=/usr/local/cuda
CUDA_USE_STATIC_CUDA_RUNTIME:BOOL=ON
CUDA_VERBOSE_BUILD:BOOL=OFF
CUDA_VERSION:STRING=10.0
CUDA_cublas_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libcublas.so
CUDA_cublas_device_LIBRARY:FILEPATH=CUDA_cublas_device_LIBRARY-NOTFOUND
CUDA_cudadevrt_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libcudadevrt.a
CUDA_cudart_static_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libcudart_static.a
CUDA_cufft_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libcufft.so
CUDA_cupti_LIBRARY:FILEPATH=/usr/local/cuda/extras/CUPTI/lib64/libcupti.so
CUDA_curand_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libcurand.so
CUDA_cusolver_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libcusolver.so
CUDA_cusparse_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libcusparse.so
CUDA_nppc_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libnppc.so
CUDA_nppial_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libnppial.so
CUDA_nppicc_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libnppicc.so
CUDA_nppicom_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libnppicom.so
CUDA_nppidei_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libnppidei.so
CUDA_nppif_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libnppif.so
CUDA_nppig_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libnppig.so
CUDA_nppim_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libnppim.so
CUDA_nppist_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libnppist.so
CUDA_nppisu_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libnppisu.so
CUDA_nppitc_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libnppitc.so
CUDA_npps_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libnpps.so
CUDA_rt_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/librt.so
I tried the zip suggested by @Snowie and modified the CMakeLists.txt file:
set(CUDA_NVCC_FLAGS
${CUDA_NVCC_FLAGS};
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_50,code=sm_50)
That also did not work.
Following Snowie's suggestion, I went ahead and added #include <thrust/gather.h> to GpuDelaunay.cu. The build was successful, but the program did not run.
I ran into a similar problem with gDel3D. A quick fix for this issue is to include the necessary Thrust headers where needed. For device_malloc, include <thrust/device_malloc.h>. Based on your error message, it would seem you need to include it in gFlip3D-Release_271/GDelFlipping/src/gDel3D/GpuDelaunay.cu.
I also had better luck with a version of GDel3D you can get from this website:
https://www.comp.nus.edu.sg/~tants/gdel3d.html
more specifically from here:
https://www.comp.nus.edu.sg/~tants/gdel3d_files/gDel3D-Release_271.zip
You may also want to try different versions of the CUDA Toolkit. Additionally, you can modify the CMakeLists.txt lines containing "-gencode arch=compute_##,code=sm_##"; that lets you target different CUDA compute capabilities. I'm pretty new to CUDA myself, so I might be wrong here.
I managed to run the version of gDel3D I linked to without any errors. It compiled and ran on a GeForce GT 840M with CUDA 7.5.17 on Linux Mint (not sure of the exact version). I also added includes for the missing Thrust functions and modified CMakeLists.txt with "arch=compute_50,code=sm_50".
Hope any of this helps.
Edit 1:
Your new error indicates the fix worked; you just need to add all the Thrust headers that are needed.
You have to read the error messages:
/home/user/Documents/gFlip3D-Release_271/GDelFlipping/src/gDel3D/GpuDelaunay.cu(839): error: namespace "thrust" has no member "gather"
This line tells you that you need to include the header for the gather function in gFlip3D-Release_271/GDelFlipping/src/gDel3D/GpuDelaunay.cu.
If a similar error comes up again, try googling for the proper header where the function is declared. This time, "#include <thrust/gather.h>" in gFlip3D-Release_271/GDelFlipping/src/gDel3D/GpuDelaunay.cu should fix the problem.
Edit 2:
Since you are using an RTX 2070, you might want to try using this in your CMakeLists.txt (make sure you are using CUDA Toolkit 10.0):
set(CUDA_NVCC_FLAGS
${CUDA_NVCC_FLAGS};
-gencode arch=compute_75,code=sm_75)
or
set(CUDA_NVCC_FLAGS
${CUDA_NVCC_FLAGS};
-D_FORCE_INLINES
-gencode arch=compute_75,code=sm_75)
This should make the CUDA Toolkit target the most recent compute capability. You can also try changing the ## parts in "compute_##" and "sm_##".
According to what I found on Wikipedia, your GPU is only supported by CUDA Toolkit versions 10.0–10.2, and those toolkit versions apparently support compute capabilities 3.0–7.5. So make sure you try those toolkit versions and compute capabilities.
Before I go into the steps I took, I want to give special thanks to @Snowie for their time and help in getting me to a solution. I started from a fresh install of Ubuntu 18.04 and ran the following commands in the terminal. Some of these installs are not necessary but will save me time later.
sudo apt update
sudo ubuntu-drivers autoinstall
sudo apt-get install build-essential
sudo apt-get install freeglut3 freeglut3-dev libxi-dev libxmu-dev
sudo apt install curl
sudo apt install software-properties-common
sudo apt install cmake
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt install gcc-6 g++-6 gcc-7 g++-7 gcc-8 g++-8 gcc-9 g++-9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 90 --slave /usr/bin/g++ g++ /usr/bin/g++-9 --slave /usr/bin/gcov gcov /usr/bin/gcov-9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 80 --slave /usr/bin/g++ g++ /usr/bin/g++-8 --slave /usr/bin/gcov gcov /usr/bin/gcov-8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 70 --slave /usr/bin/g++ g++ /usr/bin/g++-7 --slave /usr/bin/gcov gcov /usr/bin/gcov-7
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 60 --slave /usr/bin/g++ g++ /usr/bin/g++-6 --slave /usr/bin/gcov gcov /usr/bin/gcov-6
Now I can choose the version of gcc that I want to run:
sudo update-alternatives --config gcc
There are 4 choices for the alternative gcc (providing /usr/bin/gcc).
Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/bin/gcc-9 90 auto mode
1 /usr/bin/gcc-6 60 manual mode
2 /usr/bin/gcc-7 70 manual mode
3 /usr/bin/gcc-8 80 manual mode
4 /usr/bin/gcc-9 90 manual mode
Press <enter> to keep the current choice[*], or type selection number:
In this case I choose 1 to select gcc-6.
Now it's time to install the CUDA toolkit:
sudo apt install nvidia-cuda-toolkit
You should see output from nvcc --version similar to:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
Now download gDel3D from here: https://www.comp.nus.edu.sg/~tants/gdel3d_files/gDel3D-Release_271.zip
Inside /home/user/Documents/gFlip3D-Release_271/GDelFlipping/src/gDel3D/GpuDelaunay.cu you need to add #include <thrust/gather.h>.
Here is an example of what it looks like:
#include "GpuDelaunay.h"
#include<iomanip>
#include<iostream>
#include "GPU/CudaWrapper.h"
#include "GPU/HostToKernel.h"
#include "GPU/KerCommon.h"
#include "GPU/KerPredicates.h"
#include "GPU/KerDivision.h"
#include "GPU/ThrustWrapper.h"
#include <thrust/gather.h>
Now you need to set your target compute architecture. My video card is an RTX 2070; according to NVIDIA, my compute architecture and compute capability are 7.5. I tried those numbers and it did not work. Since the CUDA version I ended up with was 10.1 instead of 10.2, I thought I needed to target my architecture a little lower, so I targeted compute capability and architecture 7.0 instead.
Here is an example in the CMakeLists.txt file.
set(CUDA_NVCC_FLAGS
${CUDA_NVCC_FLAGS};
-gencode arch=compute_70,code=sm_70)
Now let's create the build folder inside the gFlip3D-Release_271 folder:
mkdir build
cd build
cmake ..
make
And finally, run ./gflip3d:
Generating input...
Constructing 3D Delaunay triangulation...
Checking...
V: 100001 E: 772552 F: 1345102 T: 672551 Euler: 0
Euler check: Pass
Orient check: Pass
Adjacency check: Pass
Convex hull facets: 412
Delaunay check: Pass
---- SUMMARY ----
PointNum 100000   FP Mode Double
TotalTime (ms) 101.19   InitTime 139861894.77   SplitTime 5.20   FlipTime 68.93   RelocateTime 8.11   SortTime 1.05   OutTime 9.87   SplayingTime 3.65
# Flips 1422073
# Failed verts 109
# Final stars 243
Edit: I have found a possibly easier way to fix this device_malloc issue than what I have listed above. To correct the missing member device_malloc, add an include for it in the two files where it is used.
The two files you need to modify with the new include:
/home/gFlip3D-Release_271/GDelFlipping/src/gDel3D/GPU/ThrustWrapper.cu
/home/gFlip3D-Release_271/GDelFlipping/src/gDel3D/GPU/GPUDecl.h
The magic line to add:
#include <thrust/device_malloc.h>
I hope this helps others with their research leveraging gDel3D!

Caffe multi-CPU build

I'm trying to build Caffe on Ubuntu 14.04 x64 in VirtualBox with OpenBLAS in CPU_ONLY mode (environment install script, Makefile.config).
Also, I'm not compiling OpenBLAS but installing it via apt-get (sudo apt-get -y install libopenblas-dev); could that be the cause of the problem?
After I set any of these variables, there is no speed improvement, and in htop I see only one CPU utilized:
export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4
How can I check whether Caffe uses several threads/CPUs?
UPDATE:
I tried the caffe binary on the MNIST example and it utilizes 400% of CPU.
1 thread
I0520 15:58:09.749832 12424 caffe.cpp:178] Use CPU.
...
I0520 16:06:14.553506 12424 caffe.cpp:222] Optimization Done.
~8 min
4 threads
I0520 16:06:44.634735 12446 caffe.cpp:178] Use CPU.
...
I0520 16:13:15.904394 12446 caffe.cpp:222] Optimization Done.
~6.5 min
ps -T -p <PID> gives me:
export OPENBLAS_NUM_THREADS=1
6 threads
export OPENBLAS_NUM_THREADS=4
9 threads
It seems OpenBLAS works, but does it depend on the network architecture?
It also seems Caffe uses BLAS for the conv layers.
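For a single run, the thread count can be pinned on the command line and the threads inspected while training. A rough sketch (the solver path assumes the stock MNIST example shipped with Caffe):
OPENBLAS_NUM_THREADS=4 ./build/tools/caffe train \
    --solver=examples/mnist/lenet_solver.prototxt &   # assumed paths from the stock MNIST example
ps -T -p $!   # list the threads of the training process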
I'm using the intel branch on CentOS 7.3 and am able to see 2500% CPU usage on my Broadwell when training the caffenet example. The following are the steps I used to build the tools:
git clone https://github.com/BVLC/caffe.git caffe_intel
cd caffe_intel
git branch -r
git checkout intel
cp Makefile.config.example Makefile.config
# MKL will be downloaded automatically once make is run
# Edit Makefile.config in the following lines
PYTHON_LIB := /usr/lib64
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib64 /usr/lib
yum install cmake # will be used to build mkl as a sub-routine of make
make -j32 all
make pycaffe
make distribute
cd distribute/lib
ln -s ../../external/mkldnn/install/lib/libmkldnn.so .
Then put caffe_intel/distribute/bin in $PATH and caffe_intel/distribute/lib in $LD_LIBRARY_PATH (a sketch follows at the end of this answer). Also, enable the MKL library by adding the following line at the beginning of your prototxt file(s):
engine: "MKL2017"
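For the environment variables mentioned above, a sketch assuming the clone lives in $HOME:
export PATH=$HOME/caffe_intel/distribute/bin:$PATH                  # caffe binary
export LD_LIBRARY_PATH=$HOME/caffe_intel/distribute/lib:$LD_LIBRARY_PATH   # libmkldnn.so etc.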

Remove all previous MPI versions and reinstall correctly

First of all: I'm on Linux Mint 17.3 x64.
What I've done so far:
Guide to install Open MPI 1.8
Guide to install MPI
Attempted to remove MPI by executing: sudo apt-get install libcr-dev mpich2 mpich2-doc (actually these should not be installed)
What I can see from terminal:
output of: echo $PATH
/path/to/mpj//bin:/home/timmy/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/timmy/.openmpi/bin
(I imagine that I have to remove /path/to/mpj/ (which does not exist) and /home/timmy/.openmpi/bin (I want to remove the previous version of Open MPI))
output of: echo $LD_LIBRARY_PATH
(nothing)
Really, nothing appears!
output of mpirun
--------------------------------------------------------------------------
mpirun could not find anything to do.
It is possible that you forgot to specify how many processes to run
via the "-np" argument.
--------------------------------------------------------------------------
Why I want to remove Open MPI and reinstall it
I have a project that uses both MPI and OpenMP, and with the current MPI installation I cannot compile using the following command: mpicc -openmp "test_omp.c" -o "test_omp". It gives me the following error: undefined function omp_get_thread_num(); moreover, it ignores my #pragma directives.
Your problem is that you are giving the compiler the wrong option to enable OpenMP support. -openmp is only understood by the (commercial) Intel compiler, which is probably the toolset installed on the site you referred to in your other question. Most Linux distributions come with GCC, and it is safe to assume that mpicc will use GCC (check with mpicc -showme).
The option to enable OpenMP support in GCC is -fopenmp (notice the f).
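Concretely (a sketch; mpicc -showme is Open MPI specific):
mpicc -showme                          # confirm the underlying compiler is GCC
mpicc -fopenmp test_omp.c -o test_omp  # -fopenmp is the GCC spelling of the flag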

Prerequisites for gcc-4.7.2: ppl-0.11 fails to find gmp-4.3.2

I'm trying to build prerequisites for gcc-4.7.2.
Both ppl-0.11 and gmp-4.3.2 are the recommended versions in <gcc_src>/gcc-4.7.2/gcc/doc/HTML/prerequisites.html
I have built and installed gmp-4.3.2 (with --enable-cxx set)
Attempting to configure ppl-0.11 fails.
configure: error: Cannot find GMP version 4.1.3 or higher.
GMP is the GNU Multi-Precision library:
see http://www.swox.com/gmp/ for more information.
When compiling the GMP library, do not forget to enable the C++ interface:
add --enable-cxx to the configuration options.
This is my configure line:
./configure \
--prefix=$PREFIX \
--with-gmp=$PREFIX \
--with-gmp-prefix=$PREFIX
If I look in the directory I specified with --with-gmp, here is the installed GMP:
$ grep MP_VERSION $PREFIX/include/gmp*
$PREFIX/include/gmp.h:#define __GNU_MP_VERSION 4
$PREFIX/include/gmp.h:#define __GNU_MP_VERSION_MINOR 3
$PREFIX/include/gmp.h:#define __GNU_MP_VERSION_PATCHLEVEL 2
$ ls $PREFIX/include/gmp*
$PREFIX/include/gmp.h
$PREFIX/include/gmpxx.h
$ ls $PREFIX/lib/libgmp*
$PREFIX/lib/libgmp.a
$PREFIX/lib/libgmp.la
$PREFIX/lib/libgmp.so -> libgmp.so.3.5.2
$PREFIX/lib/libgmp.so.3 -> libgmp.so.3.5.2
$PREFIX/lib/libgmp.so.3.5.2
$PREFIX/lib/libgmpxx.a
$PREFIX/lib/libgmpxx.la
$PREFIX/lib/libgmpxx.so -> libgmpxx.so.4.1.2
$PREFIX/lib/libgmpxx.so.4 -> libgmpxx.so.4.1.2
$PREFIX/lib/libgmpxx.so.4.1.2
Am I missing something?
As far as I can tell, GMP is available and of the requisite version.
Depending on what distro you are running, have you tried installing the gmp-devel package (e.g. yum install gmp-devel on Fedora/RedHat)?
By default, PPL will look for GMP in the default locations. If you use crosstool-ng, you must do either a cross-native or canadian-cross build. If you are doing this manually, specify CXXFLAGS to PPL's ./configure, with a -I<path-to-gmp-header> and a -Wl,-L<path-to-gmp-libs>. This allows the PPL ./configure to find the correct version of GMP, as sketched below.
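A minimal sketch of that manual configure, assuming GMP was installed under $PREFIX as in the question:
./configure --prefix=$PREFIX \
    CXXFLAGS="-I$PREFIX/include -Wl,-L$PREFIX/lib"   # point PPL at the locally built GMP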
Apparently a PPL configure with,
--prefix=$PREFIX \
--with-gmp=$PREFIX \
--with-gmp-prefix=$PREFIX
is not enough. I sleuthed through the ./configure script and was hacking up crosstool-ng before I realized that, by using not my distro gcc but another host compiler with an older glibc shared library, I was no longer building a cross compiler but a canadian-cross. This is useful if you want your compiler to run on a larger class of machines. It is unlikely that the glibc version of the build compiler will affect much.
I still had to patch 120-ppl.sh in crosstool-ng:
do_ppl_for_build() {
...
ppl_cxxflags="${CT_CFLAGS_FOR_BUILD}"
+ ppl_cxxflags+=" -I${CT_BUILDTOOLS_PREFIX_DIR}/include "
+ ppl_cxxflags+=" -Wl,-L${CT_BUILDTOOLS_PREFIX_DIR}/lib "
if [ "${CT_PPL_NEEDS_FPERMISSIVE}" = "y" ]; then
ppl_cxxflags+=" -fpermissive"
fi
So I also faced the same issue and what I did was:
1) Went inside gmp-4.3.2 folder
2) make distclean
3) ./configure --prefix=/home/sen/Documents/mingw/downloads/gmp_build --enable-cxx
4) make && make install
5) Went inside ppl-0.11 folder
6) ./configure --prefix=/home/sen/Documents/mingw/downloads/ppl_build --with-gmp-prefix=/home/sen/Documents/mingw/downloads/gmp_build --enable-cxx
7) make && make install
Took some 10-20 mins to compile and things were fine.
Thanks,
Sen
After years, I have run into the same issue. The solution is first to download the latest version of GMP and note the directory where it gets installed. Don't forget to ./configure GMP with --enable-cxx; that is a really important point: ./configure --enable-cxx. Now it is time for the PPL installation. ./configure --help indicates that --with-gmp=DIR searches for libgmp/libgmpxx in DIR/include and DIR/lib, so run ./configure --with-gmp=<dir of GMP; you may have a different path>.
I wrote, respectively, ./configure --with-gmp=/usr/local/include, then make, then sudo make install, and it works like a charm!

How to make a soft link for gcc/g++ 4.5

This is part of some instructions that I was given from a website helping me install CUDA on a hybrid system. I'm using ubuntu 12.04 LTS dual booted as well as having a hybrid graphics card system of Intel Integrated Graphics and NVIDIA GEForce GT 540M.
--external instructions--
The last thing that might cause issues is the version of gcc and g++. Long story short, make sure the pointers gcc and g++ in /usr/bin (and subsequently /usr/local/cuda/bin) are pointing to gcc-4.5 and g++-4.5 (can get these with apt-get) since they are the most recent versions supported by nvcc. Use the soft-link command to achieve this.
--back to me--
I assume that downloading them with
apt-get install gcc-4.5 g++-4.5
will suffice for that part.
However, how do I make sure that the 'pointers' (and how do I identify those?) are linked to the recently downloaded versions? I know the soft link command is:
ln -s "target" "symbol" (one for gcc)
ln -s "target" "symbol" (one for g++)
I don't want to do this wrong, and I'm quite new to Linux, so please help me with what 'target' and 'symbol' should look like, and I'll be on my way.
Alex
It's better to use update-alternatives for managing the default gcc on your system. For example, say you have two versions, 4.4 and 4.5, and for CUDA you need a 4.4.x version of gcc. Let's set it as the system default:
sudo update-alternatives \
--install /usr/bin/gcc gcc /usr/bin/gcc-4.5 40 \
--slave /usr/bin/g++ g++ /usr/bin/g++-4.5
sudo update-alternatives \
--install /usr/bin/gcc gcc /usr/bin/gcc-4.4 60 \
--slave /usr/bin/g++ g++ /usr/bin/g++-4.4
Soft links might work, but I think update-alternatives is the easiest way.
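For completeness, a sketch of the raw soft-link form those instructions meant, where 'target' is the real compiler and 'symbol' is the pointer nvcc will see; placing the links under /usr/local/cuda/bin avoids touching /usr/bin:
sudo ln -sf /usr/bin/gcc-4.5 /usr/local/cuda/bin/gcc   # 'target' -> 'symbol'
sudo ln -sf /usr/bin/g++-4.5 /usr/local/cuda/bin/g++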
