Caffe multi-CPU build - multithreading

I'm trying to build Caffe on Ubuntu 14.04 x64 in VirtualBox with OpenBLAS in CPU_ONLY mode (environment install script, Makefile.config).
Also, I'm not compiling OpenBLAS myself but installing it via apt-get (sudo apt-get -y install libopenblas-dev). Could that be the cause of the problem?
After I set any of these variables, there is no speed improvement, and in htop I see only one CPU being utilised.
export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4
How can I check whether Caffe uses several threads / CPUs?
UPDATE:
I tried the caffe binary on the MNIST example and it utilises 400% of the CPU.
1 thread
I0520 15:58:09.749832 12424 caffe.cpp:178] Use CPU.
...
I0520 16:06:14.553506 12424 caffe.cpp:222] Optimization Done.
~8 min
4 threads
I0520 16:06:44.634735 12446 caffe.cpp:178] Use CPU.
...
I0520 16:13:15.904394 12446 caffe.cpp:222] Optimization Done.
~6.5 min
ps -T -p <PID> gives me:
export OPENBLAS_NUM_THREADS=1
6 threads
export OPENBLAS_NUM_THREADS=4
9 threads
It seems OpenBLAS works, but perhaps the speedup depends on the network architecture?
It also seems Caffe uses BLAS for the conv layers.
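A quick way to double-check which BLAS the binary is linked against and to watch per-thread CPU usage (a sketch, assuming the standard Makefile build places the binary at build/tools/caffe):
# Show which BLAS library the caffe binary links against
ldd build/tools/caffe | grep -i -E 'blas|mkl'
# Watch per-thread CPU usage of a running training process
top -H -p $(pgrep -f 'caffe train')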

I'm using the intel branch on CentOS 7.3 and am able to see 2500% CPU usage on my Broadwell machine when training the caffenet example. These are the steps I used to build the tools:
git clone https://github.com/BVLC/caffe.git caffe_intel
cd caffe_intel
git branch -r
git checkout intel
cp Makefile.config.example Makefile.config
# mkl will be automatically downloaded once make is run
# Edit Makefile.config in the following lines
PYTHON_LIB := /usr/lib64
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib64 /usr/lib
yum install cmake # will be used to build mkl as a sub-step of make
make -j32 all
make pycaffe
make distribute
cd distribute/lib
ln -s ../../external/mkldnn/install/lib/libmkldnn.so .
Then put caffe_intel/distribute/bin in $PATH and caffe_intel/distribute/lib in $LD_LIBRARY_PATH. Also, enable the MKL engine by adding the following line at the beginning of your prototxt file(s):
engine: "MKL2017"
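For reference, the environment setup mentioned above might look like this (a sketch; CAFFE_ROOT is just a convenience variable, and the paths assume the clone location used in the steps):
export CAFFE_ROOT=$HOME/caffe_intel
export PATH=$CAFFE_ROOT/distribute/bin:$PATH
export LD_LIBRARY_PATH=$CAFFE_ROOT/distribute/lib:$LD_LIBRARY_PATH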

Related

Is there any method to run perf under WSL?

When I tried to run perf under WSL, I got the following warning:
WARNING: perf not found for kernel 4.4.0-18362
You may need to install the following packages for this specific kernel:
linux-tools-4.4.0-18362-Microsoft
linux-cloud-tools-4.4.0-18362-Microsoft
You may also want to install one of the following packages to keep up to date:
linux-tools-Microsoft
linux-cloud-tools-Microsoft
But I can't find packages called linux-tools-4.4.0-18362-Microsoft or linux-cloud-tools-4.4.0-18362-Microsoft. I guess the package names are generated automatically.
I also tried to use perf in a Docker container. However, Docker containers use the same kernel as the host.
Is there any method to run perf under WSL?
I heard that perf can be used in WSL2. But after I upgraded to WSL2, it shows a similar error message:
WARNING: perf not found for kernel 4.19.84-microsoft
You may need to install the following packages for this specific kernel:
linux-tools-4.19.84-microsoft-standard
linux-cloud-tools-4.19.84-microsoft-standard
You may also want to install one of the following packages to keep up to date:
linux-tools-standard
linux-cloud-tools-standard
This is because WSL2 uses a custom Linux kernel. Its source code can be found at microsoft/WSL2-Linux-Kernel, so we have to compile the perf tools from it.
Procedure
Install required build packages. If you are using Ubuntu in WSL2 this is the
required command:
sudo apt install build-essential flex bison libssl-dev libelf-dev
Clone the WSL2 Linux kernel repository:
git clone --depth=1 https://github.com/microsoft/WSL2-Linux-Kernel.git
Go to the perf folder and compile it:
cd WSL2-Linux-Kernel/tools/perf
make
The perf executable will be in that folder.
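To make it available everywhere, you can copy the resulting binary somewhere on your PATH (a sketch; /usr/local/bin is just a common choice):
sudo cp perf /usr/local/bin/
perf --version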
Alternatively, you can install linux-tools-generic:
apt install linux-tools-generic
Then run perf using the install path /usr/lib/linux-tools/<linux-version>-generic/perf.
Some tools, like flamegraph, will use the PERF environment variable as the perf path:
PERF=/usr/lib/linux-tools/<linux-version>-generic/perf flamegraph -- my_program
The accepted answer works. However, some features are missing.
In order to get useful and demangled information, I had to install the following libs and then run make again.
libbabeltrace-dev
libunwind-dev
libdw-dev
binutils-dev
libiberty-dev
I'm not sure if all of them are necessary, but they were sufficient for cargo-flamegraph (my use case) to work.
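On Ubuntu in WSL2 you can install them in one go before re-running make (a sketch based on the list above):
sudo apt install libbabeltrace-dev libunwind-dev libdw-dev binutils-dev libiberty-dev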
You could install some generic version of perf, rather than the WSL2 version, like:
sudo apt install linux-tools-5.4.0-126-generic linux-tools-common
And then when you run perf, it will error out, like:
$ perf
WARNING: perf not found for kernel 5.10.16.3-microsoft
You may need to install the following packages for this specific kernel:
linux-tools-5.10.16.3-microsoft-standard-WSL2
linux-cloud-tools-5.10.16.3-microsoft-standard-WSL2
This is because the /usr/bin/perf wrapper script always tries to pick the perf binary based on uname -r:
$ grep uname `which perf`
full_version=`uname -r`
We can replace /usr/bin/perf with the actual perf binary:
mv /usr/bin/perf /usr/bin/perf.bk && ln -s /usr/lib/linux-tools/5.4.0-126-generic/perf /usr/bin/perf
and then:
$ perf stat ls 1>/dev/null
Performance counter stats for 'ls':
1.79 msec task-clock:u # 0.827 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
112 page-faults:u # 0.063 M/sec
<not supported> cycles:u
<not supported> instructions:u
<not supported> branches:u
<not supported> branch-misses:u
0.002158900 seconds time elapsed
0.002182000 seconds user
0.000000000 seconds sys
I think it is expected that the hardware/cache counters are not available on WSL2.
If you follow the accepted answer, make sure you read the warnings the make command prints at the start, as missing headers can silently disable functionality.
For me it disabled the TUI, GTK support and demangling, to name a few features.
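If you want those features back, these are the build dependencies perf typically probes for on Ubuntu (an assumption based on perf's usual feature detection; re-run make afterwards):
# TUI needs libslang, the GTK report browser needs GTK2 headers,
# and C++ demangling needs the binutils/libiberty development files
sudo apt install libslang2-dev libgtk2.0-dev binutils-dev libiberty-dev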

How to enable CONFIG_RT_GROUP_SCHED in Ubuntu to make it RT

I need to run real-time applications on Ubuntu and was reading about ways to make Linux act as an RT system. I learned two ways to do it:
PREEMPT_RT kernel patching
enabling the CONFIG_RT_GROUP_SCHED flag in the kernel.
I've already tried the first method (Install RT Linux patch for Ubuntu).
However, apart from uname -r showing #1 SMP PREEMPT RT, I have no other proof that it is actually an RT system, and hence I want to try the second method: enable the CONFIG_RT_GROUP_SCHED flag in the kernel and see how it performs.
I read we can confirm whether the kernel already has the flag with the following command:
# zcat /proc/config.gz | grep RT_GROUP
CONFIG_RT_GROUP_SCHED=y
However, my system doesn't even have a config.gz file in /proc, so I believe my kernel does not have this enabled.
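As a side note, on Ubuntu the running kernel's configuration is usually also available under /boot, so you can check the flag there even without /proc/config.gz (a sketch):
grep RT_GROUP /boot/config-$(uname -r)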
I'm relatively new to Linux kernels, so this might be naive, but how can I enable this in the kernel?
Step 1
Download the Linux kernel from https://www.kernel.org/pub/linux/kernel/. For the purpose of this PoC we downloaded the linux-4.16.18.tar.gz kernel from the above link.
Step 2
Unzip the kernel
$ tar -xzvf linux-4.16.18.tar.gz
Step 3
Move to the kernel source directory
$ cd linux-4.16.18
Step 4
Install kernel build dependencies
$ sudo apt install git build-essential kernel-package fakeroot libncurses5-dev libssl-dev ccache bison flex
Step 5
Run kernel configuration
$ make menuconfig
Step 6
Go to General setup ─> Control Group Support ─> CPU controller and enable Group scheduling for SCHED_RR/FIFO.
Go to General setup ─> Kernel .config support and enable access to .config through /proc/config.gz.
Step 7
Compile the kernel
$ make -j20
Make modules & install
$ sudo make modules_install -j20
$ sudo make install -j20
Step 8
Open the grub.cfg file to verify that the kernel is installed:
$ vim /boot/grub/grub.cfg
Look for the menuentry 'Ubuntu, with Linux 4.16.18'.
If it's not your default kernel, change the GRUB_DEFAULT=0 value in /etc/default/grub to point to your kernel, as sketched below.
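For example (a sketch; the exact menu entry title must match what you see in grub.cfg):
# In /etc/default/grub
GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 4.16.18"
# Then regenerate grub.cfg
sudo update-grub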
Step 9
Reboot your system
sudo reboot
Step 10
Verify the configuration with the following command:
# zcat /proc/config.gz | grep RT_GROUP
CONFIG_RT_GROUP_SCHED=y

How do I Distribute my Haxe application with Hashlink?

I've got a Haxe application that I want to make available to people on a Windows system. I use Hashlink to run the application locally and it works very nicely.
I am wondering if I'm supposed to distribute my application with Hashlink. Can it build me an .exe?
It looks like generating distributable binary files isn't supported out of the box today (March 10, 2017):
> haxe -main Main -hl main.c
Code generated in main.c automatic native compilation not yet implemented
Hopefully it will be supported soon!
Note: I'm talking about building a final executable using hashlink. An entirely separate approach I do not cover here is the possibility of delivering the hashlink virtual machine with your output hl bitcode.
Sane people stop reading here.
But in the meantime... it is possible to generate binaries with hashlink today if you build hashlink from source.
Warnings:
This isn't a generic, cross-platform answer to your question -- it's just my experience on Linux.
There will probably soon be a better way than this.
But I wanted to jot these notes down even for myself to recall later.
Here's what I had to do on Ubuntu 14.04, 64-bit:
Install prerequisite libraries for building hl (there may be others I already have installed, like build-essential, etc)
sudo apt-get install libvorbis-dev libturbojpeg libsdl2-dev libopenal-dev libssl-dev
Clone and build the mbedtls library: (rev note: b5ba28)
cd ~/dev/
git clone https://github.com/ARMmbed/mbedtls.git
cd mbedtls
make CFLAGS='-fPIC'
Clone the hashlink repo: (rev note: eaa92b)
cd ~/dev/
git clone https://github.com/HaxeFoundation/hashlink.git
cd hashlink
In the # Linux section of the Makefile, ~line 67, add these flags:
CFLAGS += -I ../mbedtls/include
LIBFLAGS += -L../mbedtls/library
Now build with make
If everything works, you'll see two important output files: hl and libhl.so.
Ok, at this point, it's easiest if you just build your project in the hashlink directory. For example:
# Still in the hashlink directory
haxe -cp /path/to/my/project -debug -main Main.hx -hl src/_main.c
Now run make hlc, and if everything works, hlc is the output executable (which depends on libhl.so):
cp libhl.so hlc /tmp/
cd /tmp/
./hlc
Prints:
Main.hx:7: Hello world!

Cross-compile a Rust application from Linux to Windows

Basically I'm trying to compile the simplest code to Windows while I am developing on Linux.
fn main() {
println!("Hello, and bye.")
}
I found these commands by searching the internet:
rustc --target=i686-w64-mingw32-gcc main.rs
rustc --target=i686_pc_windows_gnu -C linker=i686-w64-mingw32-gcc main.rs
Sadly, none of them work. It gives me an error about the std crate missing:
$ rustc --target=i686_pc_windows_gnu -C linker=i686-w64-mingw32-gcc main.rs
main.rs:1:1: 1:1 error: can't find crate for `std`
main.rs:1 fn main() {
^
error: aborting due to previous error
Is there a way to compile code on Linux that will run on Windows?
Other answers, while technically correct, are more difficult than they need to be. There's no need to use rustc (in fact it's discouraged; just use cargo). You only need rustup, cargo and your distribution's mingw-w64.
Add the target (you can also change this for whatever target you're cross compiling for):
rustup target add x86_64-pc-windows-gnu
You can build your crate easily with:
cargo build --target x86_64-pc-windows-gnu
No need for messing around with ~/.cargo/config or anything else.
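The resulting Windows binary lands under target/x86_64-pc-windows-gnu/, and if Wine is installed you can smoke-test it straight from Linux (a sketch; my_app is a placeholder for your crate name):
cargo build --release --target x86_64-pc-windows-gnu
wine target/x86_64-pc-windows-gnu/release/my_app.exe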
EDIT: I just wanted to add that while you can use the above, it can sometimes be a headache. The Rust tools team also maintains a project called cross: https://github.com/rust-embedded/cross
This might be another solution that you want to look into.
The Rust distribution only provides compiled libraries for the host system. However, according to Arch Linux's wiki page on Rust, you could copy the compiled libraries from the Windows packages in the download directory (note that there are i686 and x86-64 packages) into the appropriate place on your system (/usr/lib/rustlib or /usr/local/lib/rustlib, depending on where Rust is installed), install mingw-w64-gcc and Wine, and you should be able to cross-compile.
If you're using Cargo, you can tell Cargo where to look for ar and the linker by adding this to ~/.cargo/config (where $ARCH is the architecture you use):
[target.$ARCH-pc-windows-gnu]
linker = "/usr/bin/$ARCH-w64-mingw32-gcc"
ar = "/usr/$ARCH-w64-mingw32/bin/ar"
Note: the exact paths can vary based on your distribution. Check the list of files for the mingw-w64 package(s) (GCC and binutils) in your distribution.
Then you can use Cargo like this:
$ # Build
$ cargo build --release --target "$ARCH-pc-windows-gnu"
$ # Run unit tests under wine
$ cargo test --target "$ARCH-pc-windows-gnu"
UPDATE 2019-06-11
This fails for me with:
Running `rustc --crate-name animation examples/animation.rs --color always --crate-type bin --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' -C metadata=006e668c6384c29b -C extra-filename=-006e668c6384c29b --out-dir /home/roman/projects/rust-sdl2/target/x86_64-pc-windows-gnu/debug/examples --target x86_64-pc-windows-gnu -C ar=x86_64-w64-mingw32-gcc-ar -C linker=x86_64-w64-mingw32-gcc -C incremental=/home/roman/projects/rust-sdl2/target/x86_64-pc-windows-gnu/debug/incremental -L dependency=/home/roman/projects/rust-sdl2/target/x86_64-pc-windows-gnu/debug/deps -L dependency=/home/roman/projects/rust-sdl2/target/debug/deps --extern bitflags=/home/roman/projects/rust-sdl2/target/x86_64-pc-windows-gnu/debug/deps/libbitflags-2c7b3e3d10e1e0dd.rlib --extern lazy_static=/home/roman/projects/rust-sdl2/target/x86_64-pc-windows-gnu/debug/deps/liblazy_static-a80335916d5ac241.rlib --extern libc=/home/roman/projects/rust-sdl2/target/x86_64-pc-windows-gnu/debug/deps/liblibc-387157ce7a56c1ec.rlib --extern num=/home/roman/projects/rust-sdl2/target/x86_64-pc-windows-gnu/debug/deps/libnum-18ac2d75a7462b42.rlib --extern rand=/home/roman/projects/rust-sdl2/target/x86_64-pc-windows-gnu/debug/deps/librand-7cf254de4aeeab70.rlib --extern sdl2=/home/roman/projects/rust-sdl2/target/x86_64-pc-windows-gnu/debug/deps/libsdl2-3f37ebe30a087396.rlib --extern sdl2_sys=/home/roman/projects/rust-sdl2/target/x86_64-pc-windows-gnu/debug/deps/libsdl2_sys-3edefe52781ad7ef.rlib -L native=/home/roman/.cargo/registry/src/github.com-1ecc6299db9ec823/winapi-x86_64-pc-windows-gnu-0.4.0/lib`
error: linking with `x86_64-w64-mingw32-gcc` failed: exit code: 1
Maybe this will help https://github.com/rust-lang/rust/issues/44787
Statically compiling SDL2
There is an option to statically compile SDL, but it didn't work for me.
Also, the mixer is not included when using the bundled feature.
Let's cross-compile the examples from the rust-sdl2 project from Ubuntu to Windows x86_64.
In ~/.cargo/config:
[target.x86_64-pc-windows-gnu]
linker = "x86_64-w64-mingw32-gcc"
ar = "x86_64-w64-mingw32-gcc-ar"
Then run this:
sudo apt-get install gcc-mingw-w64-x86-64 -y
# use rustup to add target https://github.com/rust-lang/rustup.rs#cross-compilation
rustup target add x86_64-pc-windows-gnu
# Based on instructions from https://github.com/AngryLawyer/rust-sdl2/
# First we need sdl2 libs
# links to packages https://www.libsdl.org/download-2.0.php
sudo apt-get install libsdl2-dev -y
curl -s https://www.libsdl.org/release/SDL2-devel-2.0.9-mingw.tar.gz | tar xvz -C /tmp
# Prepare files for building
mkdir -p ~/projects
cd ~/projects
git clone https://github.com/Rust-SDL2/rust-sdl2
cd rust-sdl2
cp -r /tmp/SDL2-2.0.9/x86_64-w64-mingw32/lib/* ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-pc-windows-gnu/lib/
cp /tmp/SDL2-2.0.9/x86_64-w64-mingw32/bin/SDL2.dll .
Build all the examples at once:
cargo build --target=x86_64-pc-windows-gnu --verbose --examples
Or stop after the first failure:
echo; for i in examples/*; do [ $? -eq 0 ] && cargo build --target=x86_64-pc-windows-gnu --verbose --example $(basename $i .rs); done
Run
cargo build will put binaries in target/x86_64-pc-windows-gnu/debug/examples/
Copy the needed files:
cp /tmp/SDL2-2.0.9/x86_64-w64-mingw32/bin/SDL2.dll target/x86_64-pc-windows-gnu/debug/examples/
cp assets/sine.wav target/x86_64-pc-windows-gnu/debug/examples/
Then copy directory target/x86_64-pc-windows-gnu/debug/examples/ to your Windows machine and run exe files.
Run in cmd.exe
If you want to see the console output when running the exe files, you can run them from cmd.exe.
To open cmd.exe in the current directory in File Explorer, Shift + right-click on an empty place in the window and choose Open command window here.
Backtraces with mingw should work now; if not, use msvc: https://github.com/rust-lang/rust/pull/39234
There is a Docker-based solution called cross. All the required tools are in a virtualized environment, so you don't need to install additional packages on your machine. See the Supported targets list.
From project's README:
Features
cross will provide all the ingredients needed for cross compilation without touching your system installation.
cross provides an environment, cross toolchain and cross compiled libraries, that produces the most portable binaries.
“cross testing”, cross can test crates for architectures other than i686 and x86_64.
The stable, beta and nightly channels are supported.
Dependencies
rustup
A Linux kernel with binfmt_misc support is required for cross testing.
One of these container engines is required. If both are installed, cross will default to docker.
Docker. Note that on Linux non-sudo users need to be in the docker group. Read the official post-installation steps. Requires version 1.24 or later.
Podman. Requires version 1.6.3 or later.
Installation
$ cargo install cross
Usage
cross has the exact same CLI as Cargo, but as it relies on Docker you'll have to start the daemon before you can use it.
# (ONCE PER BOOT)
# Start the Docker daemon, if it's not already running
$ sudo systemctl start docker
# MAGIC! This Just Works
$ cross build --target aarch64-unknown-linux-gnu
# EVEN MORE MAGICAL! This also Just Works
$ cross test --target mips64-unknown-linux-gnuabi64
# Obviously, this also Just Works
$ cross rustc --target powerpc-unknown-linux-gnu --release -- -C lto
The solution that worked for me is similar to one of the accepted answers, but I did not need to add the toolchain.
rustup target add x86_64-pc-windows-gnu
cargo build --target x86_64-pc-windows-gnu
Refer to the documentation for more details.
I've had success on Debian (testing) without using Mingw and Wine just following the official instructions. They look scary, but in the end it didn't hurt that much.
The official instructions also contain info on how to cross-compile C/C++ code. I haven't needed that, so it's something I haven't actually tested.
A couple of remarks on individual points in the official instructions, in the same order as the steps there.
Debian: sudo apt-get install lld
Make a symlink named lld-link to lld somewhere in your $PATH. Example: ln -s /usr/bin/lld local_bin/lld-link
I don't cross-compile C/C++, haven't used this point personally.
This is probably the most annoying part. I installed Rust on a Windows box via rustup, and copied the libraries from the directories named in the official docs to the Linux box. Beware, there were sometimes uppercase library filenames, but lld wants them all lowercase (Windows isn't case-sensitive, Linux is). I've used the following to rename all files in current directory to lowercase:
for f in `find`; do mv -v "$f" "`echo $f | tr '[A-Z]' '[a-z]'`"; done
Personally, I've needed both Kit directories and just one of the VC dirs.
I don't cross-compile C/C++, haven't used this point personally.
Just make $LIB_ROOT in the script at the end of this post point to the lib directory from point 3.
Mandatory
I don't cross-compile C/C++, haven't used this point personally.
Depending on the target architecture, add either of the following:
rustup target add i686-pc-windows-msvc
rustup target add x86_64-pc-windows-msvc
For cross-building itself, I'm using the following simple script (32-bit version):
#!/bin/sh
# "cargo build" for the 32-bit Windows MSVC architecture.
# Set this to proper directory
LIB_ROOT=~/opt/rust-msvc
# The rest shouldn't need modifications
VS_LIBS="$LIB_ROOT/Microsoft Visual Studio 14.0/VC/lib/"
KIT_8_1_LIBS="$LIB_ROOT/Windows Kits/8.1/Lib/winv6.3/um/x86/"
KIT_10_LIBS="$LIB_ROOT/Windows Kits/10/Lib/10.0.10240.0/ucrt/x86/"
export LIB="$VS_LIBS;$KIT_8_1_LIBS;$KIT_10_LIBS"
cargo build --target=i686-pc-windows-msvc "$@"
I'm using the script the same way I would use cargo build.
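For example, if the script is saved as build-win32.sh (a hypothetical name) and made executable, any extra arguments pass straight through to cargo:
chmod +x build-win32.sh
./build-win32.sh --release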
Hope that helps somebody!

Compile r with mkl (With mulithreads support)

I compiled R following these guides:
http://www.r-bloggers.com/compiling-64-bit-r-2-10-1-with-mkl-in-linux/
http://cran.r-project.org/doc/manuals/R-admin.html#MKL
But for matrix algebra R does not use all available CPUs.
I tried both of the following MKL options:
MKL="-L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_gnu_thread \
-lmkl_core -fopenmp -lpthread"
and
MKL=" -L${MKL_LIB_PATH} \
-Wl,--start-group \
${MKL_LIB_PATH}/libmkl_gf_lp64.a \
${MKL_LIB_PATH}/libmkl_gnu_thread.a \
${MKL_LIB_PATH}/libmkl_core.a \
-Wl,--end-group \
-lgomp -lpthread"
How can I force R to use all available CPUs?
How can I check whether R uses MKL or not?
I would like to add my procedure for compiling R 3.0.1 with the MKL libraries. I am using Debian 7.0 on a Core i7 Intel processor with 8 GB RAM. First I installed the MKL libraries, then I set the MKL-related environment variables (MKLROOT and LD_LIBRARY_PATH) with this command:
>source /opt/intel/mkl/bin/mklvars.sh intel64
Then I used the following parameters for ./configure:
>./configure --enable-R-shlib --enable-threads=posix --with-lapack --with-blas="-fopenmp -m64 -I$MKLROOT/include -L$MKLROOT/lib/intel64 -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lpthread -lm"
and finished the installation with make and make install.
As a benchmark, I computed the product of two 5000 x 5000 matrices without MKL and got:
user system elapsed
57.455 0.104 29.033
and after compiling:
user system elapsed
15.993 0.176 4.333
a real gain!
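A minimal version of such a benchmark can be run from the shell like this (a sketch; it simply times a 5000 x 5000 matrix product as described above):
Rscript -e 'n <- 5000; A <- matrix(rnorm(n * n), n, n); B <- matrix(rnorm(n * n), n, n); print(system.time(A %*% B))'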
All this is now a lot easier -- a short blog post is here discussing the steps below in detail.
But in short, all you need is this:
## get archive key
cd /tmp
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
## add MKL to apt's repo list
sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list'
## update and install (500+ mb download, 1.9gb installed)
apt-get update
apt-get install intel-mkl-64bit-2018.2-046
## make it system default via update alternatives
update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so libblas.so-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so.3 libblas.so.3-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so liblapack.so-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so.3 liblapack.so.3-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
## tell ldconfig
echo "/opt/intel/lib/intel64" > /etc/ld.so.conf.d/mkl.conf
echo "/opt/intel/mkl/lib/intel64" >> /etc/ld.so.conf.d/mkl.conf
ldconfig
That's it. Nothing else. No recompiling or linking. And, for example, R now shows this in sessionInfo():
Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so
(Not a real answer: I don't use MKL; I use OpenBLAS as a shared BLAS, as described in the R-admin manual.)
As a quick check of whether the optimized BLAS is used, I do a matrix multiplication. Even if only one core is used, this should be faster with the optimized BLAS than with the standard BLAS that R comes with.
To check how many cores are in use, I look at top (or a CPU usage graph/monitor) during the matrix multiplication.
There has been trouble in the past with CPU affinity, where a BLAS would start n threads but they all ran on the same core; see Parallel processing in R limited.
r-devel (3.0.0-to-be) has a function to set the CPU affinity.
A complete tutorial is available here:
https://software.intel.com/en-us/articles/build-r-301-with-intel-c-compiler-and-intel-mkl-on-linux
or simply using:
http://mran.revolutionanalytics.com/download/

Resources