How to check if dlib is using GPU or not? - python-3.x

My machine has a GeForce 940MX GDDR5 GPU.
I have installed all the requirements to run GPU-accelerated dlib:
CUDA 9.0 toolkit with all 3 patches updates from https://developer.nvidia.com/cuda-90-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exelocal
cuDNN 7.1.4
Then, after cloning the davisking/dlib repository from GitHub, I executed all of the commands below to compile dlib with GPU support:
$ git clone https://github.com/davisking/dlib.git
$ cd dlib
$ mkdir build
$ cd build
$ cmake .. -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1
$ cmake --build .
$ cd ..
$ python setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA
Now, how can I check/confirm whether dlib (or libraries that depend on dlib, such as Adam Geitgey's face_recognition) is using the GPU from inside a Python shell or an Anaconda/Jupyter notebook?

In addition to the previous answer, which uses
dlib.DLIB_USE_CUDA
there are some alternative ways to make sure dlib is actually using your GPU.
The easiest is to check whether dlib recognizes your GPU:
import dlib.cuda as cuda
print(cuda.get_num_devices())
If the number of devices is >= 1 then dlib can use your device.
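Put together, a quick sanity check from Python might look like this (a minimal sketch; it assumes a CUDA-enabled dlib build that exposes the cuda submodule):
import dlib
import dlib.cuda as cuda

print("DLIB_USE_CUDA:", dlib.DLIB_USE_CUDA)      # True if dlib was compiled with CUDA
print("CUDA devices :", cuda.get_num_devices())  # >= 1 means dlib can see your GPU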
Another useful trick is to run your dlib code and at the same time run
$ nvidia-smi
This should give you full GPU utilization information, where you can see the total utilization together with the memory usage of each process separately.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48 Driver Version: 410.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 Off | 00000000:01:00.0 On | N/A |
| 0% 52C P2 36W / 151W | 763MiB / 8117MiB | 5% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1042 G /usr/lib/xorg/Xorg 18MiB |
| 0 1073 G /usr/bin/gnome-shell 51MiB |
| 0 1428 G /usr/lib/xorg/Xorg 167MiB |
| 0 1558 G /usr/bin/gnome-shell 102MiB |
| 0 2113 G ...-token=24AA922604256065B682BE6D9A74C3E1 33MiB |
| 0 3878 C python 385MiB |
+-----------------------------------------------------------------------------+
In some cases the Processes box might say something like "processes are not supported"; this does not mean your GPU cannot run code, it just means the GPU does not support this kind of logging.
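If you would rather query utilization programmatically than read nvidia-smi output, a minimal sketch using the third-party pynvml bindings (an extra dependency, not part of dlib; install with pip install pynvml) could look like this:
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)        # first GPU
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # values in bytes
util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # values in percent
print("memory: %d MiB / %d MiB" % (mem.used // 2**20, mem.total // 2**20))
print("GPU utilization: %d%%" % util.gpu)
pynvml.nvmlShutdown()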

If dlib.DLIB_USE_CUDA is true then it's using CUDA; if it's false then it isn't.
As an aside, these steps do nothing and are not needed to use dlib from Python:
$ mkdir build
$ cd build
$ cmake .. -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1
$ cmake --build .
Just running setup.py is all you need to do.

The following snippets have been simplified to check whether dlib is using the GPU.
First, check whether dlib identifies your GPU:
import dlib.cuda as cuda
print(cuda.get_num_devices())
Second, check dlib.DLIB_USE_CUDA. Note that this is a compile-time flag, not a runtime switch: if it is False, assigning dlib.DLIB_USE_CUDA = True will not actually enable GPU support; you need to rebuild dlib with CUDA enabled instead.
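A defensive version of the check, for wheels that may have been built without CUDA (a sketch; the exact way a CPU-only build fails can vary, so the exception types here are an assumption):
import dlib

print("DLIB_USE_CUDA:", dlib.DLIB_USE_CUDA)
try:
    import dlib.cuda as cuda
    print("CUDA devices:", cuda.get_num_devices())
except (ImportError, AttributeError):
    # No usable CUDA support in this build; rebuild dlib after making
    # sure CMake can find your CUDA toolkit and cuDNN.
    print("No CUDA support in this dlib build")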

Related

CUDA 11.3 not being detected by PyTorch [Anaconda]

I am running Ubuntu 20.04 on a GTX 1050 Ti. I have installed CUDA 11.3.
nvidia-smi output:
Wed Apr 6 18:27:23 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| N/A 44C P8 N/A / N/A | 11MiB / 4040MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3060 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 4270 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
nvcc --version output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
Anaconda PyTorch isn't detecting CUDA:
> import torch
> torch.cuda.is_available()
> False
Any ideas how to solve the issue?
The solution:
Conda in my case installed the CPU build. You can easily identify your build type by running torch.version.cuda, which should return a version string if you have the CUDA build; if you get None, you are running the CPU build and it will not detect CUDA.
To fix that, I installed torch using pip instead:
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
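After the pip install, a quick check that you got a CUDA build (standard PyTorch calls):
import torch

print(torch.__version__)          # should end in '+cu113' for this wheel
print(torch.version.cuda)         # '11.3' for the CUDA build, None for the CPU build
print(torch.cuda.is_available())  # should now be True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))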

WSL2 Pytorch - RuntimeError: No CUDA GPUs are available with RTX3080

I have been struggling for days to make torch work on WSL2 using an RTX 3080.
I installed the CUDA toolkit version 11.3.
Running nvcc -V returns this:
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
nvidia-smi returns this
Mon Nov 29 00:38:26 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.00 Driver Version: 510.06 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| N/A 52C P5 17W / N/A | 1082MiB / 16384MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I verified the installation of the toolkit with the BlackScholes CUDA sample:
./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Ampere" with compute capability 8.6
Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.
Executing Black-Scholes GPU kernel (512 iterations)...
Options count : 8000000
BlackScholesGPU() time : 0.242822 msec
Effective memory bandwidth: 329.459087 GB/s
Gigaoptions per second : 32.945909
BlackScholes, Throughput = 32.9459 GOptions/s, Time = 0.00024 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128
Reading back GPU results...
Checking the results...
...running CPU calculations.
Comparing the results...
L1 norm: 1.741792E-07
Max absolute error: 1.192093E-05
Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.
[BlackScholes] - Test Summary
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Test passed
And when I try to use torch, it doesn't find any GPU. Btw, I had to install torch==1.10.0+cu113 to use torch with my RTX 3080, since the sm_* architectures supported by the plain 1.10.0 build are not compatible with the RTX 3080.
Running torch returns this:
>>> import torch
>>> torch.version
<module 'torch.version' from '/home/snihar/miniconda3/envs/tscbasset/lib/python3.7/site-packages/torch/version.py'>
>>> torch.version.cuda
'11.3'
>>> torch.cuda.get_arch_list()
[]
>>> torch.cuda.device_count()
0
>>> torch.cuda.current_device()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/snihar/miniconda3/envs/tscbasset/lib/python3.7/site-packages/torch/cuda/__init__.py", line 479, in current_device
_lazy_init()
File "/home/snihar/miniconda3/envs/tscbasset/lib/python3.7/site-packages/torch/cuda/__init__.py", line 214, in _lazy_init
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
At last, interestingly, I am completely able to run tensorflow-gpu on the same machine.
I installed PyTorch like this: conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
Also, I managed to run PyTorch in a Docker container started from my WSL2 machine with this command:
sudo docker run --gpus all -it --rm -v /home/...:/home/... nvcr.io/nvidia/pytorch:21.11-py3
When running PyTorch on the Windows machine I am running WSL from, it works too. Both return ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37'], which says that the library is compatible with the RTX 3080.
In my case, I solved this issue by linking /usr/lib/wsl/lib/libcuda.so.1 to the libcuda.so in the WSL2 CUDA location. See https://github.com/microsoft/WSL/issues/5663
After a reboot, PyTorch can find the GPU.
(I found the warning "/usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link" during the apt-get upgrade command; not sure you can solve it in the same way.) Downgrading to PyTorch 1.8.2 LTS can also solve the problem, but the computation speed is extremely low.
I've met the same issue, and solved it by downgrading PyTorch from 1.10 to 1.8.2 LTS.
I got the same warning as @Homer Simpson when I ran the command sudo ldconfig.
I dealt with it the same way that @Homer Simpson posted. In essence, what you need to do is delete libcuda.so and libcuda.so.1 and recreate them, but this time making them symbolic links to libcuda.so.1.1:
# Run CMD in Windows (as Administrator)
C:
cd \Windows\System32\lxss\lib
del libcuda.so
del libcuda.so.1
mklink libcuda.so libcuda.so.1.1
mklink libcuda.so.1 libcuda.so.1.1
# Open WSL bash
wsl -e /bin/bash
sudo ldconfig
Ref: https://github.com/microsoft/WSL/issues/5548#issuecomment-912495487
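If you want to verify the links from inside WSL before and after the fix, a small sketch (the path is the stock WSL2 location; adjust if yours differs):
import os

for name in ("libcuda.so", "libcuda.so.1"):
    path = os.path.join("/usr/lib/wsl/lib", name)
    if os.path.islink(path):
        print(path, "->", os.readlink(path))
    else:
        print(path, "is NOT a symbolic link")  # what ldconfig warns about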
Short: install PyTorch with CUDA 11.1 or lower.
Long: Unfortunately I cannot explain why this is happening, but after experimenting with different distro versions (Ubuntu and Debian) and PyTorch installs (pip and conda), it seems that CUDA 11.3, which is the only 11.x CUDA shipped with PyTorch on conda, does not work (CUDA 10.2 works just fine).
Solution: install the version you want using pip, from the official previous PyTorch versions page.
At the time of writing, the highest PyTorch version with the highest CUDA that works on WSL2 can be installed using the following command:
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 -f https://download.pytorch.org/whl/torch_stable.html
Note that you need to use cmd.exe, not PowerShell, because mklink is a cmd.exe built-in, not an actual program.
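Whichever fix applies to you, a quick way to confirm that PyTorch now sees the GPU (the same calls the question used):
import torch

print(torch.version.cuda)          # e.g. '11.1'
print(torch.cuda.get_arch_list())  # should include 'sm_86' for an RTX 3080
print(torch.cuda.device_count())   # should be >= 1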

tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error

I am trying to use the GPU with TensorFlow. My TensorFlow version is 2.4.1 and I am using CUDA version 11.2. Here is the output of nvidia-smi.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39 Driver Version: 460.39 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce MX110 Off | 00000000:01:00.0 Off | N/A |
| N/A 52C P0 N/A / N/A | 254MiB / 2004MiB | 8% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1151 G /usr/lib/xorg/Xorg 37MiB |
| 0 N/A N/A 1654 G /usr/lib/xorg/Xorg 136MiB |
| 0 N/A N/A 1830 G /usr/bin/gnome-shell 68MiB |
| 0 N/A N/A 5443 G /usr/lib/firefox/firefox 0MiB |
| 0 N/A N/A 5659 G /usr/lib/firefox/firefox 0MiB |
+-----------------------------------------------------------------------------+
I am facing a strange issue. Previously, when I tried to list all the physical devices using tf.config.list_physical_devices(), it identified one CPU and one GPU. After that I tried a simple matrix multiplication on the GPU. It failed with this error: failed to synchronize cuda stream CUDA_LAUNCH_ERROR (the error code was something like that, I forgot to note it down). But when I then tried the same thing from another terminal, it failed to recognise any GPU. This time, listing the physical devices produces this:
>>> tf.config.list_physical_devices()
2021-04-11 18:56:47.504776: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-11 18:56:47.507646: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-04-11 18:56:47.534189: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2021-04-11 18:56:47.534233: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: debadri-HP-Laptop-15g-dr0xxx
2021-04-11 18:56:47.534244: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: debadri-HP-Laptop-15g-dr0xxx
2021-04-11 18:56:47.534356: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.39.0
2021-04-11 18:56:47.534393: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.39.0
2021-04-11 18:56:47.534404: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.39.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
My OS is Ubuntu 20.04, Python version 3.8.5 and TensorFlow, as mentioned before, 2.4.1 with CUDA version 11.2. I installed CUDA from these instructions. One additional piece of information: when I import tensorflow, it shows the following output:
import tensorflow as tf
2021-04-11 18:56:07.716683: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
What am I missing? Why is it failing to recognise the GPU even though it was recognising previously?
tl;dr: Disable Secure Boot before installing the NVIDIA driver.
I had the exact same error, and I spent a ton of time trying to figure out whether I had installed the TensorFlow-related stuff incorrectly. After many hours of problem solving, I found that my NVIDIA driver was having problems because I never disabled Secure Boot in my BIOS when setting up Ubuntu 20.04. Here's what I suggest (I opted for using Docker with TensorFlow, which avoids having to install all the CUDA-related stuff) - I hope it works for you!
Disable Secure Boot in your BIOS.
Make a fresh install of Ubuntu 20.04.
Install Docker according to nvidia-container-toolkit's page.
curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker
Install nvidia-container-toolkit from the same page.
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
Test to make sure that's working with
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Finally, use Tensorflow with Docker w/ GPU support!
docker run --gpus all -u $(id -u):$(id -g) -it -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter jupyter notebook --ip=0.0.0.0
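Inside the container (or any working install), a quick way to confirm TensorFlow can actually launch kernels on the GPU is to force a small op onto it (a minimal sketch):
import tensorflow as tf

print(tf.config.list_physical_devices("GPU"))  # should list at least one GPU

with tf.device("/GPU:0"):
    a = tf.random.uniform((1000, 1000))
    b = tf.random.uniform((1000, 1000))
    c = tf.matmul(a, b)                        # fails here if CUDA is broken
print(c.device)                                # should end with 'GPU:0'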
I just made an account to say that @Nate's answer worked for me.
I have the exact same setup as you and had been trying for two days.
What I did in the end was:
Reboot - F10 into the BIOS settings - Security - BIOS Secure Boot (or something like that, I don't remember exactly) - Disabled
Then there were some extra steps with the confirmation, but it worked fine. I did not reinstall the whole Ubuntu; that was a bit too technically risky for me.
Then I tried the tf.config line and I got this:
2021-06-14 17:12:19.546509: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-06-14 17:12:26.754680: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-14 17:12:26.909679: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3593460000 Hz
2021-06-14 17:12:26.910016: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a8352501c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-14 17:12:26.910040: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-06-14 17:12:26.972350: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-14 17:12:27.074861: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-14 17:12:27.075289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:0c:00.0 name: GeForce GTX 1650 computeCapability: 7.5
coreClock: 1.665GHz coreCount: 14 deviceMemorySize: 3.81GiB deviceMemoryBandwidth: 119.24GiB/s
There are more red lines on device properties towards the end, but I got
Default GPU Device: /device:GPU:0
Don't know why it works, but it works. Just change the Secure Boot setting.
I don't have enough reputation points to upvote Nate's answer; I will come back later. But they really offer a good solution.
Disabling Secure Boot solved the problem immediately. No need to reinstall anything.
> import tensorflow as tf
> tf.config.list_physical_devices("GPU")
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

GCP AI Platform Notebook driver too old?

I am trying to run the following Hugging Face Transformers tutorial on GCP's AI Platform Notebook with 32 vCPUs, 208 GB RAM, and 2 NVIDIA Tesla T4s.
However, when I try to run the part
model = DistillBERTClass()
model.to(device)
I get the following Assertion Error:
AssertionError: The NVIDIA driver on your system is too old (found version 10010).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.
However, when I run
!nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 38C P0 22W / 70W | 10MiB / 15079MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:00:05.0 Off | 0 |
| N/A 39C P8 10W / 70W | 10MiB / 15079MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
The version of the NVIDIA driver is compatible with the latest PyTorch version, which I am using.
Has anyone else run into this error, and is there a way around it?
You can either:
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install a PyTorch
version that has been compiled with your version of the CUDA driver.
You can try a newer NVIDIA driver version (we support the latest CUDA 11 driver version) and then install PyTorch on top of it:
gcloud beta notebooks instances create cuda11 \
--vm-image-project=deeplearning-platform-release \
--vm-image-family=common-cu110-notebooks-debian-9 \
--machine-type=n1-standard-1 \
--location=us-west1-a \
--format=json
Image family:
common-cu110-notebooks-debian-9
common-cu110-notebooks-debian-10
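Once an instance from one of these CUDA 11 images is up, you can confirm PyTorch sees both T4s with the usual calls:
import torch

print(torch.cuda.is_available())  # should be True on the cu110 images
print(torch.cuda.device_count())  # 2 for two Tesla T4s
for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_name(i))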

Theano Not Able To Find Gpu - Ubuntu 16.04

WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available (error: cuda unavailable)
I get this error when trying to run any sample Theano program.
I have tried all the suggested fixes provided in this thread.
nvcc --version output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
nvidia-smi output:
Sat Dec 10 00:46:14 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57 Driver Version: 367.57 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 Off | 0000:01:00.0 Off | N/A |
| 0% 37C P0 33W / 151W | 0MiB / 8112MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
gcc version:
(venv) rgalbo@blueberry:~$ gcc --version
gcc (Ubuntu 4.9.3-13ubuntu2) 4.9.3
I have been trying to get this to work for a while now, would like someone to point me in the right direction.
So I was finally able to get Theano to find the GPU. I went through the steps provided here in order to clean up any corrupt installation that may have occurred from my initial installation of CUDA.
After this I ran sudo apt-get install cuda, which installed the right driver packages for my NVIDIA graphics card. I then proceeded to install CUDA 8.0 from the .deb, and this was able to overwrite the 7.5 version that was giving me issues.
This is the output I am now able to get from theano_test.py:
(venv) rgalbo@blueberry:~$ python theano_test.py
Using gpu device 0: GeForce GTX 1070 (CNMeM is disabled, cuDNN 5103)
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.185949 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
1.62323296]
Used the gpu
and here is my ~/.theanorc file:
(venv) rgalbo@blueberry:~$ cat ~/.theanorc
[global]
floatX = float32
device = gpu
[nvcc]
flags=-D_FORCE_INLINE
[cuda]
root = /usr/local/cuda-8.0
After each separate install I updated and rebooted the server, just for good luck, which I found to be helpful.
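For reference, the theano_test.py used above is presumably the standard GPU check from the old Theano documentation (its output matches the run shown earlier), something close to this sketch:
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
# If any Elemwise op is still running on the CPU, the GPU was not used
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print("Used the cpu")
else:
    print("Used the gpu")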
