TensorFlow: Why does cuDNN fails to launch ? CUDA_ERROR_LAUNCH_FAILED

TensorFlow: Why does cuDNN fails to launch ? CUDA_ERROR_LAUNCH_FAILED - python-3.x

Background:
I have been using TensorFlow + Keras for deep learning since 3-4 months on my laptop. I usually use python scripts and invoke them from shell. Today I decided to try using jupyter notebook for trying out a deep neural network from my script. I was starting an epoch using Keras, but then I was met with the following error
Error:
Using TensorFlow backend.
WARNING:tensorflow:From D:\Anaconda3\envs\tensorflow-gpu-resonator\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From D:\Anaconda3\envs\tensorflow-gpu-resonator\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/5
2019-03-16 15:19:11.414481: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-03-16 15:19:11.971669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.35GiB
2019-03-16 15:19:11.981427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-16 15:19:26.748014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-16 15:19:26.752914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-16 15:19:26.757983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-16 15:19:26.792177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3058 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-03-16 15:19:29.466453: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
2019-03-16 15:19:54.230708: E tensorflow/stream_executor/cuda/cuda_driver.cc:981] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-16 15:19:54.236093: E tensorflow/stream_executor/cuda/cuda_timer.cc:55] Internal: error destroying CUDA event in context 0000016C5B168AD0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-16 15:19:54.243638: E tensorflow/stream_executor/cuda/cuda_timer.cc:60] Internal: error destroying CUDA event in context 0000016C5B168AD0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-16 15:19:54.250401: F tensorflow/stream_executor/cuda/cuda_dnn.cc:194] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream.
Solutions Tried:
As suggested by multiple similar issues on GitHub, I tried updating my CUDA version and TensorFlow version and my graphics drivers
Currently I am using:
CUDA Version : 10.0.130
cuDNN Version : 7.3.1
TensorFlow GPU : 1.13.1
Keras GPU : 2.2.4
OS : Windows 10 64 bit Version 1809
GPU : NVIDIA GeForce 940MX
GPU Driver Version : 419.35 released 03-05-2019
I thought this must be because TensorFlow is simply not able to connect to GPU properly. So I tried TensorFlow GPU test from here and it seems to be working.(Check out image here since I don't have enough reputation)
Judging from #1, #2, #3 this seems to be an error due to unauthorized memory access.
How should I solve this problem ? I thought since this was related to memory management, a reboot could solve this. But I still get this issue.
[P.S. : All the software libraries like cuDNN and CUDA are installed through Anaconda3 with Python 3.6.6]

Related

Tensorflow GPU crashing kernel/shell

When every I try to run any training model the shell or kernel restarts. Spyder 5.3.2 Kernel Screenshot
This is the kernel output:
Segmentation Models: using keras framework.
2022-08-31 12:48:32.675519: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-31 12:48:33.019498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3491 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
Epoch 1/10
2022-08-31 12:48:37.509234: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8500
Restarting kernel...
This started happening after the GPU setup for tensorflow (2.9.1). Python(3.9)
I have tried rebuilding it using blazel to enable GPU according to tensorflow.org website.
I have installed supported CUDA and cuDNN files.
The GPU is detected here
GPU Screenshot:

Pytorch says that CUDA is not available (on Ubuntu)

I'm trying to run Pytorch on a laptop that I have. It's an older model but it does have an Nvidia graphics card. I realize it is probably not going to be sufficient for real machine learning but I am trying to do it so I can learn the process of getting CUDA installed.
I have followed the steps on the installation guide for Ubuntu 18.04 (my specific distribution is Xubuntu).
My graphics card is a GeForce 845M, verified by lspci | grep nvidia:
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce 845M] (rev a2)
01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
I also have gcc 7.5 installed, verified by gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
And I have the correct headers installed, verified by trying to install them with sudo apt-get install linux-headers-$(uname -r):
Reading package lists... Done
Building dependency tree
Reading state information... Done
linux-headers-4.15.0-106-generic is already the newest version (4.15.0-106.107).
I then followed the installation instructions using a local .deb for version 10.1.
Now, when I run nvidia-smi, I get:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 845M On | 00000000:01:00.0 Off | N/A |
| N/A 40C P0 N/A / N/A | 88MiB / 2004MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 982 G /usr/lib/xorg/Xorg 87MiB |
+-----------------------------------------------------------------------------+
and I run nvcc -V I get:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
I then performed the post-installation instructions from section 6.1, and so as a result, echo $PATH looks like this:
/home/isaek/anaconda3/envs/stylegan2_pytorch/bin:/home/isaek/anaconda3/bin:/home/isaek/anaconda3/condabin:/usr/local/cuda-10.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
echo $LD_LIBRARY_PATH looks like this:
/usr/local/cuda-10.1/lib64
and my /etc/udev/rules.d/40-vm-hotadd.rules file looks like this:
# On Hyper-V and Xen Virtual Machines we want to add memory and cpus as soon as they appear
ATTR{[dmi/id]sys_vendor}=="Microsoft Corporation", ATTR{[dmi/id]product_name}=="Virtual Machine", GOTO="vm_hotadd_apply"
ATTR{[dmi/id]sys_vendor}=="Xen", GOTO="vm_hotadd_apply"
GOTO="vm_hotadd_end"
LABEL="vm_hotadd_apply"
# Memory hotadd request
# CPU hotadd request
SUBSYSTEM=="cpu", ACTION=="add", DEVPATH=="/devices/system/cpu/cpu[0-9]*", TEST=="online", ATTR{online}="1"
LABEL="vm_hotadd_end"
After all of this, I even compiled and ran the samples. ./deviceQuery returns:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce 845M"
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2004 MBytes (2101870592 bytes)
( 4) Multiprocessors, (128) CUDA Cores/MP: 512 CUDA Cores
GPU Max Clock rate: 863 MHz (0.86 GHz)
Memory Clock rate: 1001 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS
and ./bandwidthTest returns:
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce 845M
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 11.7
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 11.8
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 14.5
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
But after all of this, this Python snippet (in a conda environment with all dependencies installed):
import torch
torch.cuda.is_available()
returns False
Does anybody have any idea about how to resolve this? I've tried to add /usr/local/cuda-10.1/bin to etc/environment like this:
PATH=$PATH:/usr/local/cuda-10.1/bin
And restarting the terminal, but that didn't fix it. I really don't know what else to try.
EDIT - Results of collect_env for #kHarshit
Collecting environment information...
PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.2
OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: Could not collect
Python version: 3.6
Is CUDA available: No
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce 845M
Nvidia driver version: 418.87.00
cuDNN version: Could not collect
Versions of relevant libraries:
[pip] numpy==1.18.5
[pip] pytorch-ranger==0.1.1
[pip] stylegan2-pytorch==0.12.0
[pip] torch==1.5.0
[pip] torch-optimizer==0.0.1a12
[pip] torchvision==0.6.0
[pip] vector-quantize-pytorch==0.0.2
[conda] numpy 1.18.5 pypi_0 pypi
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] stylegan2-pytorch 0.12.0 pypi_0 pypi
[conda] torch 1.5.0 pypi_0 pypi
[conda] torch-optimizer 0.0.1a12 pypi_0 pypi
[conda] torchvision 0.6.0 pypi_0 pypi
[conda] vector-quantize-pytorch 0.0.2 pypi_0 pypi

PyTorch doesn't use the system's CUDA library. When you install PyTorch using the precompiled binaries using either pip or conda it is shipped with a copy of the specified version of the CUDA library which is installed locally. In fact, you don't even need to install CUDA on your system to use PyTorch with CUDA support.
There are two scenarios which could have caused your issue.
You installed the CPU only version of PyTorch. In this case PyTorch wasn't compiled with CUDA support so it didn't support CUDA.
You installed the CUDA 10.2 version of PyTorch. In this case the problem is that your graphics card currently uses the 418.87 drivers, which only support up to CUDA 10.1. The two potential fixes in this case would be to either install updated drivers (version >= 440.33 according to Table 2) or to install a version of PyTorch compiled against CUDA 10.1.
To determine the appropriate command to use when installing PyTorch you can use the handy widget in the "Install PyTorch" section at pytorch.org. Just select the appropriate operating system, package manager, and CUDA version then run the recommended command.
In your case one solution was to use
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
which explicitly specifies to conda that you want to install the version of PyTorch compiled against CUDA 10.1.
For more information about PyTorch CUDA compatibility with respect drivers and hardware see this answer.
Edit After you added the output of collect_env we can see that the problem was that you had the CUDA 10.2 version of PyTorch installed. Based on that an alternative solution would have been to update the graphics driver as elaborated in item 2 and the linked answer.

TL; DR
Install NVIDIA Toolkit provided by Canonical or NVIDIA third-party PPA.
Reboot your workstation.
Create a clean Python virtual environment (or reinstall all CUDA dependent packages).
Description
First install NVIDIA CUDA Toolkit provided by Canonical:
sudo apt install -y nvidia-cuda-toolkit
or follow NVIDIA developers instructions:
# ENVARS ADDED **ONLY FOR READABILITY**
NVIDIA_CUDA_PPA=https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/
NVIDIA_CUDA_PREFERENCES=https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
NVIDIA_CUDA_PUBKEY=https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
# Add NVIDIA Developers 3rd-Party PPA
sudo wget ${NVIDIA_CUDA_PREFERENCES} -O /etc/apt/preferences.d/nvidia-cuda
sudo apt-key adv --fetch-keys ${NVIDIA_CUDA_PUBKEY}
echo "deb ${NVIDIA_CUDA_PPA} /" | sudo tee /etc/apt/sources.list.d/nvidia-cuda.list
# Install development tools
sudo apt update
sudo apt install -y cuda
then reboot the OS load the kernel with the NVIDIA drivers
Create an environment using your favorite manager (conda, venv, etc)
conda create -n stack-overflow pytorch torchvision
conda activate stack-overflow
or reinstall pytorch and torchvision into the existing one:
conda activate stack-overflow
conda install --force-reinstall pytorch torchvision
otherwise NVIDIA CUDA C/C++ bindings may not be correctly detected.
Finally ensure CUDA is correctly detected:
(stack-overflow)$ python3 -c 'import torch; print(torch.cuda.is_available())'
True
Versions
NVIDIA CUDA Toolkit v11.6
Ubuntu LTS 20.04.x
Ubuntu LTS 22.04 (prior official release)

In my case, just restarting my machine made the GPU active again. The initial message I got was that the GPU is currently in use by another application. But when I looked at nvidia-smi, there was nothing that I saw. So, no changes to dependencies, and it just started working again.

Another possible scenario is that environment variable CUDA_VISIBLE_DEVICES is not set correctly before installing PyTorch.

In my case it worked to do as follows:
remove the CUDA drivers
sudo apt-get remove --purge nvidia*
Then get the exact installation script of the drivers based on your distro and system from the link: https://developer.nvidia.com/cuda-downloads?target_os=Linux
In my case it was dabian on x64 so I did:
wget https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda
And now nvidia-smi works as intended!
I hope that helps

If your CUDA version does not match what PyTorch expects, you will see this issue.
On Arch / Manjaro:
Get Pytorch from here: https://pytorch.org/get-started/locally/
Note what CUDA version you are getting PyTorch for
Get the same CUDA version from here: https://archive.archlinux.org/packages/c/cuda/
Install CUDA using (e.g.) sudo pacman -U --noconfirm cuda-11.6.2-1-x86_64.pkg.tar.zst
Do not update to a newer version of CUDA than PyTorch expects. If PyTorch wants 11.6 and you have updated to 11.7, you will get the error message.

Make sure that os.environ['CUDA_VISIBLE_DEVICES'] = '0' is set after if __name__ == "__main__":. So your code should look like this:
import torch
import os
if __name__ == "__main__":
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
print(torch.cuda.is_available()) // true
...

Pytorch list of supported GPU hardware for each release (Ubuntu18.04)

I have an old GPU and pytorch says it is too old to support:
Found GPU0 GeForce GTX 670 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability that we support is 3.5.
First question is what is this 3.0 / 3.5 referring to ? Clearly not cuda, nvidia driver nor pytorch?
Secondly, I know I can build Pytorch from source in order to have more support, however it's unclear what hardware pytorch supports when built from source, or even when installed with pip. I'm at the point where the likely pytorch version I need, requires python2, and cuda 9, so I'm spending solid time changing a bunch of software versions without knowing whether any of it will work.
Also if I were to get a newer GPU, I don't know whether it is currently supported.
Any way to have a list of supported NVIDIA hardware of pytorch? Ideally for each releases, but at least for current release?
I run an Nvidia GTX670 with Nvidia Driver 430.50. I changed from cuda 10 to 9, but seeing I might have to build pytorch 0.3.1, I think cuda8 is needed since when runnin python setup.py install in v0.3.1 i get:
CMakeFiles/THC.dir/build.make:560: recipe for target 'CMakeFiles/THC.dir/THC_generated_THCHalf.cu.o' failed
make[2]: *** [CMakeFiles/THC.dir/THC_generated_THCHalf.cu.o] Error 1
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/THC.dir/all' failed
make[1]: *** [CMakeFiles/THC.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

Executing keras aborts RStudio session

I am trying to install keras in a laptop with R and RStudio previously installed.
I first installed Anaconda3 following the instructions here https://docs.anaconda.com/anaconda/install/linux/.
Second, I followed these instructions to install keras and tensorflow in RStudio https://github.com/FrancisArgnR/Guide-Keras-R (cpu instructions), but specifying conda method:
install.packages('devtools')
devtools::install_github("rstudio/keras")
library(keras)
install_keras(method = c("conda"))
However, whenever I tried to run keras functions (data <- dataset_mnist()), RStudio session aborts. When I use R in the terminal I get the error:
> library(keras)
> data<-dataset_mnist()
*** caught illegal operation ***
address 0x7fb3e50fe820, cause 'illegal operand'
Traceback:
1: py_module_import(module, convert = convert)
2: import(module)
3: doTryCatch(return(expr), name, parentenv, handler)
4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
5: tryCatchList(expr, classes, parentenv, handlers)
6: tryCatch(import(module), error = clear_error_handler())
7: py_resolve_module_proxy(x)
8: `$.python.builtin.module`(keras, "datasets")
9: keras$datasets
10: dataset_mnist()
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
I found a similar error previously reported here https://github.com/rstudio/tensorflow/issues/228, although the "Traceback" is not exactly the same. I tried specifying an earlier version of Tensorflow as suggested, but the error perstists.
install_keras(method = c("conda"),tensorflow = "1.5")
I tired serveral tensorflow versions, from 1.9 to 1.0, but I obtain the same behaviour.
I also tried installing keras from CRAN with install.packages("keras") and repeating all the procedure, but nothing changed. If I do not specify the conda method, the same happens, but in addition, I obtain the following errors when running install_keras():
ERROR: spyder 3.3.6 requires pyqt5<5.13; python_version >= "3", which is not installed.
ERROR: spyder 3.3.6 requires pyqtwebengine<5.13; python_version >= "3", which is not installed.
ERROR: astroid 2.3.1 requires typed-ast<1.5,>=1.4.0; implementation_name == "cpython" and python_version < "3.8", which is not installed.
ERROR: astroid 2.3.1 has requirement six==1.12, but you'll have six 1.13.0 which is incompatible.
Some details in case they can be useful:
R version 3.6.1, Platform: x86_64-pc-linux-gnu (64-bit)
RStudio: Version 1.2.5019
OS: Ubuntu 19.10
Processor: Intel® Celeron(R) CPU N3450 # 1.10GHz × 4
5,6 GiB RAM

It seems that your problem is caused by the lack of AVX instructions support of your CPU.
First you should know that when you run install_keras(method = "conda") a new environment (usually named r-tensorflow) is created. All the Python libraries required to run Tensorflow within R will be installed in this environment. When you run Keras code in R what R does is call the Tensorflow library of this environment.
Most of the pre-built binaries of Tensorflow are compiled on a CPU supporting the AVX instructions set given that this instructions allow a drastic speed up in certain floating point operations. Intel started to support these instructions in 2011. Although your CPU seems to be produced in 2016 it is a Celeron and as we can see here:
Not all CPUs from the listed families support AVX. Generally, CPUs with the commercial denomination "Core i3/i5/i7" support them, whereas "Pentium" and "Celeron" CPUs don't.
You can check this by running on your linux terminal lscpu | grep avx. If nothing is shown your CPU does not support these instructions.
Under these circumstances you have two options:
Build Tensorflow from source in order to compile the code given your CPU features. You can find more info here
Find a pre-built binary of Tensorflow built for CPU without AVX support. Something like this

Fail to run tensorflow on GPU

I fail to run the TF-CUDA tutorials_example_trainer as given in the installation guide (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#installing-from-sources)
I've had problems with the CUDA libs before, but that was with graphics related demo's.
All details below,
Thank you in advance for the help provided.
Environment info
Operating System: Debian Stretch
Installed version of CUDA and cuDNN:
8.0, 5.0
If installed from source, provide
554ddd9ad2d4abad5a9a31f2d245f0b1012f0d10
Build label: 0.3.0
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Jun 10 11:38:23 2016 (1465558703)
Steps to reproduce
Build from source with 367.35 driver
Run bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
Logs or other output that would be helpful
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
modprobe: ERROR: ../libkmod/libkmod-module.c:832 kmod_module_insert_module() could not find module by name='nvidia_367_uvm'
modprobe: ERROR: could not insert 'nvidia_367_uvm': Unknown symbol in module, or unknown parameter (see dmesg)
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: debian
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: debian
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 367.35.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.35 Mon Jul 11 23:14:21 PDT 2016
GCC version: gcc version 5.4.0 20160609 (Debian 5.4.0-6)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 367.35.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:293] kernel version seems to match DSO: 367.35.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
F tensorflow/cc/tutorials/example_trainer.cc:125] Check failed: ::tensorflow::Status::OK() == (session->Run({{"x", x}}, {"y:0", "y_normalized:0"}, {}, &outputs)) (OK vs. Invalid argument: Cannot assign a device to node 'y': Could not satisfy explicit device specification '/gpu:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
[[Node: y = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/gpu:0"](Const, x)]])

The error message indicates that your GPU driver is not well set. You could try the following command to see if the driver is installed correctly.
$ nvidia-smi
If not please follow the instruction on the CUDA official site and reinstall CUDA. As your OS is not officially supported, you may want to change your OS.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string