THC/THC.h: No such file or directory - linux

I am trying to compile https://github.com/CharlesShang/DCNv2 with CUDA support for a project, but every time I try, I get this error message:
/THC.h: No such file or directory
9 | #include <THC/THC.h>
I am using:
Arch Linux with kernel version 6.1.4
GTX 1080
python 3.6
pytorch 1.2.0
torchvision 0.4.0
cudatoolkit 10.0
gcc 7.5
I thought the CUDA and gcc versions might be incompatible, but I tried multiple combinations and none of them worked. At the moment I am using CUDA 10.0 with gcc 7.5, which should be compatible.
Any help is greatly appreciated.

Related

Using CUDA 11.x but getting error: Unknown CUDA arch (8.6) or GPU not supported

I'm setting up a conda environment to use pytorch 1.4.0 (on Ubuntu 20.04.2), but getting the error message:
ValueError: Unknown CUDA arch (8.6) or GPU not supported
I know this has been asked before, but no answer fits my case. This answer suggests that the CUDA version is too old. However, I updated my CUDA version to the most recent, and get the same error message.
nvcc -V says I have CUDA 11 installed, and when I run nvidia-smi I get this info:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.84 Driver Version: 460.84 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
which, according to the NVIDIA docs, should be compatible.
Another auxiliary question: what does the "8.6" in CUDA arch (8.6) represent?
Specific versions of PyTorch work only with specific versions of CUDA.
If you are using CUDA-11.1, you'll need a fairly recent version of PyTorch. You need to either upgrade your PyTorch, or downgrade your CUDA.
It seems you can grab PyTorch v1.4 for CUDA 10.0 from here:
pip install torch==1.4.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html
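The `+cu100` suffix in the command above names the CUDA version the wheel was built for. Here is a minimal sketch of how that pin is put together, assuming the suffix is simply `cu` followed by the CUDA major and minor digits (which matches `cu100` for CUDA 10.0). The helper names `cuda_wheel_tag` and `pinned_requirement` are made up for illustration, and not every PyTorch/CUDA combination is actually published, so check the wheel index before relying on a generated pin.

```python
def cuda_wheel_tag(cuda_version: str) -> str:
    """Turn a CUDA version such as '10.0' into a wheel suffix such as 'cu100'.

    Assumption: the suffix is 'cu' + major digits + minor digits.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"cu{int(major)}{int(minor)}"


def pinned_requirement(torch_version: str, cuda_version: str) -> str:
    """Build a pip requirement string like the one used in the answer."""
    return f"torch=={torch_version}+{cuda_wheel_tag(cuda_version)}"


# Reproduces the requirement from the pip command above.
print(pinned_requirement("1.4.0", "10.0"))
```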

Tensorflow 2.2 and cudnn 8.0.3 not working together as they should. It still looks for cudnn 7.6.5 dll files

I have TensorFlow 2.2 and CUDA 10.1 with cuDNN 8.0.3.
I am unable to run my scripts because TensorFlow keeps looking for the cuDNN 7 DLL file, cudnn64_7.dll.
I get the following:
Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
This happens even though I installed the newly published cuDNN 8.0.3 for CUDA 10.1 (see cuDNN 8.x support matrix).
I went back to cuDNN 7.6.5, but I was hoping to get the "5 times faster" cuDNN v8.0 that NVIDIA claims.
Any help or workarounds on how to get this done? Googling gets me literally fewer than 5 results, as it seems not many people have tried the new 8.0.3 (the one for 10.1).
Had the same issue. 8.0.3 is the latest version of the library supported for CUDA 10.1. However, TensorFlow is built against the earlier version, so you have to use that instead.
To elaborate, if you check this page: https://www.tensorflow.org/install/source_windows#tested_build_configurations
+----------------------+----------------+-----------+-------------+-------+------+
| Version | Python version | Compiler | Build tools | cuDNN | CUDA |
+----------------------+----------------+-----------+-------------+-------+------+
| tensorflow_gpu-2.3.0 | 3.5-3.8 | MSVC 2019 | Bazel 3.1.0 | 7.6 | 10.1 |
+----------------------+----------------+-----------+-------------+-------+------+
so unless you build TF locally, you have to use the supported version of cuDNN.
That being said, however, if you check latest TF releases:
https://github.com/tensorflow/tensorflow/releases
you will then see the following TensorFlow 2.4.0-rc1 note:
TensorFlow pip packages are now built with CUDA11 and cuDNN 8.0.2.
You can use the release candidate version of TF, but then you also have to upgrade CUDA to 11 (I am guessing version 11.0, since no suffix is mentioned) and use cuDNN v8.0.2 (July 24th, 2020) for CUDA 11.0.
Just tested: this setup works. Just make sure to install numpy 1.19.3 to avoid the problem mentioned in these threads:
RuntimeError: The current Numpy installation fails to pass a sanity check due to a bug in the windows runtime
https://developercommunity.visualstudio.com/content/problem/1207405/fmod-after-an-update-to-windows-2004-is-causing-a.html
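To make the version pinning concrete, here is a rough sketch that maps a TensorFlow release to the DLL names it will try to load. It only contains the two configurations quoted in this answer, and it assumes the Windows naming pattern seen in the error messages (`cudnn64_<major>.dll`, `cudart64_<major><minor>.dll`). `TESTED_CONFIGS` and `expected_dlls` are illustrative names, not part of TensorFlow.

```python
# Only the configurations quoted in this thread; not a complete matrix.
TESTED_CONFIGS = {
    "2.3.0": {"cudnn": "7.6", "cuda": "10.1"},
    # The release note says "CUDA11"; 11.0 is the answer's guess.
    "2.4.0-rc1": {"cudnn": "8.0.2", "cuda": "11.0"},
}


def expected_dlls(tf_version):
    """Return the cuDNN and CUDA runtime DLL names for a listed TF version.

    Assumption: DLL names follow the pattern seen in the error messages.
    """
    cfg = TESTED_CONFIGS[tf_version]
    cudnn_major = cfg["cudnn"].split(".")[0]
    cuda_major, cuda_minor = cfg["cuda"].split(".")[:2]
    return [
        f"cudnn64_{cudnn_major}.dll",
        f"cudart64_{cuda_major}{cuda_minor}.dll",
    ]


for version in TESTED_CONFIGS:
    print(version, expected_dlls(version))
```

This is why installing cuDNN 8 next to a TF build made for cuDNN 7.6 does not help: the binary asks for the version 7 DLL by name.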

Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found

Installed Nvidia CUDA 11
Got the cuDNN 8.0 (I think)
Added the directory to PATH
Installed TensorFlow through (pip install tensorflow-gpu)
But I still get this error
Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
TF 2.4 supports CUDA 11
TF 2.3 needs CUDA 10.1
Just install CUDA 10.1 (you can have more than one installation).
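Before reinstalling anything, it can help to confirm whether the DLL named in the error is reachable at all. The following is a small sketch using only the standard library; `find_on_path` is a hypothetical helper, and it only inspects the directories listed in PATH, which is an approximation of how Windows resolves DLLs.

```python
import os


def find_on_path(filename, path=None):
    """Return every directory in PATH (or `path`) that contains `filename`."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


# The DLL name is taken from the error message in the question.
print(find_on_path("cudart64_101.dll") or "cudart64_101.dll is not on PATH")
```

An empty result with CUDA 11 installed is expected here, since the file being requested belongs to CUDA 10.1.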

How to compile python3.6 program for Redhat 5.8/CentOS 5?

I have compiled the python3.6 program on CentOS 6.8 using pyinstaller and tested it on a newer version of Linux. It's working as expected. CentOS 6.8 has GLIBC 2.12 installed.
pyinstaller --onefile --clean --hidden-import sqlite3 --hidden-import pycryptodome my_python.py
However, I'm getting the following error when executing the compiled program on Redhat 5.8, as it has GLIBC 2.5 installed:
[24522] Error loading Python lib '/tmp/_MEIl16Rvq/libpython3.6m.so.1.0': dlopen: /lib64/libc.so.6: version `GLIBC_2.7' not found (required by /tmp/_MEIl16Rvq/libpython3.6m.so.1.0)
Can you please help me compile the python3.6 program on CentOS 6 so that it runs on Redhat 5.8?
P.S.: I cannot update GLIBC, as I'm going to distribute the same program to many Linux servers.
The answer to this question is the first entry in the GNU/Linux section of pyinstaller's FAQ. Here it is, slightly cut down, with my emphasis.
The executable that PyInstaller builds is not fully static, in that it still depends on the system libc. Under Linux, the ABI of GLIBC is [...] not forward compatible. [...] The supplied binary bootloader should work with older GLIBC. However, the libpython.so and other dynamic libraries still depends on the newer GLIBC. The solution is to compile the Python interpreter with its modules (and also probably bootloader) on the oldest system you have around, so that it gets linked with the oldest version of GLIBC.
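The compatibility rule in the FAQ can be checked mechanically. Below is a minimal sketch that pulls the required GLIBC version out of a loader error like the one in the question and compares it with the target system's version. `required_glibc` and `can_run_on` are hypothetical helper names, and the regular expression assumes the `GLIBC_x.y` spelling shown in that error.

```python
import re


def required_glibc(error_text):
    """Return the highest GLIBC_x.y version named in a loader error, or None."""
    found = re.findall(r"GLIBC_(\d+(?:\.\d+)+)", error_text)
    versions = [tuple(int(part) for part in v.split(".")) for v in found]
    return max(versions) if versions else None


def can_run_on(target_glibc, error_text):
    """True if the target's glibc is at least as new as the one required."""
    needed = required_glibc(error_text)
    return needed is None or target_glibc >= needed


error = ("dlopen: /lib64/libc.so.6: version `GLIBC_2.7' not found "
         "(required by /tmp/_MEIl16Rvq/libpython3.6m.so.1.0)")

print(required_glibc(error))       # version the bundled libpython needs
print(can_run_on((2, 5), error))   # Redhat 5.8 target from the question
print(can_run_on((2, 12), error))  # CentOS 6.8 build host from the question
```

Versions are compared as integer tuples rather than strings, because as strings "2.12" would sort before "2.7". On the build machine, `platform.libc_ver()` from the standard library is one way to read the local glibc version.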

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

I have installed cuda-8.0 and cudnn5.1 on CentOS. Then, when importing tensorflow (python 3.6), it gives the error as above.
I have already set the paths below in /etc/profile. Has anyone else run into this kind of problem?
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
Also, what confuses me is that when I run nvcc -V, it shows
Cuda compilation tools, release 8.0, V8.0.61
However, when I run ./deviceQuery in folder /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery, on device 0: "Tesla M40", it shows
CUDA Driver Version / Runtime Version 9.1 / 8.0
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Tesla M40
Check your version of tensorflow using "pip3 list | grep tensorflow". If it is tensorflow-gpu (1.5.0), then the required CUDA version is 9.0 with cuDNN v7.
Look into the following link for more details:
https://github.com/tensorflow/tensorflow/releases
Tensorflow installation guide needs to be updated.
I had the same problem. Tensorflow 1.5.0 is precompiled against CUDA 9.0 (which is outdated; Sept 2017).
The newest CUDA version is CUDA 9.1 (Dec. 2017) and sudo pip install tensorflow-gpu will not work with the newest CUDA 9.1. There are two solutions to the problem:
1.) Install CUDA 9.0 next to CUDA 9.1 (this worked for me)
2.) Build Tensorflow by yourself from the git source code
Either way, do not forget to add the PATH variables to your operating system; otherwise you will get the error message stated in the question from your Python interpreter.
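The mismatch described in these answers can be read straight from the two messages in the question. Here is a rough sketch, assuming the version in the library file name (`libcublas.so.9.0`) is the CUDA version TensorFlow was built against and that `nvcc -V` prints `release X.Y`; the function names are made up for illustration.

```python
import re


def cuda_from_import_error(message):
    """Extract (major, minor) from a 'libcublas.so.X.Y' ImportError message."""
    match = re.search(r"libcublas\.so\.(\d+)\.(\d+)", message)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cuda_from_nvcc(output):
    """Extract (major, minor) from the 'release X.Y' part of nvcc -V output."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


# Both strings are copied from the question.
import_error = ("ImportError: libcublas.so.9.0: cannot open shared object "
                "file: No such file or directory")
nvcc_output = "Cuda compilation tools, release 8.0, V8.0.61"

wanted = cuda_from_import_error(import_error)
installed = cuda_from_nvcc(nvcc_output)
if wanted != installed:
    print(f"TensorFlow wants CUDA {wanted}, but nvcc reports {installed}")
```

With the values from the question, the two versions differ (9.0 requested, 8.0 installed), which matches the advice to install CUDA 9.0 alongside the existing toolkit.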
