"Not compiled with GPU support" error in detectron2 - pytorch

The error occurred while Detectron2 was running Faster R-CNN; judging from the traceback, the RPN part of the network raised it.
The GPU itself should be working, since the backbone part ran without reporting an error.
How can I solve this problem?

The reason for this error is a mismatch between the machine's CUDA version and the CUDA version PyTorch was built with, e.g. 10.1 vs 10.0. So you should check whether the PyTorch CUDA version is the same as the machine's CUDA version.
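For example, you can compare the two versions directly; torch.version.cuda is the toolkit PyTorch was built against, and it should match the nvcc on the machine (a minimal check):
import torch
import subprocess
# CUDA version this PyTorch binary was built against (None on CPU-only builds)
print("PyTorch built for CUDA:", torch.version.cuda)
# CUDA toolkit on the machine (raises FileNotFoundError if nvcc is not on PATH)
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)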

"nvcc not found" or "Not compiled with GPU support" or "Detectron2 CUDA Compiler: not available".
CUDA was not found when building detectron2. You should make sure that
import torch
from torch.utils.cpp_extension import CUDA_HOME
print(torch.cuda.is_available(), CUDA_HOME)
prints (True, <a directory with CUDA>) at the time you build detectron2.
Most models can run inference (but not training) without GPU support. To use CPUs, set MODEL.DEVICE='cpu' in the config.
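For example, forcing CPU inference on a standard Faster R-CNN model might look like this (a minimal sketch, assuming detectron2 and its model zoo are installed; the config name is one of the standard zoo entries):
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cpu"  # run inference on CPU instead of CUDA
predictor = DefaultPredictor(cfg)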
For an in-depth treatment of these installation issues, see:
https://detectron2.readthedocs.io/en/latest/tutorials/install.html#common-installation-issues

Related

torchaudio pre-built wheel for aarch64 on NVIDIA GPU platform

NVIDIA offers pre-built wheels for PyTorch that leverage the GPU [1]. But these do not cover torchaudio. It seems a matching torchaudio build is necessary; otherwise, even a torchaudio wheel with the same version number gives:
OSError: ../lib/libtorchaudio.so: undefined symbol: _ZNK5torch8autograd4Node4nameEv
Where can this pre-built wheel be found? I do not want to use one of the NVIDIA Docker images offered for JetPack, where this might work, since that makes no sense on a small AI computer.
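A common first check for this kind of undefined-symbol error is whether the installed torch and torchaudio builds actually match (a minimal sketch; if the builds are mismatched, the OSError is raised by the import itself):
import torch
print("torch:", torch.__version__)
import torchaudio  # a mismatched build fails here with the undefined-symbol OSError
print("torchaudio:", torchaudio.__version__)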

Stable Baselines PPO algorithm crashes due to RuntimeError: Calling torch.geqrf on a CPU tensor requires compiling PyTorch with LAPACK

I tried to run a PPO algorithm from the stable-baselines3 library on a basic gym environment on my local CPU; however, I get the following RuntimeError:
RuntimeError: Calling torch.geqrf on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
I'm using a conda environment on a Windows machine, with the following installations:
pytorch 1.12.1 cpu_py39h5e1f01c_0
lapack 3.9.0 netlib conda-forge
Since I'm quite new to Python, I have no clue how to resolve this issue, and a web search didn't turn up proper instructions for this specific problem.
I tried uninstalling PyTorch in my Anaconda prompt, but that would remove a lot of packages, which made me worry about breaking something. Hence, I'm at a loss as to what else to do in order to build PyTorch with LAPACK support...
Any help would be appreciated, Cheers.
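One quick check, before reinstalling anything, is whether the installed binary was actually built with LAPACK, by inspecting the build configuration and probing one of the LAPACK-backed ops (a minimal sketch):
import torch
# the build configuration lists the BLAS/LAPACK backends this binary was compiled with
print(torch.__config__.show())
# geqrf is one of the CPU ops that requires LAPACK
try:
    torch.geqrf(torch.eye(3))
    print("LAPACK available")
except RuntimeError as err:
    print("LAPACK missing:", err)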

Using CUDA 11.x but getting error: Unknown CUDA arch (8.6) or GPU not supported

I'm setting up a conda environment to use pytorch 1.4.0 (on Ubuntu 20.04.2), but getting the error message:
ValueError: Unknown CUDA arch (8.6) or GPU not supported
I know this has been asked before, but no answer fits my case. This answer suggests that the CUDA version is too old. However, I updated my CUDA version to the most recent, and get the same error message.
nvcc -V says I have CUDA 11 installed, and when I run nvidia-smi I get this info:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.84 Driver Version: 460.84 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
which, according to the NVIDIA docs, should be compatible.
Another auxiliary question: what does the "8.6" in CUDA arch (8.6) represent?
Specific versions of PyTorch work only with specific versions of CUDA. PyTorch 1.4 predates CUDA 11, so if you are using CUDA 11.x you'll need a fairly recent version of PyTorch: either upgrade your PyTorch or downgrade your CUDA. (As for the auxiliary question: the "8.6" is your GPU's CUDA compute capability; 8.6 corresponds to Ampere cards such as the RTX 30-series.)
It seems you can grab PyTorch v1.4 for CUDA 10.0 from here:
pip install torch==1.4.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html
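To see the compute capability PyTorch detects for your GPU (a minimal sketch, assuming a CUDA-enabled build):
import torch
# (major, minor) compute capability of GPU 0, e.g. (8, 6) for an RTX 30-series card
print(torch.cuda.get_device_capability(0))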

Pytorch list of supported GPU hardware for each release (Ubuntu18.04)

I have an old GPU and pytorch says it is too old to support:
Found GPU0 GeForce GTX 670 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability that we support is 3.5.
First question: what does this 3.0 / 3.5 refer to? Clearly not the CUDA, NVIDIA driver, or PyTorch version?
Secondly, I know I can build PyTorch from source for broader support, but it's unclear what hardware PyTorch supports when built from source, or even when installed with pip. I'm at the point where the PyTorch version I likely need requires Python 2 and CUDA 9, so I'm spending solid time changing a bunch of software versions without knowing whether any of it will work.
Also, if I were to get a newer GPU, I don't know whether it is currently supported.
Is there any way to get a list of the NVIDIA hardware PyTorch supports? Ideally for each release, but at least for the current one?
I run an NVIDIA GTX 670 with NVIDIA driver 430.50. I changed from CUDA 10 to 9, but since I may have to build PyTorch 0.3.1, I think CUDA 8 is needed, because running python setup.py install in v0.3.1 gives:
CMakeFiles/THC.dir/build.make:560: recipe for target 'CMakeFiles/THC.dir/THC_generated_THCHalf.cu.o' failed
make[2]: *** [CMakeFiles/THC.dir/THC_generated_THCHalf.cu.o] Error 1
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/THC.dir/all' failed
make[1]: *** [CMakeFiles/THC.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
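There is no official per-release hardware table that I know of, but reasonably recent PyTorch builds expose the compute capabilities they were compiled for, which partly answers the listing question (a minimal sketch; torch.cuda.get_arch_list does not exist on versions as old as 0.3.1):
import torch
# CUDA architectures baked into this binary, e.g. ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75']
print(torch.cuda.get_arch_list())
# compute capability of the local GPU, e.g. (3, 0) for a GTX 670
print(torch.cuda.get_device_capability(0))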

Bazel returns a rule linking error when building Tensorflow from source

I'm trying to get the InfoGAN code from GitHub to run, but when I try to build TensorFlow from the recommended source code, an error keeps appearing. The build runs fine for about 30 minutes and then crashes (even when sudo is used).
The error is as follows:
ERROR: /home/socialab/Desktop/tensorflow-master/tensorflow/python/BUILD:2436:1: Linking of rule '//tensorflow/python:gen_stateless_random_ops_py_wrappers_cc' failed (Exit 1)
bazel-out/host/bin/tensorflow/core/libop_gen_lib.a(op_gen_lib.o): In function `google::protobuf::internal::ArenaStringPtr::CreateInstance(google::protobuf::Arena*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*)':
op_gen_lib.cc:(.text._ZN6google8protobuf8internal14ArenaStringPtr14CreateInstanceEPNS0_5ArenaEPKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE[_ZN6google8protobuf8internal14ArenaStringPtr14CreateInstanceEPNS0_5ArenaEPKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE]+0x36): undefined reference to `google::protobuf::internal::ArenaImpl::AllocateAlignedAndAddCleanup(unsigned long, void (*)(void*))'
When configuring Tensorflow I say no to almost everything (except for CUDA), like so:
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.29.1 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Found possible Python library paths:
/usr/lib/python3/dist-packages
/usr/local/lib/python3.6/dist-packages
Please input the desired Python library path to use. Default is [/usr/lib/python3/dist-packages]
/usr/local/lib/python3.6/dist-packages
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
Found CUDA 10.1 in:
/usr/local/cuda/lib64
/usr/local/cuda/include
Found cuDNN 7 in:
/usr/lib/x86_64-linux-gnu
/usr/include
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 7.5]: 7.5
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
I'm using the recommended Bazel version (0.29.1), CUDA 10.1, cuDNN 7.6.4, and the 430.50 drivers.
System specs:
Ubuntu 18.04
RTX 2080 Ti
i5-8600K
8GB RAM
Any help is much appreciated. Thanks!
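As an aside, if you're ever unsure what to enter at the compute-capability prompt, recent TensorFlow 2.x builds can report it directly (a hedged sketch; get_device_details only exists in newer releases, not in the 1.x-era source tree being built here):
import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    details = tf.config.experimental.get_device_details(gpus[0])
    print(details.get("compute_capability"))  # e.g. (7, 5) for an RTX 2080 Ti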
