I have CUDA 9.2 installed, as nvcc reports:
(base) c:\>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Wed_Apr_11_23:16:30_Central_Daylight_Time_2018
Cuda compilation tools, release 9.2, V9.2.88
I installed PyTorch on Windows 10 using:
conda install pytorch cuda92 -c pytorch
pip3 install torchvision
I ran the test script:
(base) c:\>python
Python 3.6.5 |Anaconda custom (64-bit)| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import print_function
>>> import torch
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.7041, 0.5685, 0.4036],
[0.3089, 0.5286, 0.3245],
[0.3504, 0.8638, 0.1118],
[0.6517, 0.9209, 0.6801],
[0.0315, 0.1923, 0.8720]])
>>> quit()
So far, so good. Then I ran:
(base) c:\>python
Python 3.6.5 |Anaconda custom (64-bit)| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
>>>
Why did PyTorch say CUDA was not available?
The GPU is a compute capability 3.0 Quadro K3000M:
(base) C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe
Mon Oct 01 16:36:47 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 385.54                 Driver Version: 385.54                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        TCC/WDDM     | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K3000M    WDDM    | 00000000:01:00.0 Off |                  N/A |
| N/A   35C    P0    N/A /  N/A |     29MiB /  2048MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Since v0.3.1 (https://github.com/pytorch/pytorch/releases/tag/v0.3.1), PyTorch binary releases have dropped support for old GPUs with CUDA compute capability 3.0. According to https://en.wikipedia.org/wiki/CUDA, the compute capability of the Quadro K3000M is 3.0.
Therefore, you might have to build PyTorch from source or try other packages. Please refer to this thread for more information: https://discuss.pytorch.org/t/pytorch-no-longer-supports-this-gpu-because-it-is-too-old/13803.
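If you do build from source or find an older binary, you can confirm what PyTorch sees with the standard torch.cuda API; a minimal sketch (it only reports anything once CUDA actually initializes, so with the current binary it will just print that CUDA is unavailable):

import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print("compute capability: %d.%d" % (major, minor))  # expect 3.0 on a Quadro K3000M
else:
    print("CUDA not available in this build")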
PyTorch officially calls for CUDA 9.0, and I would suggest the same. In other cases there are sometimes build issues that lead to 'CUDA not detected'. So when using PyTorch, it is best to use CUDA 9.0 and cuDNN 7. Here is a link where you can easily install CUDA 9.0 and cuDNN 7:
https://yangcha.github.io/CUDA90/
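After installing CUDA 9.0 and cuDNN 7, reinstalling the matching binary would look something like this (a sketch assuming the pytorch channel's historical cuda90 metapackage, analogous to the cuda92 one used in the question):

conda install pytorch cuda90 -c pytorch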
I had a similar problem: check in the NVIDIA Control Panel that your card is selected as the default graphics processor.
Related
I have recently updated my xarray, but now am running into an error when I import it:
$ python3
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray
No protocol specified
>>>
What does this mean? Does it have something to do with a dependency on something else? I upgraded using pip3. I do not own this computer, otherwise I would have installed everything with conda. Does pip3 also sort out dependency issues? I hear conda does, and maybe I should switch over, but I don't want to create a conflict for the other users of the computer.
Maybe this information is also useful:
>>> import xarray
No protocol specified
>>> xarray.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-70-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.4
libnetcdf: 4.7.3
xarray: 0.19.0
pandas: 1.3.2
numpy: 1.17.4
scipy: 1.3.3
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.1
h5py: 2.10.0
Nio: None
zarr: 2.4.0+ds
cftime: 1.1.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.8.1+dfsg
distributed: None
matplotlib: 3.1.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 45.2.0
pip: 20.0.2
conda: None
pytest: 4.6.9
IPython: 7.13.0
sphinx: 1.8.5
>>>
This message reflects a problem with how your computer's display is configured, not an error in xarray: "No protocol specified" is printed by the X window system when a client is denied access to the display, and importing xarray merely happens to trigger it.
This answer seems to address the No protocol specified message directly.
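A quick way to test that theory from a terminal on the same machine; the xhost grant here is illustrative and must be run by the user who owns the X session:

$ echo $DISPLAY              # which X display the process will contact
$ echo $XAUTHORITY           # which cookie file it will authenticate with
$ xhost +si:localuser:$USER  # grant this local user access to the display

If you are switching users with su/sudo, the XAUTHORITY cookie of the original session usually isn't visible to the new user, which produces exactly this message.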
I was running PyTorch successfully, but after a system reboot I get the following warning when calling torch.cuda.is_available():
UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /opt/conda/conda-bld/pytorch_1616554782469/work/c10/cuda/CUDAFunctions.cpp:109.)
Output of nvidia-smi:
nvidia-smi
Thu Jun 24 09:11:39 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:00:04.0 Off | 0 |
| N/A 40C P0 26W / 250W | 0MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Environment information:
python collect_env.py
Collecting environment information...
/lib/python3.9/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /opt/conda/conda-bld/pytorch_1616554782469/work/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
PyTorch version: 1.8.1
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 10 (buster) (x86_64)
GCC version: (Debian 8.3.0-6) 8.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28
Python version: 3.9 (64-bit runtime)
Python platform: Linux-4.19.0-17-cloud-amd64-x86_64-with-glibc2.28
Is CUDA available: False
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla P100-PCIE-16GB
Nvidia driver version: 455.23.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.8.1
[pip3] torchaudio==0.8.0a0+e4e171a
[pip3] torchmetrics==0.3.2
[pip3] torchvision==0.9.1
[conda] _tflow_select 2.3.0 mkl
[conda] blas 1.0 mkl conda-forge
[conda] cudatoolkit 11.1.74 h6bb024c_0 nvidia
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py39he8ac12f_0
[conda] mkl_fft 1.3.0 py39h54f3939_0
[conda] mkl_random 1.0.2 py39h63df603_0
[conda] numpy 1.19.2 py39h89c1606_0
[conda] numpy-base 1.19.2 py39h2ae0177_0
[conda] pytorch 1.8.1 py3.9_cuda11.1_cudnn8.0.5_0 pytorch
[conda] tensorflow 2.4.1 mkl_py39h4683426_0
[conda] tensorflow-base 2.4.1 mkl_py39h43e0292_0
[conda] torchaudio 0.8.1 py39 pytorch
[conda] torchmetrics 0.3.2 pyhd8ed1ab_0 conda-forge
[conda] torchvision 0.9.1 py39_cu111 pytorch
I recently ran into this error when migrating my GPU containers from nvidia-docker to podman. The root cause for me was that the /dev/nvidia-uvm* device files, which CUDA apparently needs to work, were missing. Check that you have them:
# ls -ld /dev/nvidia*
drwxr-x--- 2 root root 80 Oct 6 21:11 /dev/nvidia-caps
crw-rw-rw- 1 root root 195, 254 Oct 6 21:08 /dev/nvidia-modeset
crw-rw-rw- 1 root root 237, 0 Oct 6 21:13 /dev/nvidia-uvm <-IMPORTANT
crw-rw-rw- 1 root root 237, 1 Oct 6 21:13 /dev/nvidia-uvm-tools <-IMPORTANT
crw-rw-rw- 1 root root 195, 0 Oct 6 21:08 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Oct 6 21:08 /dev/nvidiactl
If you don't see them, sudo nvidia-modprobe -c 0 -u should load the kernel modules and create these device files. Alternatively, look for the /sbin/create-uvm-dev-node script that Ubuntu ships to fix the same issue.
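Afterwards you can confirm the module actually loaded; lsmod should list nvidia_uvm:

# lsmod | grep nvidia_uvm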
If you use the GPU inside a container/VM, these device files also need to be present in the container. Normally the NVIDIA runtime scripts take care of passing them through. If that doesn't happen, you could try passing explicit --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools flags to docker/podman run, as in the sketch below.
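For example, an explicit invocation might look like this; the image name is a placeholder, and depending on your setup the other /dev/nvidia* nodes from the listing above may need passing through as well:

podman run --rm \
    --device /dev/nvidia0 --device /dev/nvidiactl \
    --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools \
    my-cuda-image python -c "import torch; print(torch.cuda.is_available())"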
I am facing an issue with pyopencl that I never had before (see the session below).
The issue: Device side queue is unimplemented (clCreateCommandQueueWithProperties.c:93)
Have any of you faced this problem before? Do you have any idea where it comes from?
Thanks in advance!
user@debian_9.5:~# pip3 freeze | grep pyopencl
pyopencl==2020.1
user@debian_9.5:~# python3
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyopencl as cl
>>> ctx = cl.create_some_context()
Choose platform:
[0] <pyopencl.Platform 'Portable Computing Language' at 0x7fb1fbfba020>
Choice [0]:0
Set the environment variable PYOPENCL_CTX='0' to avoid being asked again.
>>> print(ctx)
<pyopencl.Context at 0x55c1e9a87440 on <pyopencl.Device 'pthread-AMD Ryzen Threadripper 1950X 16-Core Processor' on 'Portable Computing Language' at 0x55c1ea017430>>
>>> queue = cl.CommandQueue(ctx)
Device side queue is unimplemented (clCreateCommandQueueWithProperties.c:93)
My setup is as follows: AMD Ryzen Threadripper 1950X, Debian 9.5, Python 3.5.3.
Solved by doing:
user@debian_9.5:~# pip uninstall pyopencl && apt install python3-pyopencl
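To verify the fix, re-running the failing call with the distribution-packaged module should now succeed; a minimal check (setting PYOPENCL_CTX just suppresses the platform prompt from the session above):

import os
os.environ.setdefault("PYOPENCL_CTX", "0")  # pick platform 0 (pocl) non-interactively

import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)  # previously died with "Device side queue is unimplemented"
print(queue)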
conda list pyyaml
# packages in environment at c:\Anaconda3:
#
# Name Version Build Channel
pyyaml 3.13 py36hfa6e2cd_1001 conda-forge
conda env list
# conda environments:
#
C:\Anaconda3
base c:\Anaconda3
yaml * c:\Anaconda3\envs\yaml
Switching to the yaml environment:
activate yaml
conda list pyyaml
# packages in environment at c:\Anaconda3\envs\yaml:
#
# Name Version Build Channel
pyyaml 5.2 py36he774522_0
Starting Python within the yaml environment with python:
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 14:00:49) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml
>>> yaml.__version__
'3.13'
Why is 3.13 and not 5.2 returned?
Update 2019-12-17, 14:32
(base) D:\a\buch>
(base) D:\a\buch>conda activate yaml
(yaml) D:\a\buch>
(yaml) D:\a\buch>python
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 14:00:49) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml
>>> yaml.__version__
'3.13'
>>>
(yaml) D:\a\buch>c:\Anaconda3\envs\yaml\python.exe
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 14:00:49) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml
>>> yaml.__version__
'3.13'
>>>
That is because in Anaconda you are, by default, in the base environment, where the version of pyyaml is 3.13 (in your case).
If you activate the yaml environment and check the version of pyyaml, you will find pyyaml 5.2.
The problem in your case is that you are still in the base environment. Kindly use the following command to switch to the other environment:
conda activate yaml
You forgot to put conda before activate yaml, hence you are still in the base environment.
Kindly follow this link also.
Hope this will help you.
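If, as in the update above, the activated environment still reports 3.13, it is worth checking which file the interpreter actually imported, since a copy of pyyaml installed elsewhere (for example in the user site-packages) would shadow the environment's version; a small diagnostic:

import yaml
print(yaml.__version__)
print(yaml.__file__)  # should point inside c:\Anaconda3\envs\yaml if the right copy loaded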
I have the following:
Python 2.7.5
RHEL 7.3 with FIPS enabled
Lasagne (0.2.dev1)
Theano (0.9.0)
I installed Theano and Lasagne with pip without issue, but when I import lasagne I receive an error related to FIPS:
$: python
Python 2.7.5 (default, Aug 2 2016, 04:20:16)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lasagne
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: error:060800A3:digital envelope routines:EVP_DigestInit_ex:disabled for fips
Is there some workaround known or available? Unfortunately I have to have FIPS enabled.
I'm just starting out with Theano and Lasagne so I apologize if I need additional help to troubleshoot.
As of now, it looks like MD5 hashing is hardcoded into the library; this has been acknowledged by the Theano developers: https://github.com/Theano/Theano/issues/5757
Update: May 25, 2017
I have modified the theano code so it uses sha256 instead of md5. This has resolved the FIPS issues I've been experiencing and has not slowed down any computation I've been running.
You can review the pull request here: https://github.com/Theano/Theano/pull/5916
and you can download my changes here until it is merged: https://github.com/dareneiri/Theano , if indeed it is accepted.
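For context, the error above is FIPS mode disabling MD5 in OpenSSL's digest routines, which Python's hashlib surfaces as well; a rough illustration of the substitution the pull request makes (the real change touches Theano's cache-key hashing call sites):

import hashlib

data = b"compiled module source"
# hashlib.md5(data)  # fails on a FIPS-enabled system: MD5 is not an approved digest
digest = hashlib.sha256(data).hexdigest()  # FIPS-approved replacement
print(digest)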