How can I fix the PyTorch export model error? - pytorch

1. By running the code below, you can reproduce the error:

import torch

@torch.jit.script
def rotate_points_export(points):
    return points

def xxx(input):
    outputs = {}
    outputs['a'] = input
    outputs['b'] = input
    outputs['b'] = rotate_points_export(outputs['b'])
    return outputs

points = torch.rand((1, 2, 3, 4))
model = torch.jit.trace(xxx, points, strict=False)
torch.jit.save(model, 'xx.pt')
The error messages are shown below:
model = torch.jit.trace(xxx, points, strict=False)
File "/home/intsig/codes/depedencies/hhdetection/lib/python3.8/site-packages/torch/jit/_trace.py", line 778, in trace
traced = torch._C._create_function_from_trace(
RuntimeError: values[i]->type()->isSubtypeOf(value_type) INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/jit/ir/ir.cpp":1650, please report a bug to PyTorch.
The environment info is shown below:
Collecting environment information...
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.13.0-52-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.1.74
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 470.129.06
cuDNN version: Probably one of the following:
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.21.1
[pip3] torch==1.7.1
[pip3] torchvision==0.8.2
[conda] blas 1.0 mkl
[conda] mkl 2021.2.0 h06a4308_296
[conda] mkl-service 2.3.0 py38h27cfd23_1
[conda] mkl_fft 1.3.0 py38h42c9631_2
[conda] mkl_random 1.2.1 py38ha9443f7_2
[conda] numpy 1.20.1 py38h93e21f0_0
[conda] numpy-base 1.20.1 py38h7d8b39e_0
[conda] numpydoc 1.1.0 pyhd3eb1b0_1
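
The internal assert is triggered while tracing because the dict that xxx returns ends up with values of mismatched JIT types: the entry for 'a' is a plain traced tensor, while the entry for 'b' comes from the scripted rotate_points_export. One possible workaround, sketched below on the assumption that the goal is simply to export a function returning a dict of tensors, is to script the whole function with torch.jit.script instead of tracing it, since TorchScript supports Dict[str, Tensor] outputs directly. Upgrading to a newer PyTorch release than 1.7.1 may also make the internal assert go away.

import torch

@torch.jit.script
def rotate_points_export(points):
    return points

@torch.jit.script
def xxx(input: torch.Tensor):
    # In TorchScript an empty dict literal defaults to Dict[str, Tensor],
    # which matches the tensor values assigned below.
    outputs = {}
    outputs['a'] = input
    outputs['b'] = input
    outputs['b'] = rotate_points_export(outputs['b'])
    return outputs

# Scripted functions can be serialized the same way as ScriptModules.
torch.jit.save(xxx, 'xx.pt')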

Related

Colab local runtime crashing when running keras model

I am new to using a local runtime on Colab, but I was able to import and run all other cells except for the one that specifies a Keras Sequential model. Here's the code:
keras.backend.clear_session()
model = keras.Sequential()
I receive this message when it crashes:
Your session crashed for an unknown reason.
When I try to access runtime logs, I get the following:
Could not fetch /var/colab/app.log from backend
Could not fetch resource at : 404 Not Found
FetchError: Could not fetch resource at : 404 Not Found
at XB.$A [as constructor] (https://ssl.gstatic.com/colaboratory-static/common/0c687632c6df07e69ab0173b89376c36/external_polymer_binary.js:1408:2530)
at new XB (https://ssl.gstatic.com/colaboratory-static/common/0c687632c6df07e69ab0173b89376c36/external_polymer_binary.js:1429:190)
at xa.program_ (https://ssl.gstatic.com/colaboratory-static/common/0c687632c6df07e69ab0173b89376c36/external_polymer_binary.js:6276:129)
at za (https://ssl.gstatic.com/colaboratory-static/common/0c687632c6df07e69ab0173b89376c36/external_polymer_binary.js:20:336)
at xa.next_ (https://ssl.gstatic.com/colaboratory-static/common/0c687632c6df07e69ab0173b89376c36/external_polymer_binary.js:18:508)
at Aa.next (https://ssl.gstatic.com/colaboratory-static/common/0c687632c6df07e69ab0173b89376c36/external_polymer_binary.js:21:206)
at b (https://ssl.gstatic.com/colaboratory-static/common/0c687632c6df07e69ab0173b89376c36/external_polymer_binary.js:21:468)
I believe I haven't set up the local runtime correctly, as when I switch to a hosted runtime, it runs just fine. Any help would be much appreciated! I am running this on a Mac with an M1 Pro chip, on Monterey 12.5.
% conda list tensorflow
#
# Name Version Build Channel
tensorflow-deps 2.9.0 0 apple
tensorflow-estimator 2.9.0 pypi_0 pypi
tensorflow-macos 2.9.0 pypi_0 pypi
tensorflow-metal 0.6.0 pypi_0 pypi
% conda list keras
#
# Name Version Build Channel
keras 2.9.0 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
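
Since the same notebook works on a hosted runtime, one way to narrow this down is to run the equivalent code outside Colab, in a plain local Python session, and see whether the tensorflow-macos/tensorflow-metal install itself is what crashes. A minimal sketch (the Dense layer is only a placeholder to force the model to build):

import tensorflow as tf
from tensorflow import keras

print(tf.__version__)
# With tensorflow-metal installed this listing should include a GPU device.
print(tf.config.list_physical_devices())

keras.backend.clear_session()
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.summary()

If this crashes too, the problem is in the local TensorFlow installation rather than in how the Colab local runtime is connected; if it runs cleanly, the crash is specific to the local-runtime setup.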

Problems getting GPU to work with older version of Tensorflow and Keras

I have a project I'm trying to work on, but it's based on code that's a few years old, and this code tends to fail if TensorFlow or NumPy aren't the correct versions (which means everything I'm using has to be old). This has meant I've needed to install an older version of Python alongside my current one so that I can install the correct versions of the dependencies.
I'm running:
Python 3.7.5
NumPy 1.17.4
Pandas 0.25.3
pyyaml 5.1.2
more_itertools 7.2.0
keras 2.3.1
tensorflow 2.0.1
CUDA 10.0
CuDNN 7.4.1
I'm particularly interested in the Keras and TensorFlow versions. From my research, it seems they should work with the GPU (as-is?) according to https://www.tensorflow.org/install/source (towards the bottom, under the tested build configurations for GPU).
However, when I try to detect GPU devices on my build with
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
I get
2022-03-22 19:34:53.410102: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1722158755601489749
]
This suggests TensorFlow isn't recognising my GPU.
Is there something I'm missing in the setup process to enable GPU support? As far as I can tell, my versions of TensorFlow and Keras are compatible with GPU processing, and I have compatible versions of CUDA and cuDNN installed.
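
One thing worth ruling out first: for TensorFlow 2.0.x the GPU-enabled binary was still distributed as the separate tensorflow-gpu package (the plain tensorflow wheel did not bundle GPU support until 2.1), so a CPU-only wheel would produce exactly the output above. A small diagnostic sketch, assuming TF 2.0.x:

import tensorflow as tf

print(tf.__version__)
# False here means the installed wheel was built without CUDA support at all.
print(tf.test.is_built_with_cuda())
# An empty list means no usable GPU was found (missing CUDA/cuDNN libraries or a CPU-only wheel).
print(tf.config.experimental.list_physical_devices('GPU'))

If is_built_with_cuda() returns False, installing the matching tensorflow-gpu release (while keeping CUDA 10.0 and cuDNN 7.4 on the library path) is the first thing to try.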

import xarray returns 'No protocol specified'

I have recently updated my xarray installation, but now I am running into an error when I import it:
$ python3
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray
No protocol specified
>>>
What does this mean? Does it have something to do with dependencies on something else? I upgraded using pip3. I do not own this computer; otherwise I would have installed everything with conda. Does pip3 also sort out dependency issues? I've heard conda does, and maybe I should switch over, but I don't want to create a conflict with the other users on the computer.
Maybe this information is also useful:
>>> import xarray
No protocol specified
>>> xarray.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-70-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.4
libnetcdf: 4.7.3
xarray: 0.19.0
pandas: 1.3.2
numpy: 1.17.4
scipy: 1.3.3
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.1
h5py: 2.10.0
Nio: None
zarr: 2.4.0+ds
cftime: 1.1.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.8.1+dfsg
distributed: None
matplotlib: 3.1.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 45.2.0
pip: 20.0.2
conda: None
pytest: 4.6.9
IPython: 7.13.0
sphinx: 1.8.5
>>>
This message reflects a problem with how your computer's display is configured, not an error in xarray itself. This answer seems to address the "No protocol specified" message directly.
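
For context, "No protocol specified" is printed by the X11 client library when a process is not authorized to connect to the X server named in DISPLAY (a common side effect of running as a different user, over SSH, or after a session change), so something in the import chain is merely touching the display. A hedged way to confirm the message is unrelated to xarray itself is to check DISPLAY and force a non-interactive matplotlib backend before importing:

import os

# The X display that something in the import chain is trying to open.
print(os.environ.get('DISPLAY'))
# Force a non-interactive matplotlib backend so no X connection is attempted.
os.environ['MPLBACKEND'] = 'Agg'
import xarray
print(xarray.__version__)

If the message disappears with the Agg backend (or with DISPLAY unset), the fix belongs at the X11/xauth level, for example via xhost, rather than in the Python environment.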

pytorch cuda not available: CUDA initialization: CUDA unknown error

I successfully ran PyTorch; however, after a system reboot I get the following error when calling torch.cuda.is_available():
UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /opt/conda/conda-bld/pytorch_1616554782469/work/c10/cuda/CUDAFunctions.cpp:109.)
Output of nvidia-smi:
nvidia-smi
Thu Jun 24 09:11:39 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:00:04.0 Off | 0 |
| N/A 40C P0 26W / 250W | 0MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Environment information:
python collect_env.py
Collecting environment information...
/lib/python3.9/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /opt/conda/conda-bld/pytorch_1616554782469/work/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
PyTorch version: 1.8.1
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 10 (buster) (x86_64)
GCC version: (Debian 8.3.0-6) 8.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28
Python version: 3.9 (64-bit runtime)
Python platform: Linux-4.19.0-17-cloud-amd64-x86_64-with-glibc2.28
Is CUDA available: False
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla P100-PCIE-16GB
Nvidia driver version: 455.23.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.8.1
[pip3] torchaudio==0.8.0a0+e4e171a
[pip3] torchmetrics==0.3.2
[pip3] torchvision==0.9.1
[conda] _tflow_select 2.3.0 mkl
[conda] blas 1.0 mkl conda-forge
[conda] cudatoolkit 11.1.74 h6bb024c_0 nvidia
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py39he8ac12f_0
[conda] mkl_fft 1.3.0 py39h54f3939_0
[conda] mkl_random 1.0.2 py39h63df603_0
[conda] numpy 1.19.2 py39h89c1606_0
[conda] numpy-base 1.19.2 py39h2ae0177_0
[conda] pytorch 1.8.1 py3.9_cuda11.1_cudnn8.0.5_0 pytorch
[conda] tensorflow 2.4.1 mkl_py39h4683426_0
[conda] tensorflow-base 2.4.1 mkl_py39h43e0292_0
[conda] torchaudio 0.8.1 py39 pytorch
[conda] torchmetrics 0.3.2 pyhd8ed1ab_0 conda-forge
[conda] torchvision 0.9.1 py39_cu111 pytorch
I recently ran into this error when migrating my GPU containers from nvidia-docker to Podman. The root cause for me was that the /dev/nvidia-uvm* files, which CUDA apparently needs to work, were missing. Check that you have them:
# ls -ld /dev/nvidia*
drwxr-x--- 2 root root 80 Oct 6 21:11 /dev/nvidia-caps
crw-rw-rw- 1 root root 195, 254 Oct 6 21:08 /dev/nvidia-modeset
crw-rw-rw- 1 root root 237, 0 Oct 6 21:13 /dev/nvidia-uvm <-IMPORTANT
crw-rw-rw- 1 root root 237, 1 Oct 6 21:13 /dev/nvidia-uvm-tools <-IMPORTANT
crw-rw-rw- 1 root root 195, 0 Oct 6 21:08 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Oct 6 21:08 /dev/nvidiactl
If you don't see them, sudo nvidia-modprobe -c 0 -u should load the kernel modules and create these device files. Alternatively, look for the /sbin/create-uvm-dev-node script that Ubuntu created to fix this issue.
If you use the GPU inside a container/VM, these device files also need to be present in the container. Normally the NVIDIA runtime scripts take care of passing them through. If that doesn't happen, you could try passing explicit --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools flags to docker/podman run.
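
Once the /dev/nvidia-uvm* nodes exist (or have been passed through to the container), a quick way to confirm that PyTorch can initialize CUDA again is:

import torch

# These should all succeed without the "CUDA unknown error" warning once UVM is working.
print(torch.cuda.is_available())
print(torch.cuda.device_count())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    x = torch.ones(1, device='cuda')  # forces an actual CUDA context to be created
    print(x)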

PyTorch and CUDA driver

I have CUDA 9.2 installed. For example:
(base) c:\>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Wed_Apr_11_23:16:30_Central_Daylight_Time_2018
Cuda compilation tools, release 9.2, V9.2.88
I installed PyTorch on Windows 10 using:
conda install pytorch cuda92 -c pytorch
pip3 install torchvision
I ran the test script:
(base) c:\>python
Python 3.6.5 |Anaconda custom (64-bit)| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import print_function
>>> import torch
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.7041, 0.5685, 0.4036],
[0.3089, 0.5286, 0.3245],
[0.3504, 0.8638, 0.1118],
[0.6517, 0.9209, 0.6801],
[0.0315, 0.1923, 0.8720]])
>>> quit()
So far, so good. Then I ran:
(base) c:\>python
Python 3.6.5 |Anaconda custom (64-bit)| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
>>>
Why did PyTorch say CUDA was not available?
The GPU is a compute capability 3.0 Quadro K3000M:
(base) C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe
Mon Oct 01 16:36:47 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 385.54                 Driver Version: 385.54                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K3000M       WDDM | 00000000:01:00.0 Off |                  N/A |
| N/A   35C    P0    N/A /  N/A |     29MiB /  2048MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Ever since v0.3.1 (https://github.com/pytorch/pytorch/releases/tag/v0.3.1), PyTorch binary releases have dropped support for older GPUs with CUDA compute capability 3.0. According to https://en.wikipedia.org/wiki/CUDA, the compute capability of the Quadro K3000M is 3.0.
Therefore, you might have to build PyTorch from source or try other packages. Please refer to this thread for more information: https://discuss.pytorch.org/t/pytorch-no-longer-supports-this-gpu-because-it-is-too-old/13803.
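
As a quick sanity check (a sketch; the exact behaviour depends on the installed PyTorch version), you can ask the installed binary which CUDA toolkit it was built against and, if a device is usable, what compute capability it reports:

import torch

print(torch.__version__)
print(torch.version.cuda)            # CUDA toolkit the binary was built with
print(torch.cuda.is_available())
if torch.cuda.is_available():
    # e.g. (3, 0) for a Quadro K3000M
    print(torch.cuda.get_device_capability(0))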
The official PyTorch builds call for CUDA 9.0, and I would suggest the same. In other cases there are sometimes build issues which lead to 'CUDA not detected'. So when using PyTorch, it's best to use CUDA 9.0 and cuDNN 7. Here is a link where you can easily install CUDA 9.0 and cuDNN 7:
https://yangcha.github.io/CUDA90/
I had a similar problem; you have to check in the NVIDIA Control Panel that your card is selected by default.
