I'm setting up a Conda environment on a remote GPU machine to use PyTorch.
The GPU driver on the machine is only NVIDIA-SMI 396.54, so the newest CUDA version I can use is 9.2.
However, I need a more recent version of torch to be able to use some attributes.
I tried
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=9.2
But this results in
print(torch.version.cuda) >> None
torch.cuda.is_available() >> False
There are two things I would check.
You may have unintentionally installed the CPU-only build of PyTorch, or already had it in your environment before running the command above. Even if you then install the GPU build, the existing CPU-only PyTorch can remain in place, and torch.cuda.is_available() will keep returning False. I therefore suggest checking out this link:
Forum thread on why PyTorch stays on the CPU build even after installing a cudatoolkit version
Although I'm fairly sure the first point is your problem, I suggest looking at this second thing as well.
To see how to install previous versions of PyTorch, refer to this link: https://pytorch.org/get-started/previous-versions/
After looking at this, I suggest starting a fresh conda env and running your conda install command first, before installing anything else.
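A quick way to see what actually got installed (a minimal sketch; the exact build strings depend on your platform and channel) is to inspect the resolved package and ask the interpreter directly:
# A CPU-only build typically shows "cpu" in the build string; a GPU build shows "cuda"
conda list pytorch
# Confirm which CUDA toolkit the installed package was built against
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"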
Sarthak Jain
I have installed PyTorch 1.8.1+cu102 in a virtual environment on an HPC cluster.
torch.cuda.is_available()
is giving me the below output
UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 10010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
False
What could be wrong? I am not sure how I can update the driver. My requirements are:
torch==1.8.1+cu102
torch-cluster==1.5.9
torch-geometric==1.7.0
First, check which CUDA version the PyTorch build you need was compiled against. The CUDA version corresponding to each PyTorch release is listed here:
https://pytorch.org/get-started/previous-versions/
Once you have found the version, check whether it is supported by your GPU device. You can find the list here:
https://developer.nvidia.com/cuda-gpus
If there is no match, you need to change either the PyTorch requirement or the GPU device you run on.
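In this specific case, the warning's "found version 10010" corresponds to a CUDA 10.1-era driver, which is too old for a +cu102 build. If you cannot get the driver updated, one option (a sketch based on the previous-versions page; double-check the exact version pins there, and note that your torch-cluster / torch-geometric wheels must match the same CUDA version) is to install the CUDA 10.1 build instead:
pip install torch==1.8.1+cu101 torchvision==0.9.1+cu101 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
# Then confirm the build and the driver agree
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"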
I'm trying to run my deep learning code in Google Colab. I have installed cuda 10.0.130 and cudnn 7.6.4 for tensorflow 1.14.0, but tf.test.is_gpu_available() still returns False. I don't know what to do now; can somebody give me some instructions? Here is the output of !sudo lsb_release -a and !nvidia-smi
The supported and tested configurations for GPU builds are listed in the reference link below.
According to that table, the combination of CUDA 10.1 and cuDNN 7.6 corresponds to tensorflow_gpu-2.3.0.
Also, for TF 1.x the CPU and GPU builds are separate packages.
So you should run
!pip install tensorflow_gpu==1.14.0
to use the GPU build of TensorFlow.
Ref- https://www.tensorflow.org/install/gpu#older_versions_of_tensorflow
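Once the GPU package is installed, a quick check (a minimal sketch using the TF 1.x API; these calls exist in 1.14 but are deprecated in TF 2.x) is:
import tensorflow as tf
from tensorflow.python.client import device_lib
# Should print True once the GPU build and matching CUDA/cuDNN are in place
print(tf.test.is_gpu_available())
# Lists the devices TensorFlow can see, including any /device:GPU:0
print(device_lib.list_local_devices())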
I tried to train a model using PyTorch on my MacBook Pro, which has the new-generation Apple M1 chip. However, PyTorch couldn't recognize my GPU.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Does anyone know any solution?
I have updated all the libraries to the latest versions.
PyTorch added support for the M1 GPU as of 2022-05-18 in its nightly builds. Read more about it in their blog post.
Simply install the nightly build:
conda install pytorch -c pytorch-nightly --force-reinstall
Update: It's now available in the stable release:
Conda: conda install pytorch torchvision torchaudio -c pytorch
pip: pip3 install torch torchvision torchaudio
To use (source):
mps_device = torch.device("mps")
# Create a Tensor directly on the mps device
x = torch.ones(5, device=mps_device)
# Or
x = torch.ones(5, device="mps")
# Any operation happens on the GPU
y = x * 2
# Move your model to mps just like any other device
model = YourFavoriteNet()
model.to(mps_device)
# Now every call runs on the GPU
pred = model(x)
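Before moving tensors to mps, it can be worth confirming the backend is actually usable on your install; a minimal check (assuming PyTorch 1.12+, falling back to the CPU otherwise) looks like:
import torch

# is_built(): this PyTorch was compiled with MPS support
# is_available(): the macOS version and hardware can actually use it
if torch.backends.mps.is_built() and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

x = torch.ones(5, device=device)
print(device, x)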
It looks like PyTorch support for the M1 GPU is in the works, but is not yet complete.
From @soumith on GitHub:
So, here's an update.
We plan to get the M1 GPU supported. @albanD, @ezyang and a few core-devs have been looking into it. I can't confirm/deny the involvement of any other folks right now.
So, what we have so far is that we had a prototype that was just about okay. We took the wrong approach (more graph-matching-ish), and the user-experience wasn't great -- some operations were really fast, some were really slow, there wasn't a smooth experience overall. One had to guess-work which of their workflows would be fast.
So, we're completely re-writing it using a new approach, which I think is a lot closer to your good ole PyTorch, but it is going to take some time. I don't think we're going to hit a public alpha in the next ~4 months.
We will open up development of this backend as soon as we can.
That post:
https://github.com/pytorch/pytorch/issues/47702#issuecomment-965625139
TL;DR: a public alpha is at least ~4 months out.
For those who, like me, couldn't install it using conda, use pip as follows.
Requirements:
Any MacBook with an Apple silicon chip
macOS 12.3+
Installation:
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
Update:
The nightly build is no longer needed. PyTorch 1.12 now supports GPU acceleration on Apple silicon. Simply install it with:
pip3 install torch torchvision torchaudio
You may follow other instructions for using PyTorch on Apple silicon and running your own benchmark.
Usage:
Make sure you use mps as your device, as follows:
device = torch.device('mps')
# Send your tensor to the GPU
my_tensor = my_tensor.to(device)
Benchmarking (on M1 Max, 10-core CPU, 24-core GPU):
Without using GPU
Using GPU (5x faster)
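If you want to reproduce a rough comparison yourself, here is a minimal timing sketch (hypothetical matrix sizes and iteration counts, not the benchmark shown above):
import time
import torch

def bench(device, size=4096, iters=50):
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    c = a @ b                 # warm-up pass
    start = time.time()
    for _ in range(iters):
        c = a @ b
    c.cpu()                   # copying back forces any queued GPU work to finish
    return time.time() - start

print("cpu:", bench(torch.device("cpu")))
if torch.backends.mps.is_available():
    print("mps:", bench(torch.device("mps")))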
Platform: Precision 5820, 32 GB RAM, RTX 4000; Windows 10 Pro, ArcGIS Pro 2.6 concurrent license.
Issue:
I installed the deep learning tools following the guidelines provided here:
deeplearninginstallation
TensorFlow was not found after installation, so I manually installed version 2.1.0. I now have arcgis 1.8.2, Pro 2.6, fastai 1.0.60, python 3.6.12, pytorch 1.4.0, tensorflow-gpu 2.1.0; the environment check in the ArcGIS Pro Python prompt seemed fine.
However, after I select Toolbox > Image Analyst > Deep Learning > Train Deep Learning Model, the program seems to hang, with most buttons disabled or unresponsive, until I force-terminate it. I also ran into a "tool not licensed" error twice, which went away after I restarted the program, and a "name 'CallBackHandler' is not defined" error once, which was also gone after a restart.
I tried running the command from the ArcGIS Pro Python prompt:
TrainDeepLearningModel(r"**", r"**", 40, "RETINANET", 16, "# #", None, "RESNET50", None, 10, "STOP_TRAINING", "FREEZE_MODEL")
Executing the command also sent the program into a hang similar to the previous one. The monitor showed that RAM and GPU usage hadn't changed much, so I left the program running for an hour before forcibly terminating it.
I'd greatly appreciate it if anyone can tell me what the issue is here. I'll post any other environment parameters if anyone needs them. Cheers.
I got the tool up and running by executing conda install -c pytorch -c fastai fastai=1.0.54 pytorch=1.1.0 torchvision scikit-image and removing all the conflicting specifications in the cloned arcgispro-py3 env that I had. I still don't understand what went wrong; presumably one or more packages in the env were conflicting, but seeing as I'm not a Python expert, I couldn't identify the exact issue.
Before this I tried the versions stated in the deep learning install guide, but I wasn't able to get past tensorflow-gpu because the solver kept reporting conflicts. I now actually don't have tensorflow-gpu in the env. I have tensorflow 2.1.0, keras-applications 1.0.8 / keras-base 2.3.1 / keras-preprocessing 1.1.0 (no keras-gpu), scikit-image 0.17.2, pillow 6.2.1, fastai 1.0.54, pytorch 1.1.0, libtiff 4.0.10. Some of these differ from what the guideline provided.
The thing is, when I ran the process, CPU usage went up and GPU usage didn't, even though I specified the GPU as the processor type. But I have more pressing things to do right now, like getting the analysis finished, so I'll probably tweak the env a little after I'm done with this bit and see what happens. Meanwhile, anyone's input is still welcome.
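As a quick way to check whether the environment's PyTorch build can reach the GPU at all (a minimal check, independent of the ArcGIS tools themselves), you could run the following from the ArcGIS Pro Python prompt:
import torch

# False here would explain why training stays on the CPU
print(torch.__version__, torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))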
I have a problem where
import torch
print(torch.cuda.is_available())
prints False, so I can't use the available GPU. I've tried it in a conda environment, where I installed the PyTorch version corresponding to the NVIDIA driver I have. I've also tried it in a Docker container, where I did the same. I've tried both of these options on a remote server, but both failed. I know that I've installed the correct versions because I checked the CUDA version with nvcc --version before installing PyTorch, and I verified the GPU connection with nvidia-smi, which displays the GPUs on the machines correctly.
Also, I've checked this post and tried exporting CUDA_VISIBLE_DEVICES, but had no luck.
On the server I have NVIDIA V100 GPUs with CUDA version 10.0 (for the conda environment) and version 10.2 in a Docker container I've built. Any help or a push in the right direction would be greatly appreciated. Thanks!
For anyone else having this problem, it turned out my server administrator had not updated the drivers on the server.
I switched to a different server, installed Anaconda, and things started working as they should, i.e., torch.cuda.is_available() returned True after setting up a fresh environment.
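For anyone else debugging the same symptom, a small diagnostic (a sketch using only standard PyTorch calls) helps separate a CPU-only build from a driver problem: torch.version.cuda being None points at a CPU-only wheel, while a non-None value with is_available() still False usually means the driver shown by nvidia-smi is older than what the wheel was built against.
import torch

print("torch:", torch.__version__)
print("built against CUDA:", torch.version.cuda)    # None => CPU-only build
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
# Compare the driver version shown by `nvidia-smi` against the CUDA version above;
# the driver must be at least as new as the toolkit the wheel was built with.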