Has anyone got PyTorch 1.5 to work with the AzureML SDK (versions 1.11 and 1.12)? torch.cuda.is_available() returns False even on GPU-enabled machines. Exactly the same setup works fine (is_available() is True) with PyTorch 1.3, 1.4 and 1.6. Any pointers welcome. These are the (possibly) relevant parts of my Conda environment file, with the values of pytorch and azureml-sdk varied as required.
channels:
  - defaults
  - pytorch
dependencies:
  - python=3.7.3
  - pytorch=1.5.0
  - pip:
    - azureml-sdk==1.12.0
Thanks
This is a known issue with PyTorch 1.5 and CUDA and is acknowledged by PyTorch in this GitHub issue.
They haven't provided an official solution to the issue, but they recommend either updating old GPU drivers or making sure you have a CUDA-enabled (rather than CPU-only) build of PyTorch installed. Since you're not experiencing this problem with other PyTorch versions on AzureML GPUs, GPU drivers don't seem to be the issue, so it's probably the PyTorch installation.
Try installing "torchvision==0.6.0" along with your pytorch=1.5.0. PyTorch's site encourages pairing 1.5.0 with torchvision 0.6.0: https://pytorch.org/get-started/previous-versions/
I've been trying to get CUDA to work with TensorFlow for a while now because the neural nets I've been building are now taking hours to train on my CPU, and it'd be great to get that big speed boost. However, whenever I try to use it with TensorFlow (it works with PyTorch, but I want to learn multiple APIs), it tells me that one of the .dll files needed to run CUDA doesn't exist, when it actually does.
I've downloaded and replaced that .dll with other versions from dll-files.com. I've tried uninstalling and reinstalling TensorFlow, CUDA, and cuDNN. I've tried different versions of CUDA, but that only resulted in none of the .dll files being found (and yes, I did change the CUDA_PATH value). I've tried switching the PATH between C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0 and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin to see if that changed anything.
If anyone could help with this, that would be much appreciated.
[Screenshot: the errors raised when running tf.test.is_gpu_available()]
[Screenshot: the .dll file existing on disk]
Try installing a different, older version of the CUDA toolkit on top of the version you have installed already. That fixed it for me, although I also had to copy all the DLLs from the latest cuDNN release into the older CUDA toolkit installs as well.
Have you checked if your TF version was compatible with your CUDA version?
Check the compatibility matrix here: https://www.tensorflow.org/install/source#tested_build_configurations
Unless you compile TF from source, CUDA 11 is not supported yet.
In any case, I would avoid downloading DLLs from the website you mentioned.
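If you're unsure what your TF wheel expects, you can ask TensorFlow itself which CUDA and cuDNN versions it was built against (available from TF 2.3 onwards; the exact dictionary keys may vary slightly between releases):
import tensorflow as tf

# Report the CUDA/cuDNN versions this TF build was compiled against;
# these are what must match the toolkit installed on your machine.
info = tf.sysconfig.get_build_info()
print(info["cuda_version"])    # e.g. '10.1'
print(info["cudnn_version"])   # e.g. '7'
A mismatch between these values and what is on your PATH would explain the missing-.dll errors.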
I'm using a laptop which has Intel Corporation HD Graphics 5500 (rev 09), and AMD Radeon r5 m255 graphics card.
Does anyone know how to set it up for deep learning, specifically fastai/PyTorch?
Update 3:
Since late 2020, the torch-mlir project has come a long way and now supports all major operating systems. Using torch-mlir you can now use your AMD, NVIDIA or Intel GPUs with the latest version of PyTorch.
You can download the binaries for your OS from here.
Update 2:
Since October 21, 2021, you can use the DirectML version of PyTorch.
DirectML is a high-performance, hardware-accelerated DirectX 12 based library that provides GPU acceleration for ML based tasks. It supports all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
Update:
For the latest version of PyTorch with DirectML, see: torch-directml
You can install the latest version using pip:
pip install torch-directml
For a detailed explanation of how to set everything up, see Enable PyTorch with DirectML on Windows.
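Once installed, usage looks roughly like this (a minimal sketch; torch_directml.device() returns the default DirectML device):
import torch
import torch_directml

# Tensors moved to this device are executed through DirectML rather
# than CUDA, so any DX12-capable GPU (AMD/Intel/NVIDIA) can be used.
dml = torch_directml.device()
x = torch.randn(3, 3).to(dml)
y = x @ x
print(y.device)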
A side note concerning pytorch-directml:
Microsoft has changed the way it releases pytorch-directml. It deprecated the old 1.8 version and now offers the new torch-directml (as opposed to the previously named pytorch-directml).
It is now installed as a plugin on top of the actual version of PyTorch and works alongside it.
Old version:
The initial release of pytorch-directml (Oct 21, 2021):
Microsoft released PyTorch-DirectML a few hours ago.
You can now install it (on Windows or in WSL) using the PyPI package:
pytorch-directml 1.8.0a0.dev211021
pip install pytorch-directml
So if you are on Windows or using WSL, you can hop in and give this a try!
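If I recall correctly, the deprecated fork exposed DirectML as a "dml" device string, so usage looked roughly like this (an assumption based on that fork's documentation; the newer torch-directml API above is different):
import torch

# In the old pytorch-directml 1.8 fork, DirectML was selected via the
# "dml" device string (assumption; not valid in stock PyTorch builds).
x = torch.randn(3, 3).to("dml")
print(x.device)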
Update:
As of PyTorch 1.8 (March 04, 2021), AMD ROCm builds are made available from PyTorch's official website. You can now easily install them on Linux, the same way you used to install the CUDA/CPU versions.
Currently, only pip packages are provided, and the Mac and Windows platforms are still not supported (I haven't tested with WSL2, though!).
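To confirm you're actually running a ROCm build, you can check the HIP version PyTorch reports; ROCm is exposed through the regular torch.cuda API, so is_available() should return True on a supported AMD GPU (a minimal check):
import torch

print(torch.version.hip)          # e.g. '4.0.1' on a ROCm build, None otherwise
print(torch.cuda.is_available())  # ROCm reuses the torch.cuda API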
Old answer:
You need to install the ROCm version. The official AMD instructions on building PyTorch are here.
There was previously a wheel package for ROCm, but it seems AMD doesn't distribute it anymore; instead, you need to build PyTorch from source, as the guide I linked to explains.
However, you may consult this page, to build the latest PyTorch version: The unofficial page of ROCm/PyTorch.
Update: In March 2021, PyTorch added support for AMD GPUs; you can just install it and configure it like any other CUDA-based GPU. Here is the link.
I don't know about PyTorch, but even though Keras is now integrated with TF, you can use Keras on an AMD GPU via PlaidML, a library made by Intel. It's pretty cool and easy to set up, and it's pretty handy for switching Keras backends between different projects.
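For reference, the documented PlaidML setup is roughly this: install the backend and run the interactive device picker,
pip install plaidml-keras
plaidml-setup
then select the PlaidML backend before importing Keras:
# Must be set before Keras is imported for the backend switch to take effect.
import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
import keras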
I am using the Azure Automated Machine Learning SDK to train a machine learning model on my dataset. However, after the experiment, all my training iterations fail with a KeyError: 'brand' error, even though the model training itself succeeded.
How can I resolve this?
If a new environment was created after 10 June 2020 using SDK 1.7.0 or lower, training may fail with the above error due to an update in the py-cpuinfo package. (Environments created on or before 10 June 2020 are unaffected, as are experiments run on remote compute, since cached training images are used.) To work around this issue, take either of the two following steps:
1. Update the SDK version to 1.8.0 or higher (this will also downgrade py-cpuinfo to 5.0.0):
pip install --upgrade azureml-sdk[automl]
2. Downgrade the installed version of py-cpuinfo to 5.0.0:
pip install py-cpuinfo==5.0.0
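The underlying cause, as I understand it, is that py-cpuinfo 6.0 renamed the 'brand' key to 'brand_raw', which breaks SDK versions that still look up the old key. You can see the difference directly (a quick sketch):
import cpuinfo

# py-cpuinfo 5.x reports the CPU name under 'brand';
# 6.x renamed that key to 'brand_raw', hence the KeyError.
info = cpuinfo.get_cpu_info()
print(info.get("brand") or info.get("brand_raw"))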
I've previously run inference on TensorFlow graphs from C++. Now I'm working out how to run inference on PyTorch graphs via C++.
My first question is: how can I know the recommended version of cuDNN to use with LibTorch, or for my own PyTorch compile?
Determining the recommended CUDA version is easy. Upon going to https://pytorch.org/ and choosing the options under Quick Start Locally (PyTorch Build, Your OS, etc.), the site makes it pretty clear that CUDA 10.1 is recommended, but there is no mention of the cuDNN version, and upon Googling I've been unable to find a definitive answer.
From what I understand about PyTorch on Ubuntu, if you use the Python version you have to install the CUDA driver (e.g. so nvidia-smi works; version 440 currently), but a CUDA and cuDNN install is not actually required beyond the driver because they are included in the pip3 package. Is this correct? If so, is there a command I can run in a Python script that shows the version of CUDA (expected to be 10.1) and cuDNN that the pre-compiled pip .whl uses? I suspect there is such a command, but I'm not familiar enough with PyTorch yet to know what it may be or how to look it up.
I've run into compile and inferencing errors using C++ with TensorFlow when I was not using the specific recommended version of cuDNN for a given version of TensorFlow and CUDA, so I'm aware these versions can be sensitive and I have to make the right choices from the get-go. If anybody can assist in determining the recommended version of cuDNN for a given version of PyTorch, that would be great.
CUDA is supported via the graphics card driver; AFAIK there's no separate "CUDA driver". The system graphics card driver pretty much just needs to be new enough to support the CUDA/cuDNN versions required by the selected PyTorch version. To the best of my knowledge, backwards compatibility is included in most drivers: for example, a driver that supports CUDA 10.1 (reported via nvidia-smi) will also likely support CUDA 8, 9, and 10.0.
If you installed with pip or conda, then a version of CUDA and cuDNN is included with the install. You can query the actual versions being used in Python with torch.version.cuda and torch.backends.cudnn.version().
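In full (a minimal check; the comments show the kind of values to expect):
import torch

print(torch.__version__)               # PyTorch version
print(torch.version.cuda)              # CUDA version the wheel was built with, e.g. '10.1'
print(torch.backends.cudnn.version())  # cuDNN version as an integer, e.g. 7603 for 7.6.3
print(torch.cuda.is_available())       # True if your driver supports that CUDA version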
Recently I've moved to tensorflow==2.0.0-rc0, and now mtcnn for face detection is not working on my computer. Can I find a tensorflow==2.0.0-rc0 version of mtcnn? A pure Keras implementation of mtcnn would also work in this situation.
I've tried the Keras implementation of facenet at https://github.com/nyoki-mtl/keras-facenet. It's a nice implementation of facenet in Keras, but the face detection part (mtcnn in Keras) is missing.
I needed TF 2 as well, so I just pushed up this library. You should be able to clone the repo and run python setup.py install to install it. It's developed against tensorflow 2.5.0.
P.S. For those looking to port TF 1.x libraries to TF 2 in the future, this comment may be all you need!