Setting up Keras and TensorFlow to operate with an AMD GPU - python-3.x

I am trying to set up Keras in order to run models using my GPU. I have a Radeon RX 580 and am running Windows 10.
I realized that CUDA only supports NVIDIA GPUs and was having difficulty finding a way to get my code to run on the GPU. I tried downloading and setting up PlaidML, but afterwards

    from tensorflow.python.client import device_lib
    print(device_lib.list_local_devices())

only reported that I was running on a CPU and that no GPU was available, even though the PlaidML setup was a success. I have read that PyOpenCL is needed but have not gotten a clear answer as to why or to what extent. Does anyone know how to set up this AMD GPU to work properly? Any help would be much appreciated. Thank you!

To the best of my knowledge, PlaidML was not working because I did not have the required prerequisites such as OpenCL. Once I downloaded the Visual Studio C++ build tools and installed PyOpenCL from a .whl file, the issue seemed to be resolved.
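For reference, a minimal sketch of how PlaidML is typically wired into Keras (assuming the plaidml and plaidml-keras packages are installed and plaidml-setup has been run to select the AMD GPU). Note that device_lib.list_local_devices() is a TensorFlow call, so it won't show PlaidML devices even when PlaidML is working, because PlaidML replaces the Keras backend rather than TensorFlow:

    import os
    # PlaidML plugs in as a Keras backend (standalone keras, not tf.keras),
    # so the environment variable must be set before keras is imported.
    os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

    import keras
    from keras.models import Sequential
    from keras.layers import Dense

    # Tiny model just to confirm the backend loads and trains on the device
    # chosen during plaidml-setup.
    model = Sequential([Dense(10, activation="relu", input_shape=(4,)), Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    print(keras.backend.backend())   # should report the plaidml backend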

Related

torch.cuda.is_available() returns false despite GPU settings and cuda installed on Colab

On Colab, my code constantly returns GPU not available. I thought that it might be a problem with the regular subscription, but even after getting Colab Pro, I still have this issue.
I checked the following:
Notebook settings have hardware accelerator set to GPU
nvidia-smi returns the following:
nvcc --version returns the following:
Does anyone know how to resolve this? Thanks!
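A quick diagnostic sketch that often narrows this down (not from the original post): it checks whether the installed torch wheel was even built with CUDA, since pip-installing a CPU-only wheel over Colab's preinstalled one is a common cause of is_available() returning False:

    import torch

    print(torch.__version__)           # e.g. "1.10.0+cpu" indicates a CPU-only build
    print(torch.version.cuda)          # None if the wheel was built without CUDA
    print(torch.cuda.is_available())   # False if the driver/runtime is not visible
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))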

Does torch.distributed support point-to-point communication for GPU?

I am looking into how to do point-to-point communication with multiple GPUs on separate nodes in PyTorch.
As of version 1.10.0, the PyTorch documentation page shows question marks for send and recv on GPU with the MPI backend. What does this mean? If anyone has successfully set up PyTorch so that torch.distributed allows point-to-point communication on multiple GPUs, please let me know how you set it up. Specifically, which MPI are you using? What about the versions of PyTorch and CUDA?
I guess I'll post what I have learned so far.
PyTorch does seem to support point-to-point communication with MPI on GPU. However, this requires a CUDA-aware MPI (if your MPI isn't CUDA-aware, you'll need to build MPI from source with the appropriate flag). In addition, if your PyTorch build doesn't have MPI enabled, you need to compile PyTorch from source with MPI installed. This is a very involved route to take.
However, it seems the documentation I linked to is misleading. Looking at the release notes, PyTorch has supported send/recv in the NCCL backend since 1.8.0... That being said, I have tried doing send/recv with NCCL, but it throws errors saying NCCL is getting invalid arguments. I'm not sure whether it's my problem or there are still bugs in the PyTorch distributed code.
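For reference, a minimal sketch of NCCL point-to-point send/recv (PyTorch 1.8+); it assumes one GPU per process and a launcher such as torchrun setting RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT:

    import torch
    import torch.distributed as dist

    # Launch with e.g.: torchrun --nproc_per_node=2 this_script.py
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # NCCL send/recv requires the tensors to live on the GPU.
    tensor = torch.zeros(1, device=f"cuda:{rank}")
    if rank == 0:
        tensor += 42
        dist.send(tensor, dst=1)   # point-to-point send
    else:
        dist.recv(tensor, src=0)   # point-to-point receive
    print(f"rank {rank}: {tensor.item()}")

    dist.destroy_process_group()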

Brain.js is not utilizing GPU for training

I have been using Brain.js to train a neural network, and it has been working just fine, except it seems that it is only using the CPU to train the neural net.
I am using Windows, and the Task Manager shows the Node process using ~25% CPU (I assume it is maxing out a single thread). Looking at MSI Afterburner, the GPU is not being utilized at all.
The GPU is an Nvidia RTX 2060 super.
What can I do to make Brain.js use my GPU? I have searched around but have not been able to find much info at all so far...

Different message prompt on google colab vs Pycharm

I am running the same CNN model on the same dataset (with 50000 training examples) with exactly the same parameters on both Google Colab (I think it has a K80 GPU) and my own system (with a GTX 1080 GPU and an 8700K CPU). I am using batch_size=32 on both, but I am surprised to see that while training, Google Colab shows me;
while my own system (using PyCharm), shows me
I can understand that the difference in accuracies may be due to different random initializations, but why does Google Colab show the training progress during each epoch in terms of the number of batches (1563/1563), while my machine shows it in terms of the number of examples in the training set, i.e. 50000/50000?
In both cases I am using tf.keras.
Does it have anything to do with the version? On my (Windows) machine, the tensorflow-gpu version is 2.1.0, whereas on the Google Colab (probably Linux) machine it is 2.2.0. I cannot upgrade the version on my Windows machine from 2.1.0 to 2.2.0; probably they are the same, as can be seen here;
Please correct me if I am wrong.
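As far as I can tell, newer tf.keras versions report training progress in steps (batches) rather than samples, which would explain the difference: 1563 is just the 50000 samples divided by the batch size of 32, rounded up.

    import math

    num_examples = 50000
    batch_size = 32
    steps_per_epoch = math.ceil(num_examples / batch_size)
    print(steps_per_epoch)  # 1563, matching the "1563/1563" shown on Colab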

Directions for installing OpenCL on Linux Mint (Dell 9550)

I'm about to enter the next phase of a project where I am moving computation to the GPU. Unfortunately, I have had very poor success setting up OpenCL in my environment. I hoped I could garner some specific direction about what implementation of OpenCL to use and how to avoid certain pitfalls upon installation.
My machine:
Linux Mint 17.3
Dell XPS 15 9550 with an Nvidia GTX 960M graphics chip
Some specifics:
I have been unable to find any graphics drivers that work with this hardware other than the Nvidia-352 version found in this PPA:
https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa
Every other one I try bricks the machine. I've reinstalled Mint more times than I can count finding this one driver. Keep in mind that I must use this configuration for my machine to work.
I attempted to install Nvidia's CUDA toolkit from their site (https://developer.nvidia.com/cuda-downloads) and for some reason the installation overwrote my Nvidia-352 driver and bricked the machine again.
At this point I'm not certain which implementation is correct anyway. I do not want to try another and have the same thing happen.
Some specific questions:
Does every implementation of OpenCL assert itself over the currently installed drivers?
If it does then how can I direct my machine to use the correct one?
Which implementation would be right for my machine?
Can you think of any resources or links that I might be interested in to keep me moving forward? Specifically some installation instructions?
Thanks,
Chronic
Disclaimer: all this is based on my experience with Ubuntu 15.10, but hopefully Mint isn't too different.
Does every OpenCL installation overwrite the others?
If you're installing two different vendors' OpenCL implementations then no, they shouldn't overwrite each other. For example, I have the Nvidia, Intel CPU, POCL and Beignet (Intel GPU) platforms installed and working. The only caveat is that the Intel CPU runtime overwrote the libOpenCL.so* files, resulting in a crash in clinfo because it required libOpenCL.so.1, which the Intel CPU runtime decided to delete. Re-installing the ocl-icd-opencl-dev package fixed this; you can also make libOpenCL.so.1 a symlink to the actual .so file left by the Intel CPU runtime.
If you try installing two versions for the same platform, like you tried, then yes the last one you install will overwrite the previous one. In your case, remember that the CUDA toolkit also includes the GPU drivers. I haven't played with the CUDA toolkit in a while, perhaps there is an option to install the toolkit only and not the drivers, but since each toolkit requires a certain minimum driver version, you'd have to pick a toolkit version that works with the driver version you can get installed.
On Ubuntu, there is an nvidia-cuda-toolkit package you can sudo apt-get install. It didn't ask to change my drivers, so hopefully it will work for you too. I don't know which version of the toolkit it installs.
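Either way, a quick way to check which platforms and devices actually ended up registered with the ICD loader (roughly what clinfo reports) is a small PyOpenCL script, assuming pyopencl is installed:

    import pyopencl as cl

    # Enumerate every installed OpenCL platform (Nvidia, Intel CPU, POCL,
    # Beignet, ...) and the devices each one exposes.
    for platform in cl.get_platforms():
        print("Platform:", platform.name)
        for device in platform.get_devices():
            print("  Device:", device.name)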
Which implementation is right
If you only want to do OpenCL development, then install the nvidia-352 package that worked for you, as well as ocl-icd-opencl-dev. That package pulls in the ocl-icd-libopencl and opencl-headers packages, giving you the header files and libOpenCL.so (the ICD loader). You also need to sudo apt-get install nvidia-opencl-icd-352, as that provides the OpenCL runtime for Nvidia GPUs. If you also want to do CUDA development, then you need the toolkit.
As a side note, install one of the CPU runtimes, e.g. POCL, in addition to the Nvidia runtime. I found this useful for detecting a bug in my kernel - the kernel worked most of the time on my Nvidia GPU but failed consistently on POCL. It was a race condition.
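As an illustration of that side note, here is a sketch of running the same kernel against a runtime selected by platform name; the name fragments ("NVIDIA", "Portable") are assumptions, so check what get_platforms() actually reports on your system:

    import numpy as np
    import pyopencl as cl

    KERNEL_SRC = """
    __kernel void dbl(__global float *a) {
        int i = get_global_id(0);
        a[i] *= 2.0f;
    }
    """

    def run_on(platform_name_fragment):
        # Pick the first platform whose name contains the given fragment,
        # e.g. "NVIDIA" or "Portable" (POCL), and run the kernel there.
        platform = next(p for p in cl.get_platforms()
                        if platform_name_fragment in p.name)
        ctx = cl.Context(devices=platform.get_devices())
        queue = cl.CommandQueue(ctx)

        host = np.arange(16, dtype=np.float32)
        buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                        hostbuf=host)
        cl.Program(ctx, KERNEL_SRC).build().dbl(queue, host.shape, None, buf)

        out = np.empty_like(host)
        cl.enqueue_copy(queue, out, buf)
        return out

    print(run_on("NVIDIA"))
    print(run_on("Portable"))   # POCL, useful as a second opinion on kernel bugs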
Useful links
Sorry, no up-to-date installation instructions. However, the instructions provided by each vendor with their OpenCL runtime (except Nvidia) seem to be good enough for me.
Here are some older instructions:
https://wiki.tiker.net/OpenCLHowTo
https://streamcomputing.eu/blog/2011-06-24/install-opencl-on-debianubuntu-orderly/ - The rest of the StreamComputing blog is also interesting.
