Object detection slow and does not use GPU - python-3.x

I need to use the TensorFlow Object Detection API to do some classification connected with recognition.
My problem is that detection with a pretrained COCO model takes far too long and clearly does not use the GPU. I checked my tensorflow-gpu installation with different scripts and it works fine, but when I use this model for detection I only see an increase in CPU usage.
I tried different versions of TensorFlow (1.12, 1.14) and different combinations of CUDA Toolkit (9.0, 10.0) and cuDNN (7.4.2, 7.5.1, 7.6.1), but it is always the same. I also tried it on both Windows 7 and Ubuntu 16.04, with no difference. My project, however, requires a much faster detection time.
System information:
System: Windows 7, Ubuntu 16.04
Tensorflow: 1.12, 1.14
GPU: GTX 970

Run the following Python code; if it detects the GPU then you can use the GPU for training, otherwise there is some problem:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
One more thing: just because your CPU is being utilized does not mean the GPU is not at work. The CPU will always be busy, but the GPU should also spike when you are training.
Paste the output of the above code in the comments if you are not sure about it.
Edit: After chatting with the OP in the comments, I have seen the suggested code, and it uses a pretrained model, so no training is happening here. You are using the model, not training a new one, so no GPU is being used.
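If you want to verify where the detection ops actually run at inference time, here is a minimal sketch for TF 1.x that loads a frozen graph with device-placement logging enabled. The graph path below is a placeholder, not taken from the question:

import tensorflow as tf

# Placeholder path to an exported frozen detection graph; adjust to your model.
PATH_TO_FROZEN_GRAPH = 'frozen_inference_graph.pb'

detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

# log_device_placement=True prints which device (CPU or GPU) each op is assigned to,
# so you can see whether inference really hits the GPU.
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True

with tf.Session(graph=detection_graph, config=config) as sess:
    pass  # call sess.run(...) with your detection tensors here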

Related

Porting Torch GPU code with Numba CUDA kernels to work on Apple silicon

I have a simulation written in Python which uses the GPU mainly through PyTorch operations, but in a couple of places I had to write some (relatively simple) custom kernels via Numba's cuda library (using as_cuda_array() on the torch tensors to get a DeviceNDArray handle).
I've now moved to an Apple machine with an M1 processor. It seems that the torch code can easily be edited to run on the Apple GPU, but Numba has no such option.
What would be the easiest way to rewrite the code to work on Apple silicon?
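(For context, the pattern being ported looks roughly like the sketch below: a Torch CUDA tensor is viewed as a Numba DeviceNDArray and handed to a kernel. This is an illustration on a CUDA machine; the kernel and names are made up, not taken from the original code.)

import torch
from numba import cuda

@cuda.jit
def scale_kernel(arr, factor):
    # one thread per element; scale in place
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] *= factor

x = torch.ones(1024, device='cuda')
x_view = cuda.as_cuda_array(x)  # zero-copy DeviceNDArray view of the torch tensor

threads = 256
blocks = (x.numel() + threads - 1) // threads
scale_kernel[blocks, threads](x_view, 2.0)  # modifies x in place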

Pytorch model doesn't converge with single GPU but works well on two same GPUs

I ran into a strange problem. I trained my model with one GPU (RTX Titan), and it doesn't converge. However, it worked well on two identical GPUs with the same settings. It has nothing to do with the batch size. I use torch.fft and the torch.Transformer layer, with Python 3.8, Pytorch 1.71 and CUDA 10.1.

Google Colab + Pytorch: RuntimeError: No CUDA GPUs are available

Screenshot of error:
Hello, I am trying to run this Pytorch application, which is a CNN for classifying dog and cat pics.
I am using Google Colab for the GPU, but for some reason I get RuntimeError: No CUDA GPUs are available. This is weird because I specifically enabled the GPU in the Colab settings and then tested whether it was available with torch.cuda.is_available(), which returned True.
The weirdest thing is that this error doesn't appear until about 1.5 minutes after I run the code. You would think that if it couldn't detect the GPU, it would notify me sooner.
I've had no problems using the Colab GPU when running other Pytorch applications using the exact same notebook. I can only imagine it's a problem with this specific code, but the returned error is so bizarre that I had to ask on StackOverflow to make sure.
Try again; this is usually a transient issue when there are no CUDA GPUs available.
Recently I had a similar problem: in Colab print(torch.cuda.is_available()) was True, but in one specific project it was False. Both of our projects contain a line like os.environ["CUDA_VISIBLE_DEVICES"]. After commenting that line out, I found that the GPU could be used.
-------My English is poor, I use Google Translate
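(In other words, a leftover CUDA_VISIBLE_DEVICES override can hide the GPU from PyTorch. A small sketch of the check; the "-1" value below is just an example of a setting that hides every device:)

import os
import torch

# If some earlier code did something like this, PyTorch will see no GPUs:
# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

print(os.environ.get("CUDA_VISIBLE_DEVICES"))  # check what, if anything, is set
print(torch.cuda.is_available())               # False if the variable hides all devices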

Difference between Keras 2.0.8 and 2.1.5?

I am training a GAN and I see that my performance is very different on my CPU and GPU. I noticed that the Keras version on the GPU installation is 2.0.8, while on the CPU one it is 2.1.5. On a separate machine with a Keras + TF GPU setup I get the same performance as the CPU one from before; the Keras version there is 2.1.6.
Is this expected? In the Keras release notes I did not find anything that would change the way my training works.
The performance with the newer version is better in many respects: much faster convergence (10x fewer epochs required), but the images are sometimes less smooth.

Training one model with several GPUs

How can you program Keras or TensorFlow to partition training across multiple GPUs? Say you are on an Amazon EC2 instance that has 8 GPUs and you want to use all of them to train faster, but your code is written for a single CPU or GPU.
Yes, you can run Keras models on multiple GPUs. This is only possible with the TensorFlow backend for the time being, because the Theano feature is still rather new. We are looking at adding support for multi-GPU in Theano in the near future (it should be fairly straightforward).
With the TensorFlow backend, you can achieve this the same way as you would in pure TensorFlow: by using the with tf.device(d) scope when defining Keras layers.
Originally from here
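A minimal sketch of that device-scoped approach, assuming two visible GPUs ('/gpu:0' and '/gpu:1') and the old standalone Keras with the TensorFlow 1.x backend; the layer sizes are arbitrary:

import tensorflow as tf
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(128,))

with tf.device('/gpu:0'):
    x = Dense(256, activation='relu')(inputs)      # first part of the graph on GPU 0

with tf.device('/gpu:1'):
    outputs = Dense(10, activation='softmax')(x)   # second part on GPU 1

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')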

Resources