Difference between Keras 2.0.8 and 2.1.5? - keras

I am training a GAN and its performance is very different on my CPU and GPU installations. I noticed that the GPU installation has Keras 2.0.8 and the CPU one has 2.1.5. On a separate machine with Keras + TensorFlow on the GPU I get the same performance as the earlier CPU run; there the Keras version is 2.1.6.
Is this expected? In the Keras release notes I did not find anything that would change the way my training works.
The performance with the newer version is better in many respects: much faster convergence (about 10x fewer epochs required), but the generated images are sometimes less smooth.
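Not an answer to the version question itself, but a quick way to confirm which Keras/TensorFlow builds each machine is actually running, to rule out mixed environments when comparing the two runs:

import keras
import tensorflow as tf

# Print the exact versions on each machine before comparing GAN runs.
print("Keras:", keras.__version__)
print("TensorFlow:", tf.__version__)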

Related

CUDA out of memory and BERT

Is there a way I can manage how much memory PyTorch reserves? I've been trying to fine-tune a BERT model from Hugging Face, and even on an NVIDIA A100 GPU with 80 GiB of memory from Paperspace I still manage to hit "CUDA out of memory". I'm not training from scratch or anything, and my dataset contains fewer than 250,000 examples. I don't know if I'm missing something or if there is a way to reduce the memory PyTorch allocates.
(Seriously, I just wanna cry now.)
Error specs
I've tried reducing the batch size for both the training and eval datasets, and also setting fp16=True. I even tried gradient accumulation, but nothing seems to work. I've been cleaning up with gc.collect() and torch.cuda.empty_cache() and restarting my kernel. I even tried a smaller model.
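For what it's worth, a minimal sketch of how those memory-saving knobs fit together in the Hugging Face Trainer API; the model name, output directory and batch sizes below are placeholders, not taken from the question:

from transformers import AutoModelForSequenceClassification, TrainingArguments

# Placeholder checkpoint; substitute whatever model you are fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
model.gradient_checkpointing_enable()    # trade extra compute for less activation memory

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,       # keep the per-step batch small...
    gradient_accumulation_steps=8,       # ...and accumulate to an effective batch of 64
    fp16=True,                           # half-precision activations and gradients
    eval_accumulation_steps=16,          # offload eval predictions to CPU in chunks
)
# Pass `args` plus your tokenized train/eval datasets to Trainer(...) as usual.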

Pytorch model doesn't converge with single GPU but works well on two same GPUs

I ran into a strange problem. I trained my model on a single GPU (RTX Titan) and it doesn't converge. However, it works well on two of the same GPUs with the same settings, and it has nothing to do with the batch size. I use torch.fft and the torch.nn.Transformer layer, with Python 3.8, PyTorch 1.7.1 and CUDA 10.1.
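Not a diagnosis, but a small sketch of the kind of checks that help narrow down a single-GPU run that silently stops converging: pinning seeds, forcing deterministic cuDNN kernels, and surfacing non-finite gradients (the check_grads helper below is hypothetical, not from the question):

import torch

torch.manual_seed(0)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True   # rule out non-deterministic kernels

torch.autograd.set_detect_anomaly(True)     # raise as soon as backward produces NaN/Inf

def check_grads(model):
    # Report any parameter whose gradient became NaN/Inf after loss.backward().
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            print(f"non-finite gradient in {name}")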

What is the difference between the results of using a GPU or not?

I have a CNN with 2 hidden layers. When I use Keras on a CPU with 8 GB of RAM, I sometimes get a "Memory Error", and sometimes the precision for one class is 0 while other classes are at 1.00 at the same time. If I use Keras on a GPU, will it solve my problem?
You probably don't have enough memory to fit all the images in RAM during training. Using a GPU will only help if it has more memory. If this is happening because you have too many images or their resolution is too high, you can try using Keras' ImageDataGenerator and any of its flow methods to feed your data in batches.
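A minimal sketch of that batching approach, assuming your images live in class-labelled subfolders under a hypothetical data/train directory:

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255)

# Streams batches from disk instead of loading every image into RAM at once.
train_gen = datagen.flow_from_directory(
    "data/train",              # hypothetical path with one subfolder per class
    target_size=(128, 128),    # downscale here if resolution is part of the problem
    batch_size=32,
    class_mode="categorical",
)

# model.fit_generator(train_gen, steps_per_epoch=len(train_gen), epochs=10)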

Object detection slow and does not use GPU

I need to use the TensorFlow Object Detection API for a classification task connected with recognition.
My problem is that using the API for detection with a pretrained COCO model takes too much time and clearly does not use the GPU. I checked my tensorflow-gpu installation with different scripts and it works fine, but when I use this model for detection I only see an increase in CPU usage.
I checked different versions of TensorFlow (1.12, 1.14) and different combinations of CUDA Toolkit (9.0, 10.0) and cuDNN (7.4.2, 7.5.1, 7.6.1), but it is all the same. I also tried it on both Windows 7 and Ubuntu 16.04 with no difference. My project, however, requires much faster detection times.
System information:
System: Windows 7, Ubuntu 16.04
Tensorflow: 1.12, 1.14
GPU: GTX 970
Run the following Python code; if it detects the GPU then you can use the GPU for training, otherwise there is some problem:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
One more thing: just because your CPU is being utilized does not mean the GPU is not at work. The CPU will always be busy, but the GPU should also spike when you are training.
Paste the output of the above code in the comments if you are not sure about it.
Edit: After chatting with the OP in the comments, I have seen the code in question, and it uses a pretrained model, so no training is happening here. You are running an existing model, not training a new one, so no GPU is being used.
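As a complement to the device listing above, a quick TF 1.x-style check (matching the versions listed in the question) that logs which device each op is actually placed on; look for "/device:GPU:0" entries in the console output:

import tensorflow as tf

# log_device_placement prints the device chosen for every op in the graph.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.constant([1.0, 2.0, 3.0], name="a")
    b = tf.constant([4.0, 5.0, 6.0], name="b")
    print(sess.run(a + b))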

Poor performance using GPU with tensorflow?

I'm trying to train an LSTM network on a corpus of text (~7M), but each epoch takes extremely long even though it's running on an NVIDIA Tesla P100.
My model is 2 LSTM layers with 256 units each, interspersed with Dropout and followed by a final fully connected layer. I am splitting the text into 64-character chunks.
Any reason for this insanely slow performance? It's almost 7.5 hours per epoch! Could it be due to the CPU computation warnings? I didn't think those would cause issues with GPU computation.
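One common culprit for multi-hour LSTM epochs on a GPU is the plain Keras LSTM layer, which does not use the cuDNN kernel. A sketch of the described architecture using CuDNNLSTM instead, assuming Keras 2.x with the TensorFlow backend and one-hot character input (vocab_size and the 64-character window below are placeholders):

from keras.models import Sequential
from keras.layers import CuDNNLSTM, Dropout, Dense

vocab_size = 60   # hypothetical character-set size
seq_len = 64      # 64-character chunks as described

model = Sequential([
    CuDNNLSTM(256, return_sequences=True, input_shape=(seq_len, vocab_size)),
    Dropout(0.2),
    CuDNNLSTM(256),
    Dropout(0.2),
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()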
