I'm trying to train an LSTM network on a corpus of text (~7M), but it's taking extremely long per epoch even though it's running on an NVIDIA Tesla P100.
My model is 2 LSTM layers with 256 units each, interspersed with Dropout, followed by a final fully connected layer. I am splitting the text into 64-character chunks.
Any reason for this insanely slow performance? It's almost 7.5 hours per epoch! Could it be due to the CPU computation warnings? I didn't think those would cause issues with GPU computation.
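For reference, here is a minimal sketch of the architecture described above, assuming Keras with one-hot encoded characters; the sequence length of 64 comes from the question, but the vocabulary size and dropout rate are placeholders. In recent TF/Keras versions, keeping the LSTM layers on their default activations matters for speed, since non-default settings fall back to the slower generic kernel instead of the cuDNN one.

```python
# Sketch of the question's architecture (assumptions: Keras, one-hot characters;
# vocab_size and dropout rate are placeholders, not values from the question).
from tensorflow import keras
from tensorflow.keras import layers

seq_len, vocab_size = 64, 60   # 64-character chunks; assumed character vocabulary size

model = keras.Sequential([
    layers.Input(shape=(seq_len, vocab_size)),
    layers.LSTM(256, return_sequences=True),   # default activations keep the fast cuDNN kernel eligible
    layers.Dropout(0.2),
    layers.LSTM(256),
    layers.Dropout(0.2),
    layers.Dense(vocab_size, activation="softmax"),  # next-character prediction
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```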
Related
Is there a way to manage how much memory PyTorch reserves? I've been trying to fine-tune a BERT model from Hugging Face, and I'm already using an NVIDIA A100 GPU with 80 GiB of GPU memory from Paperspace, yet I still manage to hit "CUDA out of memory". I'm not training from scratch or anything, and my dataset consists of fewer than 250,000 samples. I don't know if I'm missing something or if there is a way to reduce the memory allocated by PyTorch.
(Seriously, I just wanna cry now.)
Error specs
I've tried reducing my batch size for both the training and eval datasets, and turning on fp16=True. I even tried gradient accumulation, but nothing seems to work. I've been cleaning up with gc.collect() and torch.cuda.empty_cache() and restarting my kernel/machine. I even tried a smaller model.
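For what it's worth, here is a minimal sketch of how those memory-saving options combine in the Hugging Face Trainer; the model name, batch sizes, and accumulation steps are placeholders, and gradient_checkpointing is one additional knob not mentioned above that trades compute for activation memory.

```python
# Minimal sketch of the memory-saving knobs, assuming the Hugging Face Trainer API;
# model name and batch sizes are placeholders.
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,      # smaller per-step batch
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,      # keeps the effective batch size at 32
    fp16=True,                          # half-precision activations/gradients
    gradient_checkpointing=True,        # trade compute for activation memory
)

# trainer = Trainer(model=model, args=args, train_dataset=..., eval_dataset=...)
# trainer.train()
```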
I have adapted the base Transformer model for my corpus of aligned Arabic-English sentences. So far the model has trained for 40 epochs, and accuracy (SparseCategoricalAccuracy) is improving by about 0.0004 per epoch.
To achieve good results, my estimate is that the final accuracy needs to be somewhere around 0.5, but the accuracy after 40 epochs is only 0.0592.
I am running the model on the Tesla 2 p80 GPU. Each epoch takes ~2690 seconds.
This implies I need at least 600 epochs, and the training time would be 15-18 days.
Should I continue with the training, or is there something wrong in the procedure? The base Transformer in the research paper was trained on an English-French corpus.
Key highlights:
Byte-pair encoding (BPE) of sentences
max_len = 100
batch_size = 64
No pre-trained embeddings were used.
Do you mean a Tesla K80 on an AWS p2.xlarge instance?
If that is the case, those GPUs are very slow. You should use p3 instances on AWS with V100 GPUs; you will get around a 6-7x speedup.
Check this out for more details.
Also, if you are not using the standard model and have made some changes to the model or dataset, then try tuning the hyperparameters. The simplest thing is to decrease the learning rate and see if you get better results.
Also, first run the standard model with the standard dataset to benchmark the time taken in that case, and then make your changes and proceed. See when the model starts converging in the standard case. I feel it should give some results after 40 epochs as well.
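As a reference point for the learning-rate suggestion: the base Transformer paper does not use a fixed learning rate but a warmup-then-decay schedule, and deviating from it can stall convergence. Below is a sketch of that schedule, assuming TensorFlow/Keras and the paper's defaults (d_model=512, warmup_steps=4000), neither of which comes from the question.

```python
# Warmup learning-rate schedule from the base Transformer paper (sketch, TF/Keras).
import tensorflow as tf

class TransformerSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model=512, warmup_steps=4000):
        super().__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        # lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
        return tf.math.rsqrt(self.d_model) * tf.minimum(
            tf.math.rsqrt(step), step * self.warmup_steps ** -1.5)

optimizer = tf.keras.optimizers.Adam(TransformerSchedule(),
                                     beta_1=0.9, beta_2=0.98, epsilon=1e-9)
```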
I have a CNN with 2 hidden layers. When I use Keras on a CPU with 8 GB of RAM, I sometimes get a "Memory Error", and sometimes the precision for one class is 0 while other classes are at 1.00 at the same time. If I use Keras on a GPU, will it solve my problem?
You probably don't have enough memory to fit all the images in the CPU during training. Using a GPU will only help if it has more memory. If this is happening because you have too many images or their resolution is too high, you can try using Keras' ImageDataGenerator and any of the flow methods to feed your data in batches.
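Here is a minimal sketch of that batched-feeding approach, assuming the images are arranged in class subfolders under a directory; the path, image size, and batch size are placeholders.

```python
# Feeding images in batches with ImageDataGenerator (sketch; path and sizes assumed).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

train_gen = datagen.flow_from_directory(
    "data/train",              # assumed layout: one subfolder per class
    target_size=(128, 128),    # downscaling also reduces memory pressure
    batch_size=32,
    class_mode="categorical",
    subset="training",
)
val_gen = datagen.flow_from_directory(
    "data/train",
    target_size=(128, 128),
    batch_size=32,
    class_mode="categorical",
    subset="validation",
)

# model.fit(train_gen, validation_data=val_gen, epochs=10)
```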
I am building a chatbot model in Keras, and I am planning on using it on a Raspberry Pi. I have a huge dataset with the shape (1000000, 15, 100), which means there are 1 million samples with a maximum of 15 words, and the embedding dimension is 100 using GloVe. I built a simple model consisting of 1 embedding layer, 1 bidirectional LSTM layer, 1 dropout layer, and 2 dense layers with an output shape of (25,).
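For reference, a hypothetical sketch of that architecture in Keras; the vocabulary size, LSTM/Dense unit counts, and dropout rate are placeholders, since the question does not specify them.

```python
# Hypothetical sketch of the described chatbot model (Keras); sizes are placeholders.
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embed_dim, max_len = 20000, 100, 15   # vocab size is an assumption

model = keras.Sequential([
    layers.Input(shape=(max_len,)),            # sequences of word indices
    layers.Embedding(vocab_size, embed_dim),   # GloVe vectors would be loaded into this layer
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.5),
    layers.Dense(64, activation="relu"),
    layers.Dense(25, activation="softmax"),    # output shape (25,)
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
```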
I know that because of the huge dataset the training process is going to take a long time, but does the size of the dataset affect the speed of model.predict, or is that speed only influenced by the structure of the model and the shape of the input?
No, the size of the dataset does not affect the prediction speed of the model per se. As you say, prediction time is only affected by the architecture of the model and the dimensionality of the inputs.
In general, the problem with making small models that are fast on embedded hardware is that a small model (with fewer parameters) might not perform as well as a more complex model (in terms of accuracy or error), so you have to trade off model complexity against computational performance.
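One quick way to convince yourself of this is to time a single-sample call to model.predict; the snippet below assumes the model sketch given earlier and a single padded sequence of shape (1, 15).

```python
# Timing one prediction; `model` is assumed to be the compiled model from the sketch above.
import time
import numpy as np

sample = np.random.randint(0, 20000, size=(1, 15))   # one padded sequence of 15 word indices
start = time.perf_counter()
model.predict(sample, verbose=0)
print(f"single-sample latency: {time.perf_counter() - start:.4f} s")
```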
I am training a GAN, and I see that my performance is very different on my CPU and GPU setups. I noticed that the Keras version is 2.0.8 on the GPU installation and 2.1.5 on the CPU one. On a separate machine with Keras + TF on GPU I get the same performance as the CPU run from before; the Keras version there is 2.1.6.
Is this expected? In the Keras release notes I did not find anything that would change the way my training works.
The performance with the newer version is better in many senses: much faster convergence (10x fewer epochs required), but the images are sometimes less smooth.
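When comparing runs across machines like this, it can help to record the exact framework versions each environment is actually using; a minimal check (assuming both packages are installed):

```python
# Print the framework versions in each environment before comparing training runs.
import keras
import tensorflow as tf

print("keras:", keras.__version__)        # e.g. 2.0.8 vs 2.1.5 vs 2.1.6
print("tensorflow:", tf.__version__)
```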