Getting loss as "nan" while running CNN (using Keras) - keras

Need Suggestion
I am trying to design a model to guess Facial-Points. Its a part of Kaggle Competition (https://www.kaggle.com/c/facial-keypoints-detection).
In this solution, I am trying to design a CNN model (using Keras Library), as a Multi-variable regression model to Predict the co-ordinates of Facial-points.
Issue Faced --> I am getting loss as "nan"
Solutions tried --
1. Tried optimizers - Adam, SGD
2. tested with Learning rate 0.01 to 0.00001
3. Tried with various batch sizes
Can anyone suggest, if I am missing something. The code is present in below link -
https://www.kaggle.com/saurabhrathor/facialpoints-practice

Related

training accuracy highprediction of the training set is low

I am using keras 2.2.4 to train a text-based emotion recognition model, which is three categories classification.
I put accuracy in the model compile metric. While training, the accuracy showed in the result bar was normal, around 86%,and the loss is around 0.35, which i believed it was working properly.
After training, I found that the prediction with the testing set was pretty bad, acc was only around 0.44. with my instinct, it might be overfitting issue with the model. However, with my random curiosity, I put the training set into the model prediction, the accuracy was also pretty bad, same as the testing set.
The result showed that it might not be an overfitting issue with the model, and I cannot come up with any possible reason why this happen. Also, the difference between the accuracy output while training with the training set and the accuracy with the training set after training is even more confusing.
Does anyone ever encounter the same situation, and what problem it may be?

What type of optimization to perform on my multi-label text classification LSTM model with Keras?

I'm using Windows 10 machine. Libraries: Keras with Tensorflow 2.0 Embeddings: Glove(100 dimensions).
I am trying to implement an LSTM architecture for multi-label text classification.
I am using different types of fine-tuning to achieve better results but with no luck so far.
The main problem I believe is the difference in class distributions of my dataset but after a lot of tries and errors, I couldn't implement stratified-k-split in Keras.
I am also experimenting with dropout layers, batch sizes, # of layers, learning rates, clip values, validation splits but I get a minimum boost or worst performance sometimes.
For metrics, I use mainly ROC and F1.
I also followed the suggestion from a StackOverflow member who said to delete some of my examples so I can balance my dataset but if I do that I will have a very low number of examples.
What would you suggest to me?
If someone can provide code based on my implementation for
stratified-k-split I would be grateful cause I have checked all the
online resources but can't implement it.
Any tips, suggestions will be really helpful.
Metrics Plots
Dataset form+Embedings form+train-test-split form
Dataset's labels distribution
My LSTM implementation

Sentiment analysis using images

I am trying sentiment analysis of images.
I have 4 classes - Hilarious , funny very funny not funny.
I tried pre trained models like VGG16/19 densenet201 but my model is overfitting getting training accuracy more than 95% and testing around 30
Can someone give suggestions what else I can try?
Training images - 6K
You can try the following to reduce overfitting:
Implement Early Stopping: compute the validation loss at each epoch and a patience threshold for stopping
Implement Cross Validation: refer to Section Cross-validation in
https://cs231n.github.io/classification/#val
Use Batch Normalisation: normalises the activations of layers to
unit variance and zero mean, improves model generalisation
Use Dropout (either or with batch norm): randomly zeros some activations to incentivise use of all neurons
Also, if your dataset isn't too challenging, make sure you don't use overly complex models and overkill the task.

Calculating loss in YOLOv3 for all three scales

I'm currently trying to implement YOLOv3 in TensorFlow, using the Estimator API. However, I'm stuck at the loss function. YOLOv3 makes predictions at three scales and I can't figure out, how to calculate the loss for all of them. I've already looked at the paper and also tried to find the loss function in the darknet source code but can't figure it out. I've also looked at the code for the loss function of another implementation YOLOv3 TensorFlow implementation but this hasn't really helped me to understand the calculation of the loss either.
Can someone explain how exactly the loss for training is calculated while taking into account the predictions of all three scales?

How do I use a trained Theano artificial neural network on single examples?

I have been following the http://deeplearning.net/tutorial/ tutorial on how to train an ANN to classify the MNIST numbers. I am now at the "Convolutional Neural Networks" chapter. I want to use the trained network on single examples (MNIST images) and get the predictions. Is there a way to do that?
I have looked ahead in the tutorial and on google but can't find anything.
Thanks a lot in advance for any kind of help!
The material in the Theano tutorial in the earlier chapters, before reaching the Convolutional Neural Networks (CNN) chapter, give a good overview of how Theano works and some of the components the CNN sample code uses. It might be reasonable to assume that students reaching this point have developed their understanding of Theano sufficiently to figure out how to modify the code to extract the model's predictions. Here's a few hints.
The CNN's output layer, called layer3, is an instance of the LogisticRegression class, introduced in an earlier chapter.
The LogisticRegression class has an attribute called y_pred. The comments next to the code which assigns that attribute's values says
symbolic description of how to compute prediction as class whose
probability is maximal
Looking for places where y_pred is used in the logistic regression sample will highlight a function called predict(). This does for the logistic regression sample what is desired of the CNN example.
If one follows the same approach, using layer3.y_pred as the output of a new Theano function, the model's predictions will become apparent.

Resources