I am developing an encoder-decoder model in order to predict titles for lecture transcripts. but the model is predicting the same title no matter what the input is. Any idea what may have caused such a problem?
If you would like to solve this, I will strongly recommend you to provide your code as an example, better including your loss, accuracy or something people will be more familiar about your problem. However, here are some conditions that will run into that problem: 1) your code was not doing the things you would like to do somehow. 2) LSTM sometimes experience gradient explode or gradient vanish problem, although it was said to fix those problem that a RNN structure will face, it still get into that problem form time to time anyway. 3) forget to shuffle your dataset before training, which makes your model learn the same pattern of one kind all the time. If all the things that mentioned above did not fit in your case, try to provide your code and dataset information to make it clear.
Related
Intro
I am making a classifier to recognize presence of defects in pictures, and in the path of improving my models, I tried Batch Normalization, mainly to exploit its ability to fasten convergence.
While it gives the expected speed benefits, I also observed some strange symptoms:
validation metrics are far from good. It smells of overfitting of course
predictions calculated at any point during training are completely wrong, particularly when images are picked from the training dataset; the corresponding metrics match with the (val_loss, val_acc) rather than with (loss, acc) printed during training
This failing to predict is the evidence that worries me the most. A model which does not predict the same as in training, is useless!
Searches
Googling around I found some posts that seem to be related, particularly this one (Keras BN layer is broken) which also claims the existence of a patch and of a pull request, that sadly "was rejected".
This is quite convincing, in that it explains a failure mechanism that matches my observations. As far as I understand, since BN calculates and keeps moving statistics (exponential averages and standard deviations) for doing its job, which require many iterations to stabilize and become significant, of course it will behave bad when it comes to make a prediction from scratch, when those statistics are not mature enough (in case I have misunderstood this concept, please tell me).
Actual Questions
But thinking more thoroughly, this doesn't really close the issue, and actually raises further doubts. I am still perplexed that:
This Keras BN being broken, is said to affect the use case of transfer learning, while mine is a classical case of a convolutional classifier, trained starting form standard glorot initialization. This should have been complained about by thousands of users, while instead there isn't much discussion about)
technically: if my understanding is correct, why aren't these statistics (since they are so fundamental for prediction) saved in the model, so that their latest update is available to make a prediction? It seems perfectly feasible to keep and use them at prediction time, as for any trainable parameter
managementwise: if Keras' BN were really broken, how could such a deadful bug remain unaddressed for more than one year? Isn't really out there anybody using BN and needing predictions out of their models? And not even anybody able to fix it?
more practically: on the contrary, if it is not a bug, but just a bad understanding on how to use it, were do I get a clear illustration of "how to correctly get a prediction in Keras for a model which uses BN?" (demo code would be appreciated)
Obviously I would really love that the right questions is the last, but I had to include the previous ones, given the evidence of someone claiming that Keras BN is broken.
Note to SE OP: before *closing the question as too broad*, please consider that, being not really clear what the issue is (Keras BN being broken, or the users being unable to use it properly), I had to offer more directions, among which whoever wishing to answer can choose.
Details
I am using keras 2.2.4 from a python 3.6 virtual environment (under pyenv/virtualenv).
data are fed through a classic ImageDataGenerator() + flow_from_directory() / flow_from_dataframe() scheme (augmentation is turned off though: only rescale=1./255 is applied), but I also tried to make them static
actually in the end, for verifying the above behaviour, I generated only one dataset x,y=next(valid_generator) and used an unique batch scheme for both training and validation. While on the training side it converges (yes, the aim was exactly to let it overfit!), on the validation side both metrics are poor and predictions are completely wrong and erratic (almost random)
in this setup, if BN is turned off, val_loss and val_acc match exactly with loss and acc, and with those that I can obtain from predictions calulated after training has finished.
Update
In the process of writing a minimal example of the issue, after battling to put in evidence the problem, I recognized that the problem is showing/not showing up in different machines. In particular, the problem is evident on a host running Keras 2.3.1, while another host with Keras 2.2.4 doesn't show it.
I'll post a minimal example here along with specific module versions asap.
I have a situation where:
My training accuracy is 93%
CV accuracy is 55%
Test accuracy is 57%
I think this is a classical case of overfitting.
As per my knowledge, I can use regularization.
I have read cross validation will also helps in solving my over fitting problem.
Some inquiries I have regarding this:
Whether cross validation is used only for hyperparameter tuning, or will it have a role in solving over fitting problem?
If cross validation solves overfitting problems, how?
Whether cross validation is used only as a check to see whether the model is over fitting or not?
I think you are confused on what exactly cross validation is. I will link to OpenML's explanation for 10-fold cross validation so you get a better idea.
Over-fitting occurs normally when there is not enough data for your model to train on, resulting in it learning patterns/similarities between the data set that is not helpful, such as putting too much focus on outlying data that would be ignored if given a larger data set.
Now to your questions:
1-2. Cross-validation is just one solution that is helpful for preventing/solving over-fitting. Through partitioning the data set into k-sub groups, or folds, you then can train your model on k-1 folds. The last fold will be used as your unseen validation data to test your model upon. This will sometimes help prevent over-fitting. A factor in this working though depends on how long/how many epochs you are training your data for. Since you said you have a relatively small data set, you want to make sure you aren't 'over-learning' on this data. Implementing cross-validation will not do you much good if you are training for hundreds/thousands of epochs on a really small data set.
Cross-validation doesn't tell you if your data is being over-fitted. It may give you hints that it is if your results are vastly different after several times running the program, but it is not going to be clear cut.
The biggest problem, and you said it yourself in the comments, is you don't have a lot of data. The best, although not always the easiest way, is to increase your data size so your model won't learn unimportant tendencies and put too much focus on the outliers.
I will link to a website that is incredibly helpful in explaining the problems of over-fitting and gives a variety of ways to attempt to overcome this problem.
Let me know if I was of help!
I am new to neural networks so I tried my first neural network which is pretty close to one at keras learn page,given below:
https://github.com/aakarsh1011/Neural-Network/blob/master/MNSIT%20classification.ipynb
Kindlly look at the ending where I red a random image and tried to predict it which comes out as a bag, and when trained at epocs=5 it predicted it as a sandal.
Is something wrong with my code or labeling.
UPDATE - Being new to the field I didn't know the importance of epochs so I asked this question, I was afraid that I don't over-fit the model or train train too much. But there is no definite way to do this, it's all try and error. GOOD LUCK!
First of all, as far as I can see, your code is correct. Your model predicting the wrong item can be caused by the model not being trained for long enough. I would highly recommend you to set epochs=100 and you will be able to see the model's accuracy rise. You should generally always try to give your model as many epochs as possible for training. It will simply take some time. Try out some different numbers of epochs to find the one not taking too long, but still giving an acceptable result.
I'm doing a programming assignment for Andrew Ng's Deep Learning course on Convolutional Models that involves training and evaluating a model using Keras. What I've observed after a little playing with various knobs is something curious: The test accuracy of the model greatly improves (from 50 percentile to 90 percentile) by setting the validation_fraction parameter on the Model.fit operation to 0. This is surprising to me; I would have thought that eliminating the validation samples would lead to over-fitting of the model, which would, in turn, reduce accuracy on the test set.
Can someone please explain why this is happening?
You're right, there is more training data, but the increase is pretty negligible since dI was setting the validation fraction to 0.1, so that would increase the training data by 11.111...% However, thinking about it some more, I realized that removing the validation step doesn't have any effect on the model, hence no impact on test accuracy. I think that I must have changed some other parameter, too, though I don't remember which.
As Matias says, it means there is more training data to work with.
However, I'd also make sure that the test accuracy is actually increasing from 50 to 90% consistently. Run it over a couple times to make sure. There is a possibility that, because there is very little validation samples, that the model got lucky. That's why it is important to have a lot of validation data - to make sure the model isn't just getting lucky, and that there's actually a method to the madness.
I go over some of the "norms" when it comes to training and testing data in my book about stock prediction (another great way in my opinion to learn about Deep Learning). Feel free to check it out and learn more, as it's great for beginners.
Good Luck!
I am a newby to the convolutional neural nets... so this may be an ignorant question.
I have followed many examples and tutorials now on the MNIST example in TensforFlow. In the CNN examples, all authors talk bout using the 'input filters' to run in the CNN. But no one that I can find mentions WHERE they come from. Can anyone answer where these come from? Or are they magically obtained from the input images.
Thanks! Chris
This is an image that one professor uses, be he does not exaplain if he made them or TensorFlow auto-extracts these somehow.
Disclaimer: I am not an expert, more of an enthusiast.
To cut a long story short: filters are the CNN equivalent of weights, and all a neural network essentially does is learning their optimal values.
Which it does by iterating through a training dataset, making predictions, comparing them to the label/value already assigned to each training unit (usually an image in case of a CNN) and adjusting weights to minimize the error function (the difference between the predicted value and the actual value).
Initial values of filters/weights do not matter that much, so although they might affect the speed of convergence to a small degree, I believe they are often assigned random values.
It is the job of the neural network to figure out the optimal weights, not of the person implementing it.