Why is my validation loss always in a certain range of values? - conv-neural-network

I have been working on a convolutional autoencoder model. My encoder works fine, but I am facing a problem with my decoder. I have tried two different models.
For Model 1, the training loss starts at 0.0436 and then, after some epochs, settles in a range between 0.0280 and 0.0320. The validation loss starts at 0.0306 and then, after some epochs, stays between 0.0275 and 0.0285.
For Model 2, the training loss decreases nicely, but the validation loss starts at 0.2702 and then, after some epochs, settles in a range between 0.1450 and 0.1550.
I have used 15000 images from MS COCO, 'mse' as the loss function, and 'Adam' as the optimizer with a learning rate of 0.00001.
I have tried adding dropout layers and regularization, but nothing is working. I first tried it with 3000 images and later increased the dataset to 15000, but I am still getting the same problem. The total number of parameters of my model is 221,449,088.
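For reference, a minimal sketch of the kind of setup described above; the actual architecture is not shown in the question, so the layer sizes and input shape below are purely illustrative assumptions, and only the loss, optimizer and learning rate match what was stated:
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative encoder/decoder; the real model (221M parameters) is much larger.
inputs = keras.Input(shape=(128, 128, 3))
x = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(inputs)
encoded = layers.Conv2D(128, 3, strides=2, padding='same', activation='relu')(x)
x = layers.Conv2DTranspose(128, 3, strides=2, padding='same', activation='relu')(encoded)
x = layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)
decoded = layers.Conv2D(3, 3, padding='same', activation='sigmoid')(x)

autoencoder = keras.Model(inputs, decoded)
# 'mse' loss and Adam with learning rate 0.00001, as described in the question.
autoencoder.compile(optimizer=keras.optimizers.Adam(learning_rate=0.00001), loss='mse')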

Related

Why does the validation accuracy of my CNN model behave like this?

Why does my validation accuracy remain constant and then start increasing gradually with the number of epochs?

Is the Imagenet_1k dataset enough to train a ResNet50 model?

I used Nervana Distiller to train a ResNet50 baseline model with the Imagenet_1k dataset. I've observed that after 100 epochs, the Top-5 accuracy is about 10%. The validation accuracy remains zero for a long stretch of steps.

How Keras really fits the models via epochs

I am a bit confused about how Keras fits models. In general, Keras models are fitted by simply calling model.fit(...), something like the following:
model.fit(X_train, y_train, epochs=300, batch_size=64, validation_data=(X_test, y_test))
My question is: because I supplied the testing data via the argument validation_data=(X_test, y_test), does that mean each epoch is independent? In other words, I understand that at each epoch, Keras trains the model using the training data (after shuffling it) and then evaluates the trained model on the provided validation_data. If that's the case, then no matter how many epochs I choose, I only take the results of the last epoch!
If this scenario is correct, why do we need multiple epochs? Unless the epochs are dependent somehow, where each epoch starts from the NN weights left by the previous epoch, correct?
Thank you
When Keras fits your model, it passes through the whole dataset at each epoch, in steps of your batch_size.
For example, if you have a dataset of 1000 items and a batch_size of 8, the model's weights are updated using 8 items at a time, until it has seen the whole dataset.
At the end of that epoch, the model runs a prediction on your validation set.
If we ran only one epoch, the weights would be updated using each sample only once (because the model has "seen" the complete dataset just one time).
But to minimize the loss function by backpropagation, we need to update those weights many times to approach the optimal loss, so we pass through the whole dataset multiple times; in other words, multiple epochs. The weights are carried over from one epoch to the next, not reset.
I hope I'm clear; ask if you need more information.
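Here is a small self-contained sketch of those mechanics (the toy data and tiny network below are made up purely for illustration):
import math
import numpy as np
from tensorflow import keras

# Hypothetical toy data: 1000 training samples, 20 features, binary target.
X_train = np.random.rand(1000, 20)
y_train = np.random.randint(0, 2, size=(1000,))
X_test = np.random.rand(200, 20)
y_test = np.random.randint(0, 2, size=(200,))

batch_size = 8
print(math.ceil(len(X_train) / batch_size))  # 125 weight updates per epoch

model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(20,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam')

# validation_data is evaluated once at the end of every epoch and never updates
# the weights; the training weights are not reset between epochs.
history = model.fit(X_train, y_train, epochs=5, batch_size=batch_size,
                    validation_data=(X_test, y_test))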

Multivariate LSTM Forecast Loss and evaluation

I have a CNN-RNN model architecture with bidirectional LSTMs for a time series regression problem. My loss does not converge over 50 epochs. Each epoch has 20k samples. The loss keeps bouncing between 0.001 and 0.01.
batch_size=1
epochs = 50
model.compile(loss='mean_squared_error', optimizer='adam')
trainingHistory=model.fit(trainX,trainY,epochs=epochs,batch_size=batch_size,shuffle=False)
I tried to train the model with incorrectly paired X and Y data, for which the loss stays around 0.5. Is it a reasonable conclusion that my X and Y have a non-linear relationship which can be learned by my model over more epochs?
The predictions of my model capture the pattern but with an offset. I use dynamic time warping distance to manually check the accuracy of the predictions; is there a better way?
Model:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(units=128, dropout=0.05, recurrent_dropout=0.35, return_sequences=True, batch_input_shape=(batch_size, featureSteps, input_dim)))
model.add(LSTM(units=32, dropout=0.05, recurrent_dropout=0.35, return_sequences=False))
model.add(Dense(units=2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
If you tested with:
Wrong data: loss ~0.5
Correct data: loss ~0.01
Then your model is actually capable of learning something.
There are a few possibilities:
Your output data does not fit the range of the last layer's activation.
Your model reached a limit for the current learning rate (the gradient update steps are too big and can't improve the model anymore).
Your model is not good enough for the task.
Your data has some degree of inherent randomness.
Case 1:
Make sure your Y is within the range of your last activation function.
For a tanh (the LSTM default), all Y data should be between -1 and +1
For a sigmoid, between 0 and 1
For a softmax, between 0 and 1, but make sure your last dimension is not 1, otherwise all results will be 1, always.
For a relu, between 0 and infinity
For linear, any value
Convergence tends to be better with a bounded activation than with one that goes to infinity.
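As an illustration (this snippet is an assumption added here, not part of the original answer), the targets can be rescaled into the tanh range with scikit-learn's MinMaxScaler and the scaling inverted after predicting:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical targets; replace with your own trainY.
trainY = np.random.uniform(low=0.0, high=50.0, size=(1000, 2))

# Scale Y into [-1, 1] so it matches a tanh output layer.
scaler = MinMaxScaler(feature_range=(-1, 1))
trainY_scaled = scaler.fit_transform(trainY)

# After predicting, invert the scaling to get back the original units:
# predictions = scaler.inverse_transform(model.predict(testX))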
Case 2:
If the data is OK, try decreasing the learning rate once your model stagnates. You can recompile the model (after it has trained for a while) with a lower learning rate; a common heuristic is to divide it by 10. Keras's default learning rate for Adam is 0.001:
from keras.optimizers import Adam
# after training enough with the default value:
model.compile(loss='mse', optimizer=Adam(lr=0.00001))
trainingHistory2 = model.fit(.........)
# you can even do this again if you notice that the loss decreased and stopped again:
model.compile(loss='mse', optimizer=Adam(lr=0.000001))
If the problem was the learning rate, this will let your model learn more than it already did (there might be some difficulty at the beginning, until the optimizer adjusts itself).
Case 3:
If you had no success, maybe it's time to increase the model's capacity.
Maybe add more units to the layers, add more layers, or even change the model.
Case 4:
There's probably nothing you can do about this...
But if you increased the model's capacity as in Case 3, be careful with overfitting (keep some test data aside to compare the test loss against the training loss).
Models that are too powerful can simply memorize your data instead of learning important insights about it.
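As a sketch of that overfitting check (this reuses the model, trainX and trainY names from the question, so the exact call is an assumption about your setup), you can hold out part of the data and stop training once the validation loss stops improving:
from keras.callbacks import EarlyStopping

# Stop if the held-out loss has not improved for 10 epochs.
early_stop = EarlyStopping(monitor='val_loss', patience=10)
history = model.fit(trainX, trainY,
                    epochs=200,
                    batch_size=32,
                    validation_split=0.2,   # keep 20% aside to compare against training loss
                    callbacks=[early_stop])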

Weird Training Issue with keras - sudden huge drop in loss with zeros in FC layer

I'm getting an odd issue when training a siamese-style CNN with Keras (TensorFlow backend, Ubuntu 14.04, CUDA 8, with cuDNN). In short, the CNN has a shared set of weights that takes in two images, merges their respective FC layers, and then estimates a regression. I'm using MSE loss with the Adam optimizer (with default parameters). I've done this several times with different types of problems and have never seen the following.
Essentially, what happens is that during the first epoch everything seems to be training fine and the loss decreases slowly, as expected (ending at an MSE of roughly 3.3 with a batch size of 32). The regression estimates a 9-dimensional continuous-valued vector.
Then, as soon as the second epoch starts, the loss drops DRAMATICALLY (to ~4e-07). You'd think "oh yay, the loss is really small--I win", but when I inspect the trained weights by predicting on novel inputs (I'm using the checkpointer to dump out the best set of weights according to the loss), I get odd behavior. No matter what the inputs are (different images, random noise, even zeros), I always get the same exact output. Further inspection shows that the weights of the last FC layer in the shared model are all zeros.
If I look at the weights after the first epoch, when everything seems "normal", this doesn't happen; I just don't get optimal results (which makes sense, since only one epoch has occurred). This only happens from the second epoch on.
Has anybody ever seen this? Any ideas? You think it's a dumb error on my part, or some weird bug?
More details on my network topology here. Here are the shared weights:
shared_model = Sequential()
shared_model.add(Convolution2D(nb_filter=96, nb_row=9, nb_col=9, activation='relu', subsample=(2,2), input_shape=(3,height,width)))
shared_model.add(MaxPooling2D(pool_size=(2,2)))
shared_model.add(Convolution2D(nb_filter=256, nb_row=3, nb_col=3, activation='relu', subsample=(2,2)))
shared_model.add(MaxPooling2D(pool_size=(2,2)))
shared_model.add(Convolution2D(nb_filter=256, nb_row=3, nb_col=3, activation='relu'))
shared_model.add(MaxPooling2D(pool_size=(2,2)))
shared_model.add(Convolution2D(nb_filter=512, nb_row=3, nb_col=3, activation='relu', subsample=(1,1)))
shared_model.add(Flatten())
shared_model.add(Dense(2048, activation='relu'))
shared_model.add(Dropout(0.5))
Then I merge them for regression as follows:
input_1 = Input(shape=(3,height,width))
input_2 = Input(shape=(3,height,width))
encoded_1 = shared_model(input_1)
encoded_2 = shared_model(input_2)
encoded_merged = merge([encoded_1, encoded_2], mode='concat', concat_axis=-1)
fc_H = Dense(9, activation='linear')
h_loss = fc_H(encoded_merged)
model = Model(input=[input_1, input_2], output=h_loss)
Finally, each epoch trains on about 1,000,000 samples, so there should be plenty of data to train on. I've just never seen an FC layer get set to all zeros. And even then, I don't understand how that produces a very low loss when the training data are not all zeros.
The zeros that seem to be predicted by the last layer may be caused by the dying ReLU problem. Try LeakyReLU and tweak its alpha. This worked for me in getting rid of the zeros that I would otherwise get in the first epoch itself.
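As a minimal sketch of that suggestion (the layer size, input shape and alpha here are placeholders, not taken from the network above), a ReLU-activated Dense layer can be replaced by a linear Dense layer followed by a LeakyReLU layer:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers.advanced_activations import LeakyReLU

model = Sequential()
# Instead of Dense(2048, activation='relu') ...
model.add(Dense(2048, input_shape=(4096,)))
# ... add LeakyReLU as its own layer so negative pre-activations keep a small gradient.
model.add(LeakyReLU(alpha=0.1))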
