How to debug RNNs? - keras

I have set up a ResNet50 network for an optical application. Given two input images, the network estimates 65 values (regression), and it works pretty well. However, the two input images belong to a time series, and images in the series are somewhat correlated over a span of 10-15 time steps, so I expect that an additional RNN could improve the estimates. I have tried to set up the network shown in the figure, using “TimeDistributed” ResNet50s with mostly frozen parameter values found by separate training. However, the RNN training does not give useful accuracy.
Full LSTM network
I have now spent 2-3 weeks trying to debug my code (in particular the generator), but I have not found any coding errors. In frustration, I tried to set up the simplest RNN I could think of: a complete ResNet50 followed by either one or two SimpleRNNs with linear activation. However, these do not come even close to the accuracy of the ResNet50 alone, in spite of the correlated time series.
SimpleRNN network
So my question is: Is it correct to assume that a single SimpleRNN with linear activation should provide the same accuracy as the ResNet50 alone?

This is a bit speculative, but it might suggest an approach to debug the RNN and answer your question. Here is an extremely simple network with a SimpleRNN and a test input of 2 samples, each with a single time step and single feature: i.e. shape=(2,1,1)
from keras.models import Sequential
from keras.layers import SimpleRNN
import numpy as np

# 2 samples, 1 time step, 1 feature: shape (2, 1, 1)
x_train = np.array([[[0.1]],
                    [[0.2]]])
y_train = np.array([[1], [0]])
print(x_train.shape)
print(x_train)
print(y_train.shape)
print(y_train)

# Simple network: one SimpleRNN unit, linear activation, no bias
model = Sequential()
model.add(SimpleRNN(1, activation=None, use_bias=False, input_shape=(1, 1)))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
model.fit(x_train, y_train, epochs=10, batch_size=2)

# Inspect the learned weights and compare them with the predictions
wgt = model.get_weights()
print(wgt)
print('model.predict(x_train)')
print(model.predict(x_train))
Running the above, two weights come out of the RNN network. The first seems to be a simple scaling of the input. The second, I suspect, is the weight of the recurrent loop, which is not actually used when there is only a single time step, as in this example. The activation is linear, so the input multiplied by the first weight matches the output of model.predict.
You may be able to extend this approach to reason about the performance with the ResNet and potentially answer your question. I hope this helps.
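To make the role of the two weights concrete, here is a plain-Python sketch of the recurrence that a one-unit SimpleRNN with linear activation and no bias computes: h_t = x_t * w_in + h_{t-1} * w_rec, starting from a zero state. The function name and the weight values are made up for illustration; they are not the weights trained above.

```python
def linear_simple_rnn(sequence, w_in, w_rec):
    """Final state of a 1-unit SimpleRNN with linear activation and no bias."""
    h = 0.0  # the initial state is zero by default
    for x in sequence:
        h = x * w_in + h * w_rec
    return h


w_in, w_rec = 2.0, 0.5  # hypothetical weights, chosen only for illustration

# One time step: the recurrent weight multiplies the zero initial state,
# so changing it has no effect on the output.
one_step = linear_simple_rnn([0.25], w_in, w_rec)
one_step_other_rec = linear_simple_rnn([0.25], w_in, 0.9)

# Two time steps: the recurrent weight now contributes to the result.
two_steps = linear_simple_rnn([0.25, 0.5], w_in, w_rec)

print(one_step, one_step_other_rec, two_steps)
```

With a single time step the layer reduces to a scaling of the input, which is consistent with the observation above. Feeding sequences with more than one time step is one way to check whether the recurrent weight is actually being trained.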

Related

Maximize accuracy of multilabel image classification

I am building a multilabel image classification network. The dataset contains 70k images across 12 classes. With respect to the entire dataset, each of the 12 classes appears in more than 10% of the images, and 3 of the 12 classes appear in more than 70%. I am using a VGG16 network without its associated classifier.
In training, I get a maximum validation accuracy of 68%. I have tried changing the number of units per Dense layer (512, 256, 128, etc.), increasing the number of layers (5 or 6 layers), adding/removing a Dropout layer (with rate 0.5), and kernel regularization (L1=0.1, L2=0.1).
As accuracy is not the appropriate metric for multilabel classification, I am trying to incorporate HammingLoss as the metric. But it is not working; here is the issue that I opened on the GitHub repo of HammingLoss.
What can be done to improve the accuracy?
What am I missing when incorporating HammingLoss?
For classification, I am using the network as:
network.add(vggBase)
network.add(tf.keras.layers.Dense(256, activation='relu'))
network.add(tf.keras.layers.Dense(64, activation='relu'))
network.add(tf.keras.layers.Dense(12, activation='sigmoid'))
network.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])
I recommend using Keras Tuner for hyperparameter tuning.
If HammingLoss is not working for you, you could use a different metric as a workaround, such as pr_auc. The choice of metric depends strongly on what you want to achieve with your model. Maybe towardsdatascience/evaluating-multi-label-classifiers can help you figure that out.
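For reference, here is a minimal plain-Python sketch of what Hamming loss measures, next to exact-match (subset) accuracy. It assumes the sigmoid outputs have already been thresholded to 0/1; the function names and the toy labels are illustrative and are not taken from the HammingLoss package mentioned in the question.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted wrongly."""
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            wrong += int(t != p)
            total += 1
    return wrong / total


def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted correctly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Toy data: 2 samples, 4 labels each (already thresholded to 0/1).
y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0]]

loss = hamming_loss(y_true, y_pred)              # 1 wrong slot out of 8
subset_acc = exact_match_accuracy(y_true, y_pred)  # 1 of 2 samples fully right
print(loss, subset_acc)
```

The two numbers can differ a lot on the same predictions, so it is worth deciding which notion of error matters for the application before tuning the network against it.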

LSTM Autoencoder for Anomaly detection in time series, correct way to fit model

I'm trying to find correct examples of using an LSTM autoencoder to detect anomalies in time series data. On the internet I see a lot of examples where the LSTM autoencoder is fitted with labels that are future time steps of the feature sequences (as in ordinary time series forecasting with LSTM), but I suppose that this kind of model should be trained with labels that are the same sequence as the features (the previous time steps).
For example, the first link Google returns for this search is https://towardsdatascience.com/time-series-of-price-anomaly-detection-with-lstm-11a12ba4f6d9
1. This function defines how the labels (the y feature) are built:
def create_sequences(X, y, time_steps=TIME_STEPS):
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        # Input window: time_steps consecutive rows
        Xs.append(X.iloc[i:(i + time_steps)].values)
        # Label: the value right after the window
        ys.append(y.iloc[i + time_steps])
    return np.array(Xs), np.array(ys)

X_train, y_train = create_sequences(train[['Close']], train['Close'])
X_test, y_test = create_sequences(test[['Close']], test['Close'])
2. The model is fitted as follows:
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.1,
                    callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, mode='min')], shuffle=False)
Could you kindly comment on how the autoencoder is implemented in the towardsdatascience.com link?
Is that the correct method, or should the model be fitted in the following way?
model.fit(X_train,X_train)
Thanks in advance!
This is a time-series autoencoder. If you want to predict future values, this is how it is set up. How a model (autoencoder or otherwise) is fitted depends on the problem being solved; you cannot train and fit one model or workflow for all problems. With time series, you may want to model data already collected over a period, or use collected data to predict the future, and the two are constructed differently. For example, time-series data for the subsurface of the earth is modeled differently from weather-forecast data, and one model cannot work for both.
By definition, an autoencoder is any model that attempts to reproduce its input, independent of the type of architecture (LSTM, CNN, ...).
Framed this way, it is an unsupervised task, so the training would be: model.fit(X_train, X_train)
Now, what the author of the article you linked does is take a common LSTM autoencoder architecture and apply it to time series forecasting:
model.add(LSTM(128, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(RepeatVector(X_train.shape[1]))
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(X_train.shape[2])))
She's pre-processing the data in a way to get X_train = [x(t-seq)....x(t)] and y_train = x(t+1)
for i in range(len(X) - time_steps):
    Xs.append(X.iloc[i:(i + time_steps)].values)
    ys.append(y.iloc[i + time_steps])
So the model does not, per se, reproduce the input it is fed, but that does not mean it is an invalid implementation, since it produces valuable predictions.
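The difference between the two training setups comes down to which targets are paired with each input window. Here is a minimal plain-Python sketch that builds both from the same series; the helper name and the toy series are illustrative, and lists stand in for the pandas and NumPy objects used in the article.

```python
def make_windows(series, time_steps):
    """Slide a window over the series and build both kinds of targets."""
    inputs, forecast_targets = [], []
    for i in range(len(series) - time_steps):
        inputs.append(series[i:i + time_steps])
        # Forecasting: the target is the value right after the window.
        forecast_targets.append(series[i + time_steps])
    # Autoencoding: the target is the input window itself.
    reconstruction_targets = [list(window) for window in inputs]
    return inputs, forecast_targets, reconstruction_targets


series = [10, 11, 12, 13, 14]
X, y_forecast, y_reconstruct = make_windows(series, time_steps=3)

print(X)              # input windows
print(y_forecast)     # one scalar per window
print(y_reconstruct)  # one full window per window
```

Fitting against `y_reconstruct` corresponds to model.fit(X_train, X_train), while fitting against `y_forecast` corresponds to the forecasting setup used in the article. Note that the target shapes differ, so the output layer of the model has to match whichever target is used.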

How Keras really fits the models via epochs

I am a bit confused on how Keras fits the models. In general, Keras models are fitted by simply using model.fit(...) something like the following:
model.fit(X_train, y_train, epochs=300, batch_size=64, validation_data=(X_test, y_test))
My question is: because I passed the testing data through the argument validation_data=(X_test, y_test), does it mean that each epoch is independent? In other words, I understand that at each epoch, Keras trains the model using the training data (after shuffling it) and then tests the trained model using the provided validation_data. If that's the case, then no matter how many epochs I choose, I only take the results of the last epoch!
If this scenario is correct, why do we need multiple epochs? Unless these epochs are dependent somehow, with each epoch using the NN weights from the previous epoch, correct?
Thank you
When Keras fits your model, it passes through the whole dataset at each epoch, in steps of size batch_size.
For example, if you have a dataset of 1000 items and a batch_size of 8, the weights of your model are updated using 8 items at a time, and this repeats until the model has seen the whole dataset (125 updates).
At the end of each epoch, the model is evaluated on your validation set.
If we ran only one epoch, each element would contribute to a weight update only once (because the model "saw" the complete dataset only one time).
But in order to minimize the loss function through backpropagation, we need to update those weights many times to approach the optimum, so we pass through the whole dataset multiple times; in other words, multiple epochs. The weights are not reset between epochs: each epoch continues from the weights left by the previous one.
I hope that's clear; ask if you need more information.
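The dependence between epochs can be illustrated without Keras. The following is a minimal sketch, assuming a one-parameter model y = w * x trained with per-sample gradient descent on squared error; the function names, learning rate, and toy data are all illustrative. The point is only that `w` is created once and carried from one epoch into the next.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one pass over the data."""
    return math.ceil(n_samples / batch_size)


def train(xs, ys, epochs, lr=0.1):
    """Fit y = w * x, reusing the same w across all epochs."""
    w = 0.0  # initialised once, NOT reset at the start of each epoch
    losses = []
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
            w -= lr * grad
        # Loss measured at the end of the epoch, like a validation pass.
        losses.append(sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
    return w, losses


steps = updates_per_epoch(1000, 8)
w, losses = train([1.0, 2.0], [2.0, 4.0], epochs=5)
print(steps)
print(w, losses)
```

In this toy setup the loss shrinks from epoch to epoch, which would not happen if every epoch started from fresh weights.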

How to adopt multiple different loss functions in each steps of LSTM in Keras

I have a set of sentences and their scores, and I would like to train a marking system that can predict the score for a given sentence. One example looks like this:
(X =Tomorrow is a good day, Y = 0.9)
I would like to use an LSTM to build such a marking system, and also take into account the sequential relationship between the words in the sentence, so the training example shown above is transformed as follows:
(x1=Tomorrow, y1=is) (x2=is, y2=a) (x3=a, y3=good) (x4=day, y4=0.9)
When training this LSTM, I would like the first three time steps to use a softmax classifier and the final step to use MSE. Clearly, the loss function used in this LSTM is composed of two different loss functions. It seems that Keras does not provide a way to address this problem directly. In addition, I am not sure whether my method of building the marking system is correct.
Keras supports multiple loss functions as well:
model = Model(inputs=inputs,
              outputs=[lang_model, sent_model])
model.compile(optimizer='sgd',
              loss=['categorical_crossentropy', 'mse'],
              metrics=['accuracy'], loss_weights=[1., 1.])
Based on your explanation, I think you need a model that first predicts a token based on the previous tokens (in the NLP domain this is usually called a language model) and then computes a score, which I assume is a sentiment (the approach is applicable to other domains too).
To do so, you can train your language model with an LSTM and pick the last output of the LSTM for your ranking task. To this end, you need to define two loss functions: categorical_crossentropy for the language model and MSE for the ranking task.
This tutorial would be helpful: https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/
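As a rough illustration of how two losses combine into one training objective, here is a plain-Python sketch that computes a weighted sum of a cross-entropy term and an MSE term by hand. The probabilities, scores, and function names are made up for the example, and averaging the per-token losses is an assumption of this sketch.

```python
import math


def categorical_crossentropy(true_index, probs):
    """Negative log-probability assigned to the correct token."""
    return -math.log(probs[true_index])


def mse(y_true, y_pred):
    """Squared error for a single scalar score."""
    return (y_true - y_pred) ** 2


def combined_loss(token_losses, score_loss, loss_weights=(1.0, 1.0)):
    """Weighted sum of the language-model loss and the score loss."""
    lm_loss = sum(token_losses) / len(token_losses)  # mean over time steps
    return loss_weights[0] * lm_loss + loss_weights[1] * score_loss


# Hypothetical softmax outputs for two next-token predictions.
token_losses = [
    categorical_crossentropy(0, [0.5, 0.25, 0.25]),
    categorical_crossentropy(1, [0.25, 0.5, 0.25]),
]
# Hypothetical score prediction for the whole sentence.
score_loss = mse(0.9, 0.7)

total = combined_loss(token_losses, score_loss)
print(total)
```

Changing `loss_weights` shifts how much the optimizer cares about next-token prediction relative to the score, which plays the same role as the loss_weights argument in the compile call above.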

Accuracy on middle layer of autoencoder implemented using Keras

I have implemented an autoencoder using Keras. I understand that I can add accuracy performance metric as follows:
autoencoder.compile(optimizer='adam',
loss='mean_squared_error',
metrics=['accuracy'])
My question is:
Is the accuracy metric applied to the last layer of the decoder by default? If so, how can I set it up so that it takes the representations from the middle (hidden) layer to compute the accuracy? Do I need to define a custom metric? How would that work?
It seems that what you really want is a multiple-output network.
So on top of the middle layer that defines your embedding, add a layer (or more) to do your classification.
Then have a look at Multiple outputs in Keras to create your global cost.
You may also want to start by training only the autoencoder, and then only the additional classifier layers, to see the performance. You can also balance the accuracy of the encoder against the accuracy of the classifier in the loss, training "both" networks at the same time.
