I am using an LSTM for action recognition. My basic LSTM, implemented in Keras, reaches 76% accuracy on the test data of one dataset (CAD60), but when I use another dataset the loss gets stuck and the model always predicts a single class.
What could be the problem, given that I am using exactly the same framework and features on both datasets? I have tried tuning the learning rate and changing the optimizer, but it didn't work.
model = Sequential()  # input has shape (samples, timesteps, locations)
model.add(LSTM(128, batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
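One check that may help narrow this down is to compare the model against the accuracy of always predicting the most frequent class, and to confirm that the predictions really have collapsed to one label. The sketch below uses only the standard library; `y_true` and `y_pred` are made-up integer labels for illustration, not data from CAD60 or the second dataset.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (most frequent label, accuracy of always predicting it)."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


def is_collapsed(predictions):
    """True when every prediction is the same class."""
    return len(set(predictions)) == 1


# Hypothetical labels and predictions, for illustration only.
y_true = [0, 0, 0, 1, 2, 0, 1, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0]

label, baseline = majority_baseline(y_true)
print(label, baseline)       # 0 0.625
print(is_collapsed(y_pred))  # True
```

If the reported accuracy equals the baseline, the network has probably learned only the class prior, which often points at the inputs (scaling, label encoding, sequence padding) rather than at the optimizer.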
I wish to train an LSTM sequential model for prediction analysis.
The network looks like this:
model = Sequential()
## Add the 1st LSTM layer
model.add(LSTM(units=hidden_neurons_1,
               input_shape=(sequence_length, nb_features),
               return_sequences=True))
## Avoid overfitting
model.add(Dropout(DROPOUT_VALUE))
## Add the 2nd LSTM layer
model.add(LSTM(units=hidden_neurons_2,
               return_sequences=False))
## Avoid overfitting
model.add(Dropout(DROPOUT_VALUE))
## Number of outputs
model.add(Dense(units=1))
## Select activation function
model.add(Activation("linear"))
## Compile the model
model.compile(loss='mean_squared_error',  ## Loss function
              optimizer='rmsprop')        ## Optimizer to update the weights and biases
By default, Keras offers a list of built-in optimizers.
However, I wish to use a genetic algorithm as the optimizer to update the weights and biases.
Can somebody please help me understand how to achieve this?
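As far as I know, a genetic algorithm does not plug into `model.compile` as an optimizer, because it does not use gradients. A common workaround is to treat the flattened weights as a genome and evaluate each candidate with the loss. The sketch below shows that idea in pure Python on a two-parameter linear model instead of an LSTM; the function names, the population size and the mutation scale are all illustrative assumptions. With Keras, one would presumably read a genome with `model.get_weights()` and write candidates back with `model.set_weights()`.

```python
import random


def mse(weights, data):
    """Mean squared error of the linear model y = w0 * x + w1."""
    w0, w1 = weights
    return sum((w0 * x + w1 - y) ** 2 for x, y in data) / len(data)


def evolve(data, pop_size=30, generations=50, sigma=0.3, seed=0):
    """Minimise mse with truncation selection, crossover and mutation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1), rng.uniform(-1, 1)]
                  for _ in range(pop_size)]
    history = []  # best loss seen in each generation
    for _ in range(generations):
        population.sort(key=lambda w: mse(w, data))
        history.append(mse(population[0], data))
        parents = population[: pop_size // 2]  # keep the better half
        children = [population[0][:]]          # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [w + rng.gauss(0, sigma) for w in child]  # Gaussian mutation
            children.append(child)
        population = children
    best = min(population, key=lambda w: mse(w, data))
    history.append(mse(best, data))
    return best, history


# Toy data generated from y = 2x + 1, for illustration only.
data = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]
best, history = evolve(data)
print(best, history[0], history[-1])
```

Because the best individual is always copied into the next generation, the best loss never increases. Keep in mind that every fitness evaluation is a full forward pass over the data, so this gets expensive quickly for a real LSTM with many weights.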
I have an LSTM model for a human activity recognition task using sensor data.
When I train my model, the loss and accuracy stay the same. What is usually the problem in this case?
I tried changing the learning rate, but the results are the same.
Below is the model that I use:
model = Sequential()
model.add(LSTM(64, return_sequences=True, recurrent_regularizer=l2(0.0015),
               input_shape=(timesteps, input_dim)))
model.add(Dropout(0.5))
model.add(LSTM(64, recurrent_regularizer=l2(0.0015), input_shape=(timesteps, input_dim)))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(n_classes, activation='softmax'))
model.summary()
model.compile(optimizer=Adam(learning_rate=0.0025),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, batch_size=32, epochs=100)
The dataset is balanced across classes (counts below), and I have scaled the features with StandardScaler.
lying 68704
running 68704
walking 68704
climbingdown 68704
jumping 68704
climbingup 68704
standing 68704
sitting 68704
I found the solution: removing the L2 regularization fixed it. I do not know exactly how this works, but I am going to investigate the issue a little more.
I trained a Keras model on MNIST, keeping the training setup and model hyperparameters the same. The training and validation data were exactly the same. I got five different accuracies (0.71, 0.62, 0.59, 0.52, 0.46) in different training sessions. The model was trained for 8 epochs from scratch every time.
This is the code:
def train():
    model = Sequential()
    model.add(Dense(32, input_dim=784))
    model.add(Dense(10, activation="softmax"))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, batch_size=32, epochs=8, verbose=0)
    results = model.evaluate(x_test, y_test, verbose=0)
    print(results[1])

for i in range(5):
    print(i)
    train()
Results:
0
2019-01-23 13:26:54.475110: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
0.7192
1
0.6223
2
0.5976
3
0.5223
4
0.4624
This is probably because the model weights are initialized randomly on every run. Suppose you train two models with the same data and hyperparameters. Because their initial weights differ, their loss and accuracy will differ early on. After enough epochs, though, both will often converge to a similar point where their losses and accuracies are roughly equal. That point could be a minimum of the loss, since the data is the same; otherwise, it could be a point from which both models follow the same path towards convergence.
In your case, training for more epochs may bring the losses and accuracies of all the runs closer together.
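To check whether random initialization really explains the spread, one option is to fix the seed and see whether the runs become identical. The sketch below shows the principle with the standard library only: `init_weights` is a hypothetical stand-in for a layer initializer, not a Keras function. In a real Keras script you would also need to seed NumPy and the backend; recent TensorFlow versions provide `tf.keras.utils.set_random_seed` for this, but whether it is available in your version is an assumption to verify.

```python
import random


def init_weights(n, seed=None):
    """Draw n small initial weights; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # seed=None falls back to OS entropy
    return [rng.uniform(-0.05, 0.05) for _ in range(n)]


same_a = init_weights(3, seed=42)
same_b = init_weights(3, seed=42)
other = init_weights(3, seed=7)

print(same_a == same_b)  # True: same seed, same starting point
print(same_a == other)   # False: different seed, different starting point
```

If the accuracies still differ with every seed fixed, the remaining variation likely comes from elsewhere, for example data shuffling or non-deterministic GPU kernels.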
I'm going crazy with this project. This is multi-label text classification with an LSTM in Keras. My model is this:
model = Sequential()
model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len, mask_zero=True, weights=[embedding_weights] ))
model.add(Dropout(0.25))
model.add(LSTM(units=embeddings_dim, activation='sigmoid', recurrent_activation='hard_sigmoid', return_sequences=True))
model.add(Dropout(0.25))
model.add(LSTM(activation='sigmoid', units=embeddings_dim, recurrent_activation='hard_sigmoid', return_sequences=False))
model.add(Dropout(0.25))
model.add(Dense(num_classes))
model.add(Activation('sigmoid'))
adam=keras.optimizers.Adam(lr=0.04)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
The only problem is that my accuracy is too low. With binary_crossentropy I get good accuracy, but the results are wrong! When I change to categorical_crossentropy, I get very low accuracy. Do you have any suggestions?
Here is my code: GitHubProject - Multi-Label-Text-Classification
In the last layer you are using sigmoid as the activation function, so binary_crossentropy should be used. If you want to use categorical_crossentropy, use softmax as the activation function in the last layer.
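To see why the activation and the loss go together, here is a small worked example in plain Python. The logits and targets are made up for illustration. It shows that sigmoid scores each label independently, so several labels can be "on" at once, while softmax outputs sum to 1, so they compete. It also shows why per-label accuracy can look good in a multi-label setup even when the predictions are useless.

```python
import math


def sigmoid(logits):
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_accuracy(y_true, probs, threshold=0.5):
    """Fraction of individual labels predicted correctly."""
    hits = sum((p > threshold) == bool(t) for t, p in zip(y_true, probs))
    return hits / len(y_true)


# Hypothetical logits for a sample that carries labels 0 and 1 but not 2.
logits = [3.0, 3.0, -3.0]
sig = sigmoid(logits)
soft = softmax(logits)

print([round(p, 3) for p in sig])   # two labels above 0.5 at the same time
print([round(p, 3) for p in soft])  # probabilities forced to share a total of 1

# Sparse multi-label target: 1 positive out of 10 labels, model predicts nothing.
sparse_target = [1] + [0] * 9
all_zero_pred = [0.0] * 10
print(binary_accuracy(sparse_target, all_zero_pred))  # 0.9
```

So for a multi-label problem, sigmoid with binary_crossentropy is the usual pairing, and a metric such as per-label precision, recall or F1 is likely to be more informative than accuracy.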
As for the rest of your model: since you are working with text, I would suggest tanh as the activation function in the LSTM layers.
You can also try the LSTM layer's own dropout options, dropout and recurrent_dropout:
LSTM(units, dropout=0.2, recurrent_dropout=0.2,
     activation='tanh')
You can set units to 64 or 128. Start with a small number and, after testing, increase it up to 1024.
You can also try adding a convolution layer for feature extraction, or use a Bidirectional LSTM, but Bidirectional models take longer to train.
Moreover, since you are working with text, text pre-processing and the size of the training data always play a much bigger role than expected.
Edited
Add class weights through the class_weight parameter of fit:
class_weights = class_weight.compute_class_weight(class_weight='balanced',
                                                  classes=np.unique(labels),
                                                  y=labels)
class_weights_dict = dict(zip(le.transform(list(le.classes_)),
                              class_weights))
model.fit(x_train, y_train, validation_split=validation_split, class_weight=class_weights_dict)
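For reference, the 'balanced' heuristic weights each class by n_samples / (n_classes * count), so rare classes get larger weights. A minimal pure-Python sketch of that formula follows; `balanced_class_weights` is a hypothetical helper written for illustration, not part of scikit-learn, and the labels are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in sorted(counts.items())}


# Hypothetical labels: class 0 is three times as frequent as class 1.
weights = balanced_class_weights([0, 0, 0, 1])
print(weights)  # class 0 gets about 0.667, class 1 gets 2.0
```

The resulting dict maps integer class indices to weights, which is the form the class_weight argument of fit expects.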
change:
model.add(Activation('sigmoid'))
to:
model.add(Activation('softmax'))
I am trying to build a stateful LSTM with Keras and I don't understand how to add an embedding layer before the LSTM runs. The problem seems to be the stateful flag. If my net is not stateful, adding the embedding layer is quite straightforward and works.
A working stateful LSTM without an embedding layer currently looks like this:
model = Sequential()
model.add(LSTM(EMBEDDING_DIM,
               batch_input_shape=(batchSize, longest_sequence, 1),
               return_sequences=True,
               stateful=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(...)
When adding the Embedding layer, I move the batch_input_shape parameter into the Embedding layer, i.e. only the first layer needs to know the shape, right?
Like this:
model = Sequential()
model.add(Embedding(vocabSize+1, EMBEDDING_DIM,batch_input_shape=(batchSize, longest_sequence, 1),))
model.add(LSTM(EMBEDDING_DIM,
               return_sequences=True,
               stateful=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(...)
The exception I get now is Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
So I am stuck here at the moment. What is the trick to combine word embeddings into a stateful LSTM?
The batch_input_shape parameter of the Embedding layer should be (batch_size, time_steps), where time_steps is the length of the unrolled LSTM / number of cells and batch_size is the number of examples in a batch.
model = Sequential()
model.add(Embedding(
    input_dim=input_dim,        # e.g, 10 if you have 10 words in your vocabulary
    output_dim=embedding_size,  # size of the embedded vectors
    input_length=time_steps,
    batch_input_shape=(batch_size, time_steps)
))
model.add(LSTM(
    10,
    batch_input_shape=(batch_size, time_steps, embedding_size),
    return_sequences=False,
    stateful=True
))
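The ndim=4 error follows from how an embedding lookup changes shapes: it appends one axis of size embedding_size to whatever integer input it receives. The pure-Python sketch below imitates that lookup with nested lists to make the shapes visible; `embed` and `shape` are hypothetical helpers and the table values are arbitrary.

```python
def shape(x):
    """Shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


def embed(ids, table):
    """Replace every integer id by its row in the embedding table."""
    if isinstance(ids, list):
        return [embed(i, table) for i in ids]
    return list(table[ids])


vocab_size, embedding_size = 5, 3
table = [[float(i)] * embedding_size for i in range(vocab_size)]

ids_2d = [[1, 2, 3, 4], [0, 1, 2, 3]]            # (batch_size=2, time_steps=4)
ids_3d = [[[i] for i in row] for row in ids_2d]  # (2, 4, 1), as in the question

print(shape(embed(ids_2d, table)))  # (2, 4, 3): what the LSTM expects
print(shape(embed(ids_3d, table)))  # (2, 4, 1, 3): one axis too many
```

Dropping the trailing 1 from the input shape, so the Embedding layer receives (batch_size, time_steps), gives the LSTM the 3-D input it expects.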
There is an excellent blog post which explains stateful LSTMs in Keras. Also, I've uploaded a gist which contains a simple example of a stateful LSTM with Embedding layer.