LSTM Text Classification Bad Accuracy Keras - keras

I'm going crazy in this project. This is multi-label text-classification with lstm in keras. My model is this:
model = Sequential()
model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len, mask_zero=True, weights=[embedding_weights] ))
model.add(Dropout(0.25))
model.add(LSTM(output_dim=embeddings_dim , activation='sigmoid', inner_activation='hard_sigmoid', return_sequences=True))
model.add(Dropout(0.25))
model.add(LSTM(activation='sigmoid', units=embeddings_dim, recurrent_activation='hard_sigmoid', return_sequences=False))
model.add(Dropout(0.25))
model.add(Dense(num_classes))
model.add(Activation('sigmoid'))
adam=keras.optimizers.Adam(lr=0.04)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
Only that I have too low an accuracy .. with the binary-crossentropy I get a good accuracy, but the results are wrong !!!!! changing to categorical-crossentropy, I get very low accuracy. Do you have any suggestions?
there is my code: GitHubProject - Multi-Label-Text-Classification

In last layer, the activation function you are using is sigmoid, so binary_crossentropy should be used. Incase you want to use categorical_crossentropy then use softmax as activation function in last layer.
Now, coming to the other part of your model, since you are working with text, i would tell you to go for tanh as activation function in LSTM layers.
And you can try using LSTM's dropouts as well like dropout and recurrent dropout
LSTM(units, dropout=0.2, recurrent_dropout=0.2,
activation='tanh')
You can define units as 64 or 128. Start from small number and after testing you take them till 1024.
You can try adding convolution layer as well for extracting features or use Bidirectional LSTM But models based Bidirectional takes time to train.
Moreover, since you are working on text, pre-processing of text and size of training data always play much bigger role than expected.
Edited
Add Class weights in fit parameter
class_weights = class_weight.compute_class_weight('balanced',
np.unique(labels),
labels)
class_weights_dict = dict(zip(le.transform(list(le.classes_)),
class_weights))
model.fit(x_train, y_train, validation_split, class_weight=class_weights_dict)

change:
model.add(Activation('sigmoid'))
to:
model.add(Activation('softmax'))

Related

loss and accurancy stay the same after training

I have an lstm model for human activity recognition task using data from sensors.
when I train my model the loss and accuracy stays the same. Which is the problem in this case in general?
I try to change the learning rate but the results are the same,
below is the model that I use
model = Sequential()
model.add(LSTM(64, return_sequences=True, recurrent_regularizer=l2(0.0015), input_shape=(timestamps,
input_dim)))
model.add(Dropout(0.5))
model.add(LSTM(64, recurrent_regularizer=l2(0.0015), input_shape=(timesteps,input_dim)))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(n_classes, activation='softmax'))
model.summary()
model.compile(optimizer=Adam(learning_rate = 0.0025), loss =
'sparse_categorical_crossentropy',
metrics = ['accuracy'])
history =model.fit(X_train, y_train, batch_size=32, epochs=100)
the dataset is balanced across classes and I have used the standard scaller
lying 68704
running 68704
walking 68704
climbingdown 68704
jumping 68704
climbingup 68704
standing 68704
sitting 68704
i find the solution after remove the l2 regularization. I do not know exactly how this work but I gonna investigate the issue a little more

Batch Normalization when CNN with only 2 ConvLayer?

I wonder if it is a problem to use BatchNormalization when there are only 2 convolutional layers in a CNN.
Can this have adverse effects on classification performance? Now I don't mean the training time, but really the accuracy? Is my network overloaded with unneccessary layers? I want to train the network with a small data set.
model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding = 'same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, kernel_size=(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compilke(optimizer="Adam", loss='categorical_crossentropy, metrics =['accuracy'])
Many thanks.
Don’t Use With Dropout
Batch normalization offers some regularization effect, reducing generalization error, perhaps no longer requiring the use of dropout for regularization.
Removing Dropout from Modified BN-Inception speeds up training, without increasing overfitting.
— Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015.
Further, it may not be a good idea to use batch normalization and dropout in the same network.
The reason is that the statistics used to normalize the activations of the prior layer may become noisy given the random dropping out of nodes during the dropout procedure.
Batch normalization also sometimes reduces generalization error and allows dropout to be omitted, due to the noise in the estimate of the statistics used to normalize each variable.
— Page 425, Deep Learning, 2016.
Source - machinelearningmastery.com - batch normalization

How do we compare the performance of different ConvNets?

I am currently training a net to play a game with a CNN having the following architecture:
model = Sequential()
model.add(Conv2D(100, kernel_size=(2, 2), strides=(2, 2), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(classifications, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
Now I wish to introduce some complexity in the architecture and make the net deep. How can I tabulate the performance of the CNNs of different complexities and ultimately conclude by giving the best choice for the particular task?
Am i going in the wrong direction? How to decide the depth of a CNN and how does it affect the performance on the same dataset?
Thanks in advance (I am new to this site, kindly excuse the immaturity of this post)
Edit: Information about the dataset I am using: dataset consists of images and each image has 3 possible lables (0, 1, 2) stored in a CSV file with each row corresponding to that particular image.
The simplest thing you can do is generate a few different model architectures, train them on a train set and evaluate them on the test set. Then compare their accuracies and the one with the highest accuracy should in theory be the best performing model.
To make the model deeper you can add extra dense or convolutional layers. For example:
changing this:
model.add(Dense(250, activation='relu'))
to this:
model.add(Dense(250, activation='relu'))
model.add(Dense(250, activation='relu'))
model.add(Dense(250, activation='relu'))
will add three extra dense layers. Hence making the network deeper.
You can do the same with duplicating the convolutional layers by duplicating the Conv2D and MaxPooling2D lines.
The alternative to 'trial and error' approach to finding the best architecture and hyperparameters is to use a search approach like explained in this tutorial that user grid search. It will, however, take significantly longer than just trying out a few versions you can up with yourself.

Training accuracy stuck at 6% and loss at 2.7 using LSTM

I am using LSTM for action recognition. My basic LSTM implemented on Keras is getting accuracy of 76% on test data on one dataset(CAD60) but when I use other dataset, my model gets stuck at a loss. Its predicting a single class always.
What can be the problem since I am using the exact framework, features on both the dataset. Even I tried to tune the learning rate change the optimizer but it didn't worked.
model = Sequential() # input has shape (samples, timesteps, locations)
model.add(LSTM(128, batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])

Stateful LSTM with Embedding Layer (shapes don't match)

I am trying to build a stateful LSTM with Keras and I don't understand how to add a embedding layer before the LSTM runs. The problem seems to be the stateful flag. If my net is not stateful adding the embedding layer is quite straight forward and works.
A working stateful LSTM without embedding layer looks at the moment like this:
model = Sequential()
model.add(LSTM(EMBEDDING_DIM,
batch_input_shape=(batchSize, longest_sequence, 1),
return_sequences=True,
stateful=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(...)
When adding the Embedding layer I move the batch_input_shape parameter into the Embedding layer i.e. only the first layer needs to known the shape?
Like this:
model = Sequential()
model.add(Embedding(vocabSize+1, EMBEDDING_DIM,batch_input_shape=(batchSize, longest_sequence, 1),))
model.add(LSTM(EMBEDDING_DIM,
return_sequences=True,
stateful=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(...)
The exception I get know is Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
So I am stuck here at the moment. What is the trick to combine word embeddings into a stateful LSTM?
The batch_input_shape parameter of the Embedding layer should be (batch_size, time_steps), where time_steps is the length of the unrolled LSTM / number of cells and batch_size is the number of examples in a batch.
model = Sequential()
model.add(Embedding(
input_dim=input_dim, # e.g, 10 if you have 10 words in your vocabulary
output_dim=embedding_size, # size of the embedded vectors
input_length=time_steps,
batch_input_shape=(batch_size,time_steps)
))
model.add(LSTM(
10,
batch_input_shape=(batch_size,time_steps,embedding_size),
return_sequences=False,
stateful=True)
)
There is an excellent blog post which explains stateful LSTMs in Keras. Also, I've uploaded a gist which contains a simple example of a stateful LSTM with Embedding layer.

Resources