Stacked Bi-LSTM compared to 1 layer of Bi-LSTM - Keras

I trained a Bi-LSTM model for multi-class text classification, but my results are no different when I use 2 stacked Bi-LSTM layers, as below, compared to just 1 Bi-LSTM layer. Any idea why?
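# Imports are assumed; the question does not show them.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense

max_len = 409
max_words = 17666
emb_dim = 100
# embedding_matrix (pre-trained vectors, shape (max_words, emb_dim)) is built elsewhere.
BB = Sequential()
BB.add(Embedding(max_words, emb_dim, weights=[embedding_matrix], input_length=max_len))
BB.add(Bidirectional(LSTM(64, return_sequences=True, dropout=0.4, recurrent_dropout=0.4)))
BB.add(Bidirectional(LSTM(34, return_sequences=False, dropout=0.4, recurrent_dropout=0.4)))
BB.add(Dropout(0.5))
BB.add(Dense(3, activation='softmax'))
BB.compile(optimizer='adam',
           loss='categorical_crossentropy',
           metrics=['accuracy'])
BB.summary()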

One possible issue is that your embedding dimension is quite low (100). You could try another embedding as the text representation, such as word2vec, whose commonly used pre-trained vectors have 300 dimensions, if I am not mistaken.
On the modeling side, I would first experiment with increasing the number of LSTM units (try 100 or more), and only then with stacking LSTM layers on top of each other.
I also noticed that your model is very heavily regularized through dropout, both recurrent and regular. Did you notice overfitting while training your model? If not, I would suggest removing the dropout, as unnecessary regularization can hurt training.
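To get a feel for how much capacity each option adds, here is a rough sketch that counts trainable LSTM weights by hand. The helper names are my own, and the 128-unit variant is just a hypothetical example of a "wider" single layer; it assumes the default Bidirectional behaviour of concatenating both directions.

```python
def lstm_params(input_dim, units):
    """Trainable weights of one LSTM direction: 4 gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim, units):
    """A Bidirectional wrapper holds two independent LSTMs."""
    return 2 * lstm_params(input_dim, units)


emb_dim = 100  # embedding size from the question

# First Bi-LSTM reads the embeddings.
first = bilstm_params(emb_dim, 64)
# Second Bi-LSTM reads the concatenated forward/backward outputs (2 * 64).
second = bilstm_params(2 * 64, 34)
# Hypothetical alternative: one wider Bi-LSTM instead of stacking.
wider = bilstm_params(emb_dim, 128)

print("first layer :", first)
print("second layer:", second)
print("wider single:", wider)
```

The numbers should match what BB.summary() reports for the recurrent layers, so it is a quick way to compare "more units" against "more layers" before training anything.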

Related

Text Classification - DNN

I am performing text classification using a deep neural network. My problem is that I get a high accuracy of 98% on the training data, whereas my validation accuracy is 49%.
I have tried the following:
Shuffled the data
My train and validation data is an 80:20 split
I am using 100-dimensional GloVe vectors
Any suggestions?
import tensorflow as tf

def get_Model():
    model = tf.keras.Sequential([
        # Frozen pre-trained embeddings (embeddings_matrix is built elsewhere).
        tf.keras.layers.Embedding(vocab_size+1, embedding_dim, input_length=max_length,
                                  weights=[embeddings_matrix], trainable=False),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv1D(64, 5, activation='relu'),
        tf.keras.layers.MaxPooling1D(pool_size=4),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(5, activation='softmax')
    ])
    model.compile(loss='sparse_categorical_crossentropy', optimizer="adam", metrics=['acc'])
    model.summary()
    return model
Your model is clearly overfitting. Standard tricks to prevent overfitting are:
Adding dropout,
L2 regularization,
Trying a smaller model.
It is rather unusual to use convolutions and LSTMs at the same time (although it is perfectly fine). Perhaps keeping only one of them is the best way to make the network smaller.
My guess is that you are working with a rather small dataset. A bigger dataset also helps to prevent overfitting, but collecting more data is usually not practical advice.

Find top layers for a fine-tuned model

I want to use a fine-tuned model based on MobileNetV2 (pre-trained, from Keras). But I need to add top layers in order to classify my images into 2 classes. I would like to know how to choose the "architecture" of the layers that I need.
In some examples, people use an SVM classifier or a series of Dense layers with a specific number of neurons as the top layers.
The following code (the default) works:
self.base_model = base_model
x = self.base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)
Is there any methodology to find the best solution?
I'd recommend adding either Dropout or BatchNormalization. A Dense layer overfits easily because it has so many parameters, and both of these layers can regularize the model well. GlobalAveragePooling2D is a good choice because it also acts as a regularizer itself.
I'd also suggest that, for a binary classification problem, you change the output layer to Dense(1, activation='sigmoid') so that it predicts only P(class1); P(class2) is then 1 - P(class1). The loss to use in this case is binary_crossentropy instead of categorical_crossentropy.
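As a small sanity check on the sigmoid suggestion, the sketch below (plain Python, made-up logits, helper names my own) shows that a two-class softmax and a single sigmoid on the difference of the two logits give the same probabilities, so nothing is lost by using one output unit.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Made-up logits for class1 and class2, for illustration only.
z1, z2 = 1.3, -0.4

p_softmax = softmax([z1, z2])   # what Dense(2, softmax) would output
p1 = sigmoid(z1 - z2)           # what a single sigmoid unit can represent
p2 = 1.0 - p1                   # the other class comes for free

print(p_softmax, (p1, p2))
```

Both pairs agree up to floating-point error, which is why Dense(1, activation='sigmoid') with binary_crossentropy is the usual choice for two classes.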

Keras lstm and dense layer

How does a Dense layer change the output coming from an LSTM layer? How is it that from an output of size 50 from the previous layer I get an output of size 1 from the Dense layer that is used for prediction?
Let's say I have this basic model:
model = Sequential()
model.add(LSTM(50,input_shape=(60,1)))
model.add(Dense(1, activation="softmax"))
Does the Dense layer take the values coming from the previous layer, assign a probability (using the softmax function) to each of the 50 inputs, and then emit that as the output?
No, Dense layers do not work like that. The input has 50 dimensions, and the output has as many dimensions as the layer has neurons, one in this case. Each output is a weighted linear combination of the inputs plus a bias, passed through the activation function.
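To make that concrete, here is a minimal plain-Python sketch of what a one-neuron Dense layer computes before its activation. The weights and the stand-in LSTM output are random placeholders, not values from any trained model, and dense_forward is a name I made up.

```python
import random


def dense_forward(x, weights, bias):
    """One-unit Dense layer before activation: dot(x, weights) + bias."""
    return sum(xi * wi for xi, wi in zip(x, weights)) + bias


random.seed(0)
# Stand-in for the 50 values the LSTM(50) layer emits for one sample.
lstm_output = [random.uniform(-1, 1) for _ in range(50)]
# A Dense(1) layer owns 50 weights and 1 bias (random placeholders here).
weights = [random.uniform(-0.1, 0.1) for _ in range(50)]
bias = 0.05

y = dense_forward(lstm_output, weights, bias)
print(len(lstm_output), "inputs ->", y)  # 50 numbers collapse into a single number
```

So the 50 inputs are not turned into 50 probabilities; they are collapsed into one number per neuron.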
Note that it makes no sense to use the softmax activation with a one-neuron layer: because softmax normalizes its outputs to sum to 1, the only possible output is a constant 1.0. That's probably not what you want.
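A quick way to convince yourself, sketched in plain Python with arbitrary example logits: softmax over a single value always returns 1.0, while a sigmoid (the usual choice for a single output unit) actually depends on the input.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


logits = (-5.0, 0.0, 3.7)  # arbitrary example values

# Softmax over a single logit: exp(z) / exp(z), whatever z is.
softmax_outputs = [softmax([z])[0] for z in logits]
# Sigmoid on the same logits varies with the input.
sigmoid_outputs = [sigmoid(z) for z in logits]

print(softmax_outputs)
print(sigmoid_outputs)
```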

Stacked LSTM with Multiple Dense Layers After

Andrew Ng talks about Deep RNN architecture by stacking recurrent layers on top of each other. However, he notes that these are usually limited to 2 or 3 recurrent layers due to already complex time-dependent calculations in the structure. But he does add that people commonly add "a bunch of deep layers that are not connected horizontally" after these recurrent layers (Shown as blue boxes that extend from a[3]<1>). I am wondering if he is simply talking about stacking Dense layers on top of the recurrent layers, or is it something more complicated? Something like this in Keras:
model = Sequential()
model.add(keras.layers.LSTM(100, return_sequences=True, batch_input_shape=(32, 1, input_shape), stateful=True))
model.add(keras.layers.LSTM(100, return_sequences=False, stateful=True))
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
In most cases, yes: the common structure of an RNN after the hidden state includes only dense layers.
However, this can take many forms, such as a dense layer followed by a softmax layer when predicting the next word of a vocabulary in natural language processing (NLP) or language modelling applications.
Alternatively, for multi-objective prediction, it may be the case that multiple separate dense layers are required to generate distinct outputs, such as the value and policy heads in reinforcement learning.
Finally, deep LSTMs can be used as encoders that are part of a larger model, which does not necessarily have to include only sequence data. One example is diagnosing patients with a model that encodes textual notes with an LSTM and images with a CNN, before passing the combined embeddings through final dense layers.

Stateful LSTM with Embedding Layer (shapes don't match)

I am trying to build a stateful LSTM with Keras and I don't understand how to add an embedding layer before the LSTM runs. The problem seems to be the stateful flag. If my net is not stateful, adding the embedding layer is quite straightforward and works.
A working stateful LSTM without an embedding layer currently looks like this:
model = Sequential()
model.add(LSTM(EMBEDDING_DIM,
               batch_input_shape=(batchSize, longest_sequence, 1),
               return_sequences=True,
               stateful=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(...)
When adding the Embedding layer, I move the batch_input_shape parameter into the Embedding layer, i.e. only the first layer needs to know the shape?
Like this:
model = Sequential()
model.add(Embedding(vocabSize+1, EMBEDDING_DIM,
                    batch_input_shape=(batchSize, longest_sequence, 1)))
model.add(LSTM(EMBEDDING_DIM,
               return_sequences=True,
               stateful=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(...)
The exception I get now is: Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
So I am stuck here at the moment. What is the trick to combine word embeddings with a stateful LSTM?
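The batch_input_shape parameter of the Embedding layer should be (batch_size, time_steps), where time_steps is the length of the unrolled LSTM (the number of cells) and batch_size is the number of examples in a batch. The Embedding layer takes integer word indices and appends the embedding dimension itself, so the trailing 1 in (batchSize, longest_sequence, 1) produces a 4-D tensor, which is why the LSTM complains about ndim=4.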
model = Sequential()
model.add(Embedding(
    input_dim=input_dim,        # e.g. 10 if you have 10 words in your vocabulary
    output_dim=embedding_size,  # size of the embedded vectors
    input_length=time_steps,
    batch_input_shape=(batch_size, time_steps)
))
model.add(LSTM(
    10,
    batch_input_shape=(batch_size, time_steps, embedding_size),
    return_sequences=False,
    stateful=True
))
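The shape arithmetic can be checked without Keras. The sketch below imitates an embedding lookup with nested Python lists; shape and embed are helper names I made up, and the tiny lookup table is a placeholder. It only illustrates how the output rank follows from the input rank, not how Keras implements the layer.

```python
def shape(nested):
    """Shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def embed(ids, table):
    """Replace every integer ID with its vector, at any nesting depth."""
    if isinstance(ids, list):
        return [embed(item, table) for item in ids]
    return table[ids]


vocab_size, embedding_size = 10, 4
# Placeholder lookup table: one vector of length embedding_size per word ID.
table = [[float(i)] * embedding_size for i in range(vocab_size)]

batch_2d = [[1, 2, 3], [4, 5, 6]]              # (batch_size, time_steps)
batch_3d = [[[1], [2], [3]], [[4], [5], [6]]]  # (batch_size, time_steps, 1)

print(shape(embed(batch_2d, table)))  # (2, 3, 4): 3-D, what an LSTM expects
print(shape(embed(batch_3d, table)))  # (2, 3, 1, 4): 4-D, hence the ndim=4 error
```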
There is an excellent blog post which explains stateful LSTMs in Keras. Also, I've uploaded a gist which contains a simple example of a stateful LSTM with Embedding layer.

Resources