Stateful LSTM with Embedding Layer (shapes don't match) - keras

I am trying to build a stateful LSTM with Keras and I don't understand how to add an embedding layer before the LSTM runs. The problem seems to be the stateful flag. If my net is not stateful, adding the embedding layer is quite straightforward and works.
A working stateful LSTM without an embedding layer currently looks like this:
model = Sequential()
model.add(LSTM(EMBEDDING_DIM,
               batch_input_shape=(batchSize, longest_sequence, 1),
               return_sequences=True,
               stateful=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(...)
When adding the Embedding layer I move the batch_input_shape parameter into the Embedding layer, i.e. only the first layer needs to know the shape?
Like this:
model = Sequential()
model.add(Embedding(vocabSize + 1, EMBEDDING_DIM,
                    batch_input_shape=(batchSize, longest_sequence, 1)))
model.add(LSTM(EMBEDDING_DIM,
               return_sequences=True,
               stateful=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(...)
The exception I now get is: Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
So I am stuck here at the moment. What is the trick to combine word embeddings into a stateful LSTM?

The batch_input_shape parameter of the Embedding layer should be (batch_size, time_steps), where time_steps is the length of the unrolled LSTM (the number of steps per sequence) and batch_size is the number of examples in a batch. The trailing 1 in your shape makes the Embedding layer output a 4-D tensor, which is exactly what triggers the "expected ndim=3, found ndim=4" error.
model = Sequential()
model.add(Embedding(
    input_dim=input_dim,          # e.g. 10 if you have 10 words in your vocabulary
    output_dim=embedding_size,    # size of the embedded vectors
    input_length=time_steps,
    batch_input_shape=(batch_size, time_steps)
))
model.add(LSTM(
    10,
    batch_input_shape=(batch_size, time_steps, embedding_size),
    return_sequences=False,
    stateful=True
))
There is an excellent blog post which explains stateful LSTMs in Keras. I have also uploaded a gist containing a simple example of a stateful LSTM with an Embedding layer.
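For reference, here is a minimal self-contained sketch of a stateful LSTM on top of an Embedding layer; the vocabulary size, batch size, sequence length, and unit counts below are hypothetical placeholders, not values from the question:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, TimeDistributed

batch_size, time_steps = 4, 10        # hypothetical values
vocab_size, embedding_size = 100, 32  # hypothetical values

model = Sequential()
# Embedding takes 2-D integer input (batch_size, time_steps) and emits the
# 3-D tensor (batch_size, time_steps, embedding_size) that the LSTM expects
model.add(Embedding(vocab_size, embedding_size,
                    batch_input_shape=(batch_size, time_steps)))
model.add(LSTM(64, return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

x = np.random.randint(0, vocab_size, size=(batch_size, time_steps))
y = x[..., None]  # dummy targets: one class index per time step
model.fit(x, y, batch_size=batch_size, epochs=1, shuffle=False)
model.reset_states()  # reset state between independent sequences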

Related

Data augmentation in Keras model

I am trying to add data augmentation as a layer to a model, but I am getting the following error.
TypeError: The added layer must be an instance of class Layer. Found: <tensorflow.python.keras.preprocessing.image.ImageDataGenerator object at 0x7f8c2dea0710>
data_augmentation = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=30, horizontal_flip=True)

model = Sequential()
model.add(data_augmentation)
model.add(Dense(1028, input_shape=(final_features.shape[1],)))
model.add(Dropout(0.7, input_shape=(final_features.shape[1],)))
model.add(Dense(n_classes, activation='softmax', kernel_regularizer='l2'))
model.compile(optimizer=adam,
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(final_features, y,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_split=0.2,
                    callbacks=[lrr, EarlyStop])
I have also tried this way:
data_augmentation = Sequential(
    [
        preprocessing.RandomFlip("horizontal"),
        preprocessing.RandomRotation(0.1),
        preprocessing.RandomZoom(0.1),
    ]
)

model = Sequential()
model.add(data_augmentation)
model.add(Dense(1028, input_shape=(final_features.shape[1],)))
model.add(Dropout(0.7, input_shape=(final_features.shape[1],)))
model.add(Dense(n_classes, activation='softmax', kernel_regularizer='l2'))
model.compile(optimizer=adam,
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(final_features, y,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_split=0.2,
                    callbacks=[lrr, EarlyStop])
It gives an error:
ValueError: Input 0 of layer sequential_7 is incompatible with the layer: expected ndim=4, found ndim=2. Full shape received: [128, 14272]
Could you please advise how I can use augmentation in Keras?
In your first case, you are using ImageDataGenerator as a layer, which it is not: as the name says, it is just a generator that applies random transformations to images (image augmentation) before feeding them to the network. So the images are augmented on the CPU and then fed to the neural network, which can run on a GPU if you have one.
Generators are also commonly used to avoid loading huge datasets into memory, since they only load the batches that are about to be used.
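As a minimal sketch of the intended usage (the toy data and tiny model below are hypothetical placeholders):

import numpy as np
import tensorflow as tf

# hypothetical toy data: 8 RGB images of 32x32 pixels, 2 one-hot classes
images = np.random.rand(8, 32, 32, 3)
labels = tf.keras.utils.to_categorical(np.random.randint(0, 2, 8), 2)

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=30, horizontal_flip=True)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(2, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')

# the generator is passed to fit(), not added to the model as a layer;
# it yields augmented batches on the CPU while the model trains
model.fit(datagen.flow(images, labels, batch_size=4), epochs=1)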
In the second case, you are properly using image augmentation as layers of your model. The difference here is that the augmentation runs as part of your model, so if you have a GPU available, for instance, those operations will run on it.
The problem with your second case is in the model itself (in fact, the model is also wrong in the first approach; you just get an error earlier there, from the bad usage of ImageDataGenerator, before execution reaches the model).
Note that you are using images as inputs, so the input should be of shape (height, width, channels), but you then start your model with a dense layer, which expects a single array of shape (n_features,).
If your model really needs to start with a Dense layer (strange, but it may be fine in some cases), then you first need a Flatten layer to convert images of shape (h, w, c) into vectors of shape (h*w*c,). This change will fix your second approach.
That said, you don't need to specify the input shape on every single layer: doing it in your first layer should be enough.
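For example, a minimal sketch of the corrected second approach, assuming the inputs really are images (the 128x128 RGB shape and class count are hypothetical; the preprocessing layers match your second snippet and assume a TF 2.3+ style experimental preprocessing namespace):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, InputLayer
from tensorflow.keras.layers.experimental import preprocessing

n_classes = 10  # hypothetical number of classes

model = Sequential([
    InputLayer(input_shape=(128, 128, 3)),  # images, not flat features
    preprocessing.RandomFlip("horizontal"),
    preprocessing.RandomRotation(0.1),
    preprocessing.RandomZoom(0.1),
    Flatten(),                              # (h, w, c) -> (h*w*c,)
    Dense(1028, activation='relu'),
    Dropout(0.7),
    Dense(n_classes, activation='softmax', kernel_regularizer='l2'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])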
Last but not least: are you sure this model is being fed images? According to your fit call, it looks like you are using previously extracted features, which may be plain vectors (this makes sense with your current model architecture, but makes no sense with image augmentation). Please provide more details about your data to clarify this point.

Shall I use a Sequential model or the functional API to build a neural network with two 2D-matrix inputs?

Good morning,
I tried to use a Sequential model to create my neural network, which has multiple (concatenated) inputs. I want to know whether I should instead use the Keras functional API to create my model.
in1 = loadtxt('in1.csv', delimiter=',')  # 2D matrix
in2 = loadtxt('in2.csv', delimiter=',')  # 2D matrix
y = loadtxt('y.csv', delimiter=',')      # 2D matrix (output labels)

X_train = np.hstack((in1, in2))
y_train = y

model = Sequential()
model.add(Dense(nbinneuron, input_dim=2*nx, activation='tanh',
                kernel_initializer='normal'))
model.add(Dropout(0.5))
# output layer
model.add(Dense(2, activation='tanh'))

opt = Adadelta(lr=0.01)
model.compile(loss='mean_squared_error', optimizer=opt, metrics=['mse'])
# fit the keras model on the dataset
history = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                    epochs=500, verbose=0)
...
Thanks in advance.
A Sequential Model can only have one input and one output. To build a model with multiple inputs (and/or multiple outputs), you need to use the Functional API.
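A minimal functional-API sketch for two 2D-matrix inputs (the column count nx and the unit count are hypothetical placeholders):

from tensorflow.keras import Model, layers

nx = 50  # hypothetical number of columns per input matrix

in1 = layers.Input(shape=(nx,))
in2 = layers.Input(shape=(nx,))
x = layers.concatenate([in1, in2])  # merge the two inputs inside the graph
x = layers.Dense(64, activation='tanh', kernel_initializer='normal')(x)
x = layers.Dropout(0.5)(x)
out = layers.Dense(2, activation='tanh')(x)

model = Model(inputs=[in1, in2], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adadelta', metrics=['mse'])

fit() then takes a list with one array per input, e.g. model.fit([in1_train, in2_train], y_train, ...), instead of an hstack-ed matrix.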

Stacked Bi-LSTM compared to one layer of Bi-LSTM

I trained my model with a Bi-LSTM for multi-class text classification, but my results are no different when I use two stacked Bi-LSTM layers, as below, compared to just one Bi-LSTM layer. Any idea why?
max_len = 409
max_words = 17666
emb_dim = 100

BB = Sequential()
BB.add(Embedding(max_words, emb_dim, weights=[embedding_matrix], input_length=max_len))
BB.add(Bidirectional(LSTM(64, return_sequences=True, dropout=0.4, recurrent_dropout=0.4)))
BB.add(Bidirectional(LSTM(34, return_sequences=False, dropout=0.4, recurrent_dropout=0.4)))
BB.add(Dropout(0.5))
BB.add(Dense(3, activation='softmax'))
BB.compile(optimizer='adam',
           loss='categorical_crossentropy',
           metrics=['accuracy'])
BB.summary()
One thing that could be the issue is that your embedding dimension is quite low (100); maybe you could try another embedding type as the text representation, such as word2vec, which has an embedding dimension of 300, if I am not mistaken.
On the modeling side, I would try experimenting with increasing the LSTM units first (try 100 or more), and only then stacking LSTM layers on top of each other.
I also noticed that your model is very heavily regularized through dropout, recurrent and otherwise. Did you notice overfitting while training your model? If not, I would suggest removing those regularizers, as they can hurt training when added unnecessarily; see the sketch below.
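For illustration, a hypothetical revision along those lines (the embedding matrix is a random placeholder; the LSTMs are larger, and regularization is removed until overfitting is actually observed):

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

max_len, max_words, emb_dim = 409, 17666, 100          # from the question
embedding_matrix = np.random.rand(max_words, emb_dim)  # placeholder weights

BB = Sequential()
BB.add(Embedding(max_words, emb_dim, weights=[embedding_matrix],
                 input_length=max_len))
BB.add(Bidirectional(LSTM(128, return_sequences=True)))  # more units per layer
BB.add(Bidirectional(LSTM(128)))                         # dropout removed for now
BB.add(Dense(3, activation='softmax'))
BB.compile(optimizer='adam',
           loss='categorical_crossentropy',
           metrics=['accuracy'])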

Stacked LSTM with Multiple Dense Layers After

Andrew Ng talks about Deep RNN architectures built by stacking recurrent layers on top of each other. However, he notes that these are usually limited to 2 or 3 recurrent layers due to the already complex time-dependent calculations in the structure. But he adds that people commonly append "a bunch of deep layers that are not connected horizontally" after these recurrent layers (shown as blue boxes extending from a[3]<1>). I am wondering if he is simply talking about stacking Dense layers on top of the recurrent layers, or whether it is something more complicated. Something like this in Keras:
model = Sequential()
model.add(keras.layers.LSTM(100, return_sequences=True, batch_input_shape=(32, 1, input_shape), stateful=True))
model.add(keras.layers.LSTM(100, return_sequences=False, stateful=True))
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
In most cases, yes: the common structure of an RNN after the hidden state includes only dense layers.
However, this can take many forms, such as a dense layer and a softmax layer when predicting the next word of a vocabulary in natural language processing (NLP) (or language modelling) applications (examples here).
Alternatively, for multi-objective prediction, it may be the case that multiple separate dense layers are required to generate distinct outputs, such as the value and policy heads in reinforcement learning.
Finally, deep LSTMs can be used as encoders which are part of a larger model that does not necessarily include only sequence data. For instance, one might diagnose patients with a model that encodes textual notes with an LSTM and encodes images with a CNN, before passing the combined embeddings through final dense layers.
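As a rough functional-API sketch of the multi-output case, loosely modelled on value/policy heads (all shapes and unit counts are hypothetical):

from tensorflow.keras import Model, layers

# hypothetical input: sequences of 20 steps with 8 features each
inp = layers.Input(shape=(20, 8))
x = layers.LSTM(100, return_sequences=True)(inp)
x = layers.LSTM(100)(x)  # final hidden state only

# two separate dense stacks branch off the shared recurrent encoder
value = layers.Dense(64, activation='relu')(x)
value = layers.Dense(1, name='value')(value)
policy = layers.Dense(64, activation='relu')(x)
policy = layers.Dense(4, activation='softmax', name='policy')(policy)

model = Model(inputs=inp, outputs=[value, policy])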

LSTM Text Classification Bad Accuracy Keras

I'm going crazy with this project. This is multi-label text classification with an LSTM in Keras. My model is this:
model = Sequential()
model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len,
                    mask_zero=True, weights=[embedding_weights]))
model.add(Dropout(0.25))
model.add(LSTM(units=embeddings_dim, activation='sigmoid',
               recurrent_activation='hard_sigmoid', return_sequences=True))
model.add(Dropout(0.25))
model.add(LSTM(units=embeddings_dim, activation='sigmoid',
               recurrent_activation='hard_sigmoid', return_sequences=False))
model.add(Dropout(0.25))
model.add(Dense(num_classes))
model.add(Activation('sigmoid'))

adam = keras.optimizers.Adam(lr=0.04)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
The problem is that my accuracy is too low. With binary_crossentropy I get a good accuracy, but the results are wrong! Changing to categorical_crossentropy, I get very low accuracy. Do you have any suggestions?
Here is my code: GitHubProject - Multi-Label-Text-Classification
In the last layer, the activation function you are using is sigmoid, so binary_crossentropy should be used. In case you want to use categorical_crossentropy, then use softmax as the activation function in the last layer.
Now, coming to the other part of your model: since you are working with text, I would suggest tanh as the activation function in the LSTM layers.
You can also try using the LSTM's built-in dropout arguments, dropout and recurrent_dropout:
LSTM(units, dropout=0.2, recurrent_dropout=0.2, activation='tanh')
You can define units as 64 or 128. Start from a small number and, after testing, scale up to as many as 1024.
You can also try adding a convolutional layer for extracting features, or use a bidirectional LSTM, but bidirectional models take longer to train.
Moreover, since you are working on text, preprocessing of the text and the size of the training data always play a much bigger role than expected.
Edit:
Add class weights via the class_weight parameter of fit:
class_weights = class_weight.compute_class_weight('balanced',
                                                  np.unique(labels),
                                                  labels)
class_weights_dict = dict(zip(le.transform(list(le.classes_)),
                              class_weights))

model.fit(x_train, y_train, validation_split=validation_split,
          class_weight=class_weights_dict)
Change:
model.add(Activation('sigmoid'))
to:
model.add(Activation('softmax'))
