Many to Many LSTM network - keras

I'm building a many-to-many network in Keras, using an LSTM. I have sequences of varying length (the labels always have the same length as the sequence they describe). To handle the varying lengths, and after searching other SO posts, I've found padding + masking to be the best solution.
This is my model:
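A minimal sketch of the masked many-to-many setup described here (the LSTM width, mask value and optimizer are assumptions for illustration, not the asker's actual code):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, LSTM, TimeDistributed, Dense

max_len, n_features = 24, 25
model = Sequential([
    # padded timesteps (rows equal to the special padding value, here 0.0) are skipped downstream
    Masking(mask_value=0.0, input_shape=(max_len, n_features)),
    # return_sequences=True keeps one output per timestep (many-to-many)
    LSTM(64, return_sequences=True),
    # one sigmoid unit per timestep -> one binary label per timestep
    TimeDistributed(Dense(1, activation="sigmoid")),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])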
So I have n (874) samples of max_len (24) padded sequences with 25 features each. But how do I handle my labels? Do I pad them too?
If I pad them in the same way as my X (with the same special value), I get this:
X_train shape : (873, 24, 25)
y_train shape : (873, 24)
All is fine, except I get the following error:
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 24 for '{{node binary_crossentropy/weighted_loss/Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]](Cast_1)' with input shapes: [1,24].
Searching this error leads to posts about removing return_sequences=True from my LSTM layer, except I don't want that, since each of my timesteps is labelled...
And if I don't pad them, they can't be converted to a tensor to be used by TensorFlow.
Edit:
Explanatory illustration of the architecture I want to achieve, courtesy of this answer: https://stackoverflow.com/a/52092176/7732923

Problem found :
X_train shape : (873, 24, 25)
y_train shape : (873, 24)
y_train contained 873 samples of length 24, with one label for each timestep, as I said. But, probably to allow for multi-label classification, each timestep's label must itself be contained in a list, so the right shape for y_train is:
y_train shape : (873, 24, 1)
So it was just a matter of wrapping each label in [] during preprocessing. The architecture is sound and works (now I'm left to determine how well, but that's another beast, ahah).
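Equivalently, assuming y_train is already a padded NumPy array of shape (873, 24), the trailing axis can be added in one step instead of wrapping each label during preprocessing:

import numpy as np

y_train = np.random.randint(0, 2, size=(873, 24))   # dummy stand-in for the per-timestep labels
y_train = np.expand_dims(y_train, axis=-1)          # shape becomes (873, 24, 1)
print(y_train.shape)                                # (873, 24, 1)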

Related

dimension of the input layer for embeddings in Keras

It is not clear to me whether there is any difference between specifying the input dimension, Input(shape=(20,)), or not specifying it, Input(shape=(None,)), in the following example:
input_layer = Input(shape=(None,))
emb = Embedding(86, 300) (input_layer)
lstm = Bidirectional(LSTM(300)) (emb)
output_layer = Dense(10, activation="softmax") (lstm)
model = Model(input_layer, output_layer)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["acc"])
history = model.fit(my_x, my_y, epochs=1, batch_size=632, validation_split=0.1)
my_x (shape: (2000, 20)) contains integers referring to characters, while my_y contains the one-hot encoding of some labels. With Input(shape=(None,)), I see that I could use model.predict(my_x[:, 0:10]), i.e., I could give only 10 characters as input instead of 20: how is that possible? I was assuming that all 20 dimensions of my_x were needed to predict the corresponding y.
With Input(shape=(20,)) you tell the model that the sequences you feed it have a strict length of 20; with Input(shape=(None,)) you leave the sequence length unspecified. While many models need a fixed input length, recurrent neural networks (such as the LSTM you use here) do not need a fixed sequence length. The LSTM simply does not care whether your sequence contains 20 or 100 timesteps, as it just loops over them. However, when you fix the number of timesteps to 20, the model expects 20 and will raise an error if it does not get them.
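A small sketch of the difference (dummy data; the layer sizes mirror the question):

import numpy as np
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense
from tensorflow.keras.models import Model

def build_model(seq_len):
    inputs = Input(shape=(seq_len,))          # seq_len=None leaves the timestep count unspecified
    x = Embedding(86, 300)(inputs)
    x = Bidirectional(LSTM(300))(x)
    outputs = Dense(10, activation="softmax")(x)
    return Model(inputs, outputs)

flexible = build_model(None)   # accepts sequences of any length
fixed = build_model(20)        # declares exactly 20 timesteps

x10 = np.random.randint(0, 86, size=(4, 10))
print(flexible.predict(x10).shape)   # (4, 10): the LSTM simply looped over 10 timesteps
# fixed.predict(x10)                 # mismatch with the declared 20 timesteps: depending on the
#                                    # Keras version this raises a shape error or a loud warning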
For more information, see this post by Tim.

expected dense_20 to have shape (None, 18827) but got array with shape (316491, 1)

I am having some difficulty properly formatting my labels for Keras (TensorFlow backend). My model takes an embedding (a list of 128 numbers) as input and outputs one of 18827 distinct numbers (ranging from 1 to 20284), like so:
[0.0344733819366,...,0.153029859066] -> 11516
My training data consists of 316491 embedding-number pairings, so when I tried using keras.utils.to_categorical(training_out, num_classes=20284) to convert the number labels to one-hot vectors for categorical_crossentropy, I received a MemoryError. It seems that sparse_categorical_crossentropy would resolve this issue, since it looks like it only needs one number instead of a large vector as the label. However, I am not sure how to format my labels correctly for this. Currently my model is:
self.brain = Sequential()
self.brain.add(Dense(1000, input_dim=128))
self.brain.add(Dense(20284, activation='softmax'))
self.brain.compile(optimizer='adadelta', loss='categorical_crossentropy', metrics=['accuracy'])
When I try to fit the model I get the following errors, depending on how I format the labels:
ValueError: Error when checking target: expected dense_22 to have shape (None, 18827) but got array with shape (1, 316491)
or
ValueError: Error when checking target: expected dense_20 to have shape (None, 18827) but got array with shape (316491, 1)
18827 is the number of distinct labels I have, but I don't think I specified that number anywhere in my code, so I don't know how or why that is the expected dimension for the labels, especially if each label isn't a vector.
I am unsure of whether I correctly understand sparse_categorical_crossentropy, and if I do, how to use it properly.
Answer (originally given in the comments):
From the Keras documentation (https://keras.io/losses/):
Note: when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample). In order to convert integer targets into categorical targets, you can use the Keras utility to_categorical.
In your case, you are not able to one-hot encode because of memory issues, so you need to use sparse_categorical_crossentropy instead.
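A sketch of what that looks like in practice (dummy data; the layer sizes mirror the question, and the integer labels are passed directly with shape (N,)):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(1000, input_shape=(128,)),
    Dense(20284, activation="softmax"),
])
# sparse_categorical_crossentropy takes plain integer class ids as targets,
# so no one-hot encoding (and no MemoryError) is needed
model.compile(optimizer="adadelta",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

X = np.random.rand(1024, 128).astype("float32")   # dummy embeddings
y = np.random.randint(0, 20284, size=(1024,))     # integer labels in [0, 20284)
model.fit(X, y, epochs=1, batch_size=128)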

Wrong input shape to neural network layer

I am trying to classify handwritten digits using the MNIST dataset to train my model. My model trained successfully and hit an accuracy of 98.9%. But when I try to input a custom image, it shows me the following error:
Error when checking : expected conv2d_4_input to have shape (None, 28, 28, 1) but got array with shape (1, 1, 28, 28)
This is the first convolutional layer, i.e. the input layer.
What can I do to resolve this issue?
This is my convolutional neural network:
conv_model = Sequential()
conv_model.add(Conv2D(filters, kernel_size[0], input_shape=(28 , 28 , 1)))
conv_model.add(Activation(act))
conv_model.add(Conv2D(filters, kernel_size[0]))
conv_model.add(Activation(act))
conv_model.add(MaxPool2D(pool_size=(2,2)))
conv_model.add(Dropout(0.25))
conv_model.add(Flatten())
conv_model.add(Dense(128))
conv_model.add(Activation(act))
conv_model.add(Dropout(0.5))
conv_model.add(Dense(10))
conv_model.add(Activation('softmax'))
#conv_model.summary()
Compilation details:
conv_model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
COMPLETE SOURCE CODE:
https://github.com/tanmay-edgelord/HandwrittenDigitRecognition
The image:
If any further details are required, please comment.
The error message is pretty straightforward:
Your first layer is expecting data with shape (None, 28, 28, 1), where "None" can be any number (it's the batch size, how many examples you have).
Your data on the other hand has shape (1, 1, 28, 28).
The confusion seems to be a common one: Keras puts the channels in the last dimension, while your data has them in the first.
Solution:
Just reshape your data to the correct format: (1, 28, 28, 1).
But are you trying to give that entire image to the model? If so, it won't work very well; it's expecting images of 28 x 28 pixels.
You will have to separate each number into its own 28 x 28 image. You must also take into account the possibility of your image being inverted in terms of what is black and what is white: MNIST data usually has a black background (0 values) with a white digit (1 values).
The problem was solved by reshaping the input to the correct size:
roi2 = roi.reshape(1,28,28,1)
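For reference, a sketch of the full preprocessing the answer implies for a single digit; the use of OpenCV, the inversion and the rescaling are assumptions about the custom image, not part of the original code:

import numpy as np
import cv2  # assumption: OpenCV is available for resizing

def prepare_digit(img_gray):
    # img_gray: 2D uint8 array containing a single digit
    img = cv2.resize(img_gray, (28, 28))      # the model expects 28 x 28 pixels
    img = 255 - img                           # invert if the digit is dark on a light background
    img = img.astype("float32") / 255.0       # scale to [0, 1], like MNIST
    return img.reshape(1, 28, 28, 1)          # (batch, height, width, channels)

# prediction = conv_model.predict(prepare_digit(roi))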

Convolutional NN for text input in PyTorch

I am trying to implement a text classification model using a CNN. As far as I know, for text data we should use 1D convolutions. I saw an example in PyTorch using Conv2d, but I want to know how I can apply Conv1d to text. Or is it actually not possible?
Here is my model scenario:
Number of in-channels: 1, number of out-channels: 128
Kernel size: 3 (I only want to consider trigrams)
Batch size: 16
So, I will provide tensors of shape <16, 1, 28, 300>, where 28 is the length of a sentence. I want to use Conv1d, which will give me 128 feature maps of length 26 (as I am considering trigrams).
I am not sure how to define nn.Conv1d() for this setting. I can use Conv2d, but I want to know whether it is possible to achieve the same with Conv1d.
This example of feeding Conv1d and Pool1d layers into an RNN resolved my issue.
So, I need to consider the embedding dimension as the number of in-channels when using nn.Conv1d, as follows:
import torch
import torch.nn as nn

m = nn.Conv1d(200, 10, 2)        # in_channels = 200, out_channels = 10, kernel_size = 2
input = torch.randn(10, 200, 5)  # batch = 10, 200 = embedding dim, 5 = seq length
feature_maps = m(input)
print(feature_maps.size())       # torch.Size([10, 10, 4])
Although I don't work with text data, the input tensor in its current form would only work with Conv2d. One possible way to use Conv1d would be to concatenate the embeddings into a tensor of shape e.g. <16, 1, 28*300>. You can reshape the input with view in PyTorch.
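Applied to the shapes in the question, treating the 300-dimensional embedding as the channel axis (a sketch, not the asker's code):

import torch
import torch.nn as nn

batch, sent_len, emb_dim = 16, 28, 300
x = torch.randn(batch, 1, sent_len, emb_dim)   # <16, 1, 28, 300> as in the question

x = x.squeeze(1).permute(0, 2, 1)              # -> (16, 300, 28): channels = embedding dim
conv = nn.Conv1d(in_channels=emb_dim, out_channels=128, kernel_size=3)
out = conv(x)
print(out.shape)                               # torch.Size([16, 128, 26]): 128 trigram feature maps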

How to correctly get layer weights from Conv2D in keras?

I have a Conv2D layer defined as:
Conv2D(96, kernel_size=(5, 5),
       activation='relu',
       input_shape=(image_rows, image_cols, 1),
       kernel_initializer=initializers.glorot_normal(seed),
       bias_initializer=initializers.glorot_uniform(seed),
       padding='same',
       name='conv_1')
This is the first layer in my network.
Input dimensions are 64 by 160; the image has 1 channel.
I am trying to visualize the weights from this convolutional layer, but I am not sure how to get them.
Here is how I am doing this now:
1. Call
layer.get_weights()[0]
This returns an array of shape (5, 5, 1, 96). The 1 is because the images are 1-channel.
2. Take the 5 by 5 filters with
layer.get_weights()[0][:,:,:,j][:,:,0]
Very ugly, but I am not sure how to simplify this; any comments are very much appreciated.
I am not sure about these 5 by 5 squares. Are they actually filters?
If not, could anyone please tell me how to correctly grab the filters from the model?
I tried to display the weights like so, but only the first 25. I have the same question you do: are these the filters or something else? They don't seem to be the same kind of filters that are derived from deep belief networks or stacked RBMs.
Here are the untrained visualized weights:
and here are the trained weights:
Strangely, there is no change after training! If you compare them, they are identical.
And here are the DBN RBM filters, layer 1 on top and layer 2 on the bottom:
If I set kernel_initializer="ones", I get filters that look good, but the net loss never decreases, even with many trial-and-error changes:
Here is the code to display the 2D Conv Weights / Filters.
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D, Activation

ann = Sequential()
x = Conv2D(filters=64, kernel_size=(5, 5), input_shape=(32, 32, 3))
ann.add(x)
ann.add(Activation("relu"))
...
# untrained weights: first input channel, first 25 of the 64 filters
x1w = x.get_weights()[0][:, :, 0, :]
for i in range(1, 26):
    plt.subplot(5, 5, i)
    plt.imshow(x1w[:, :, i], interpolation="nearest", cmap="gray")
plt.show()

ann.fit(Xtrain, ytrain_indicator, epochs=5, batch_size=32)

# trained weights, plotted the same way for comparison
x1w = x.get_weights()[0][:, :, 0, :]
for i in range(1, 26):
    plt.subplot(5, 5, i)
    plt.imshow(x1w[:, :, i], interpolation="nearest", cmap="gray")
plt.show()
---------------------------UPDATE------------------------
So I tried it again with a learning rate of 0.01 instead of 1e-6, and normalized the images between 0 and 1 instead of 0 and 255 by dividing them by 255.0. Now the convolution filters are changing, and the output of the first convolutional filter looks like this:
You'll notice the trained filter is changed (though not by much) with a reasonable learning rate:
Here is image seven of the CIFAR-10 test set:
And here is the output of the first convolution layer:
And if I take the last convolution layer (with no dense layers in between) and feed its output to an untrained classifier, the accuracy is similar to classifying the raw images. But if I train the convolution layers, the output of the last convolution layer increases the accuracy of the classifier (a random forest).
So I would conclude the convolution layers are indeed filters as well as weights.
In layer.get_weights()[0][:,:,:,:], the dimensions in [:,:,:,:] are, respectively: the x position of the weight, the y position of the weight, the n-th input channel to the corresponding conv layer (coming from the previous layer; note that if you obtain the weights of the first conv layer, this number is 1, because only one input channel is fed to it), and the k-th filter (kernel) in that layer. So the array shape returned by layer.get_weights()[0] can be interpreted as: only one input channel is fed to the layer, and 96 filters of size 5x5 are generated. If you want to reach one of the filters, say the 6th one, you can type:
print(layer.get_weights()[0][:,:,:,6].squeeze()).
However, if you need the filters of the 2nd conv layer (see the model image below), notice that for each of the 32 input channels you have 64 filters. If you want to get the weights of any of them, for example the weights of the 4th filter applied to the 8th input channel, you should type:
print(layer.get_weights()[0][:,:,8,4].squeeze()).
[model architecture image]
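A small self-contained sketch of that indexing, using the layer definition from the question (the one-layer model around it is just a placeholder):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential([
    Conv2D(96, kernel_size=(5, 5), activation="relu",
           input_shape=(64, 160, 1), name="conv_1"),
])

w = model.get_layer("conv_1").get_weights()[0]
print(w.shape)                       # (5, 5, 1, 96): height, width, input channels, filters

kernel_6 = w[:, :, :, 6].squeeze()   # the 5 x 5 kernel of filter index 6
# For a deeper layer whose weight array has shape (5, 5, 32, 64),
# w[:, :, 8, 4] is the 5 x 5 kernel that filter 4 applies to input channel 8.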
