Dimension of the input layer for embeddings in Keras

It is not clear to me whether there is any difference between specifying the input dimension, Input(shape=(20,)), and leaving it unspecified, Input(shape=(None,)), in the following example:
input_layer = Input(shape=(None,))
emb = Embedding(86, 300)(input_layer)
lstm = Bidirectional(LSTM(300))(emb)
output_layer = Dense(10, activation="softmax")(lstm)
model = Model(input_layer, output_layer)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["acc"])
history = model.fit(my_x, my_y, epochs=1, batch_size=632, validation_split=0.1)
my_x (shape: 2000, 20) contains integers referring to characters, while my_y contains the one-hot encoding of some labels. With Input(shape=(None,)), I see that I could use model.predict(my_x[:, 0:10]), i.e., I could give only 10 characters as an input instead of 20: how is that possible? I was assuming that all the 20 dimensions in my_x were needed to predict the corresponding y.

When you specify Input(shape=(20,)), you are telling the model that the sequences you feed in have a fixed length of 20; with Input(shape=(None,)), the sequence length is left unspecified. While many models need a fixed input length, recurrent neural networks (such as the LSTM you use there) do not need a fixed sequence length. The LSTM simply does not care whether your sequence contains 20 or 100 timesteps, as it just loops over them. However, when you fix the number of timesteps to 20, the model expects exactly 20 and will raise an error if it does not get them.
For more information, see this post by Tim.
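As a quick check, a minimal sketch (reusing model and my_x from the question, and assuming the model was built with Input(shape=(None,))):
preds_full = model.predict(my_x)            # sequences of 20 timesteps
preds_short = model.predict(my_x[:, 0:10])  # sequences of 10 timesteps also work
# With Input(shape=(20,)) the second call would raise a shape-mismatch error.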

Related

Audio signal processing using LSTM

This seems to be one of the most common questions about LSTMs in PyTorch, but I am still unable to figure out what the input shape to a PyTorch LSTM should be.
Even after following several posts (1, 2, 3) and trying out the proposed solutions, it still doesn't seem to work.
Background: I have encoded text sequences (variable length) in a batch of size 12 and the sequences are padded and packed using pad_packed_sequence functionality. MAX_LEN for each sequence is 384 and each token (or word) in the sequence has a dimension of 768. Hence my batch tensor could have one of the following shapes: [12, 384, 768] or [384, 12, 768].
The batch will be my input to the PyTorch rnn module (lstm here).
According to the PyTorch documentation for LSTMs, its input dimensions are (seq_len, batch, input_size) which I understand as following.
seq_len - the number of time steps in each input stream (feature vector length).
batch - the size of each batch of input sequences.
input_size - the dimension for each input token or time step.
lstm = nn.LSTM(input_size=?, hidden_size=?, batch_first=True)
What should be the exact input_size and hidden_size values here?
You have explained the structure of your input, but you haven't made the connection between your input dimensions and the LSTM's expected input dimensions.
Let's break down your input (assigning names to the dimensions):
batch_size: 12
seq_len: 384
input_size / num_features: 768
That means the input_size of the LSTM needs to be 768.
The hidden_size does not depend on your input; rather, it is how many features the LSTM should create, which is then used for the hidden state as well as for the output, since the output is the last layer's hidden state at each time step. You have to decide how many features you want to use for the LSTM.
Finally, for the input shape, setting batch_first=True requires the input to have the shape [batch_size, seq_len, input_size], in your case that would be [12, 384, 768].
import torch
import torch.nn as nn
# Size: [batch_size, seq_len, input_size]
input = torch.randn(12, 384, 768)
lstm = nn.LSTM(input_size=768, hidden_size=512, batch_first=True)
output, _ = lstm(input)
output.size() # => torch.Size([12, 384, 512])
When an image is passed through a CNN and then an LSTM layer, the feature map shape changes like this:
BCHW -> BCHW (B x C x 1 x W): the CNN's output should have a height of 1.
Then squeeze the height dimension: BCHW -> BCW.
In an RNN the dimension names change to [batch, seq_len, input_size]; for an image this corresponds to [batch, width, channel].
BCW -> BWC: this is the batch_first tensor expected by an LSTM layer (as in PyTorch).
Finally, BWC is [batch, seq_len, channel].
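A minimal sketch of that reshaping in PyTorch (the sizes below are made up for illustration):
import torch
import torch.nn as nn

feature_map = torch.randn(8, 256, 1, 32)                # B x C x H x W, with H = 1 from the CNN
feature_map = feature_map.squeeze(2)                    # B x C x W
lstm_input = feature_map.permute(0, 2, 1).contiguous()  # B x W x C == [batch, seq_len, input_size]

lstm = nn.LSTM(input_size=256, hidden_size=128, batch_first=True)
output, _ = lstm(lstm_input)                            # output: [8, 32, 128]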

How to solve "logits and labels must have the same first dimension" error

I'm trying out different neural network architectures for word-based NLP.
So far I've used bidirectional and embedding layers, and models with GRUs, guided by this tutorial: https://towardsdatascience.com/language-translation-with-rnns-d84d43b40571, and it all worked out well.
When I tried using LSTMs, however, I got an error saying:
logits and labels must have the same first dimension, got logits shape [32,186] and labels shape [4704]
How can I solve this?
My source and target datasets consist of 7200 sample sentences. They are integer-tokenized and embedded. The source dataset is post-padded to match the length of the target dataset.
Here is my model and the relevant code:
lstm_model = Sequential()
lstm_model.add(Embedding(src_vocab_size, 128, input_length=X.shape[1], input_shape=X.shape[1:]))
lstm_model.add(LSTM(128, return_sequences=False, dropout=0.1, recurrent_dropout=0.1))
lstm_model.add(Dense(128, activation='relu'))
lstm_model.add(Dropout(0.5))
lstm_model.add((Dense(target_vocab_size, activation='softmax')))
lstm_model.compile(optimizer=Adam(0.002), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = lstm_model.fit(X, Y, batch_size = 32, callbacks=CALLBACK, epochs = 100, validation_split = 0.25) #At this line the error is raised!
With the shapes:
X.shape = (7200, 147)
Y.shape = (7200, 147, 1)
src_vocab_size = 188
target_vocab_size = 186
I've looked at similar questions on here already and tried adding a Reshape layer
simple_lstm_model.add(Reshape((-1,)))
but this only causes the following error:
"TypeError: __int__ returned non-int (type NoneType)"
It's really weird, as I preprocess the dataset the same way for all models and it works just fine except for the one above.
You should set return_sequences=True (and leave return_state=False) when calling the LSTM constructor.
In your snippet, the LSTM only returns its last output, instead of the sequence of outputs for every input embedding. In theory, you could have spotted it from the error message:
logits and labels must have the same first dimension, got logits shape [32,186] and labels shape [4704]
The logits should be three-dimensional: batch size × sequence length × number of classes. Your sequence length is 147, and indeed 32 × 147 = 4704 (the number of your labels). This could have told you that the sequence-length dimension had disappeared.
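For reference, a sketch of the question's model with that single change (all other variables as in the question; the redundant input_shape argument to Embedding is dropped):
lstm_model = Sequential()
lstm_model.add(Embedding(src_vocab_size, 128, input_length=X.shape[1]))
lstm_model.add(LSTM(128, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))
lstm_model.add(Dense(128, activation='relu'))
lstm_model.add(Dropout(0.5))
lstm_model.add(Dense(target_vocab_size, activation='softmax'))
lstm_model.compile(optimizer=Adam(0.002), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# The logits now have shape (batch, 147, 186), matching Y of shape (batch, 147, 1).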

Confusion about Keras RNN Input shape requirement

I have read plenty of posts on this point. They are inconsistent with each other, and every answer seems to have a different explanation, so I thought I would ask based on my analysis of all of them.
As the Keras RNN documentation states, the input shape is always of the form (batch_size, timesteps, input_dim). I am a bit confused about that, but I guess (though I'm not sure) that input_dim is always 1, while timesteps depends on your problem (it could be the data dimension as well). Is that roughly correct?
The reason for this question is that I always get an error when trying to change the value of input_dim to be my dataset's dimension (as input_dim sounds like that!!), so I assumed that input_dim represents the shape of the input vector to the LSTM at each time step. Am I wrong again?
C = C.reshape((C.shape[0], C.shape[1], 1))
tr_C, ts_C, tr_r, ts_r = train_test_split(C, r, train_size=.8)
batch_size = 1000
print('Build model...')
model = Sequential()
model.add(LSTM(8, batch_input_shape=(batch_size, C.shape[1], 1), stateful=True, activation='relu'))
model.add(Dense(1, activation='relu'))
print('Training...')
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(tr_C, tr_r,
          batch_size=batch_size, epochs=1,
          shuffle=True, validation_data=(ts_C, ts_r))
Thanks!
Indeed, input_dim is the size of the input vector at a single time step. In other words, input_dim is the number of input features.
It's not necessarily 1, though. If you're working with more than one variable, it can be any number.
Suppose you have 10 sequences, each sequence has 200 time steps, and you're measuring just a temperature. Then you have one feature:
input_shape = (200,1) -- notice that the batch size (number of sequences) is ignored here
batch_input_shape = (10,200,1) -- only in specific cases, such as stateful=True, will you need a batch input shape.
Now suppose you're measuring not only temperature, but also pressure and volume. Now you've got three input features:
input_shape = (200,3)
batch_input_shape = (10,200,3)
In other words, the first dimension is the number of different sequences. The second is the length of each sequence (how many measurements along time). And the last is how many variables you measure at each time step.
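A minimal sketch of the three-feature case above (random data, just to show the shapes):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

data = np.random.rand(10, 200, 3)          # 10 sequences, 200 timesteps, 3 features
targets = np.random.rand(10, 1)

model = Sequential()
model.add(LSTM(8, input_shape=(200, 3)))   # input_dim is the last number: 3
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.fit(data, targets, batch_size=10, epochs=1)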

Accuracy goes to 0.0000 when training RNN with Keras?

I'm trying to use custom word-embeddings from Spacy for training a sequence -> label RNN query classifier. Here's my code:
word_vector_length = 300
dictionary_size = v.num_tokens + 1
word_vectors = v.get_word_vector_dictionary()
embedding_weights = np.zeros((dictionary_size, word_vector_length))
max_length = 186
for word, index in dictionary._get_raw_id_to_token().items():
    if word in word_vectors:
        embedding_weights[index, :] = word_vectors[word]
model = Sequential()
model.add(Embedding(input_dim=dictionary_size, output_dim=word_vector_length,
                    input_length=max_length, mask_zero=True, weights=[embedding_weights]))
model.add(Bidirectional(LSTM(128, activation= 'relu', return_sequences=False)))
model.add(Dense(v.num_labels, activation= 'sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=200, nb_epoch=20)
Here the word vectors are taken from spacy.vectors and have length 300. The input is a NumPy array that looks like [0,0,12,15,0...] of dimension 186, where the integers are the token ids of the input, and I've constructed the embedding weight matrix accordingly. The target for each training sample is [0,0,1,0,...0] of length 26, indicating the label that should go with this piece of vectorized text.
This looks like it should work, but during the first epoch the training accuracy is continually decreasing... and by the end of the first epoch (and for the rest of training) it's exactly 0, and I'm not sure why this is happening. I've trained plenty of models with Keras/TF before and never encountered this issue.
Any idea what might be happening here?
Are the labels always one-hot, meaning only one element of the label vector is one and the rest are zero?
If so, try using a softmax activation with a categorical cross-entropy loss, as in the following official example:
https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py#L202
This will constrain the network to output probability distributions in the last layer (i.e. the softmax outputs sum up to 1).
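Concretely, a sketch of the suggested change, keeping the rest of the model from the question:
model.add(Dense(v.num_labels, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])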

Dimensions not matching in keras LSTM model

I want to use an LSTM neural network with Keras to forecast groups of time series, and I am having trouble making the model match what I want. The dimensions of my data are:
input tensor: (data length, number of series to train, time steps to look back)
output tensor: (data length, number of series to forecast, time steps to look ahead)
Note: I want to keep the dimensions exactly like that, no transposition.
A dummy data code that reproduces the problem is:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed, LSTM
epoch_number = 100
batch_size = 20
input_dim = 4
output_dim = 3
look_back = 24
look_ahead = 24
n = 100
trainX = np.random.rand(n, input_dim, look_back)
trainY = np.random.rand(n, output_dim, look_ahead)
print('test X:', trainX.shape)
print('test Y:', trainY.shape)
model = Sequential()
# Add the first LSTM layer (The intermediate layers need to pass the sequences to the next layer)
model.add(LSTM(10, batch_input_shape=(None, input_dim, look_back), return_sequences=True))
# add the second LSTM layer (the dimensions are only needed in the first layer)
model.add(LSTM(10, return_sequences=True))
# the TimeDistributed object allows a 3D output
model.add(TimeDistributed(Dense(look_ahead)))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, nb_epoch=epoch_number, batch_size=batch_size, verbose=1)
This throws:
Exception: Error when checking model target: expected
timedistributed_1 to have shape (None, 4, 24) but got array with shape
(100, 3, 24)
The problem seems to be when defining the TimeDistributed layer.
How do I define the TimeDistributed layer so that it compiles and trains?
The error message is a bit misleading in your case. Your output node of the network is called timedistributed_1 because that's the last node in your sequential model. What the error message is trying to tell you is that the output of this node does not match the target your model is fitting to, i.e. your labels trainY.
Your trainY has a shape of (n, output_dim, look_ahead), i.e. (100, 3, 24), but the network produces an output of shape (batch_size, input_dim, look_ahead), i.e. (100, 4, 24). The problem here is that output_dim != input_dim. If your time dimension needs to change, you may need padding or a network node that removes timesteps.
I think the problem is that you expect output_dim (!= input_dim) at the output of TimeDistributed, which is not possible: this dimension is what it considers the time dimension, and it is preserved.
The input should be at least 3D, and the dimension of index one will
be considered to be the temporal dimension.
The purpose of TimeDistributed is to apply the same layer to each time step. You can only end up with the same number of time steps as you started with.
If you really need to bring down this dimension from 4 to 3, I think you will need to either add another layer at the end, or use something different from TimeDistributed.
PS: one hint towards finding this issue is that output_dim is never used when creating the model; it only appears when building the target data. While this is only a code smell (there might not be anything wrong with it), it is something worth checking.
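If adding another layer at the end is acceptable, one possible sketch (assuming it is fine to project the 4 input series down to the 3 output series inside the model) is to permute, project, and permute back:
from keras.layers import Permute

model = Sequential()
model.add(LSTM(10, batch_input_shape=(None, input_dim, look_back), return_sequences=True))
model.add(LSTM(10, return_sequences=True))
model.add(TimeDistributed(Dense(look_ahead)))   # (None, 4, 24)
model.add(Permute((2, 1)))                      # (None, 24, 4)
model.add(TimeDistributed(Dense(output_dim)))   # (None, 24, 3)
model.add(Permute((2, 1)))                      # (None, 3, 24) -- matches trainY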

Resources