1D CNN for multistep multiclass timeseries classification - keras

Suppose you have a timeseries classification task with n_classes possible classes, and you want to output the probability of each class for every timestep (like seq2seq). How can we achieve multi-step, multi-class classification with a Conv1D network?
With a RNN it would be something like:
# input shape (n_samples, n_timesteps, n_features)
layer = LSTM(n_neurons, return_sequences=True, input_shape=(n_timesteps, n_features))
layer = Dense(n_classes, activation="softmax")(layer)
# objective output shape (n_samples, n_timesteps, n_classes)
Keras would know that, since the LSTM layer outputs sequences, it should wrap the Dense layer in a TimeDistributed layer.
How would we achieve the same with a Conv1D layer? I imagine we need to manually wrap the last Dense layer in a TimeDistributed layer, but I don't know how to build the layers before it so that they output sequences.
# input shape (n_samples, n_timesteps, n_features)
layer = Conv1D(n_neurons, 3, input_shape=(n_timesteps, n_features))
# ... ?
# can we use a Flatten layer in this case?
layer = TimeDistributed(Dense(n_classes, activation="softmax"))(layer)
# objective output shape (n_samples, n_timesteps, n_classes)
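One way this can be done (a sketch, not taken from the thread): use padding='same' so the convolution preserves the timestep dimension, then apply Dense(n_classes, activation='softmax'), which Keras applies to the last axis of the 3D tensor, i.e. per timestep, so no Flatten (and no explicit TimeDistributed) is needed. All sizes below are hypothetical.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dense

n_timesteps, n_features, n_classes, n_neurons = 100, 8, 5, 64  # hypothetical sizes

model = Sequential()
# padding='same' keeps the output length equal to n_timesteps
model.add(Conv1D(n_neurons, 3, padding='same', activation='relu',
                 input_shape=(n_timesteps, n_features)))
# Dense on a 3D tensor acts on the last axis, i.e. per timestep:
# output shape (batch, n_timesteps, n_classes)
model.add(Dense(n_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')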

Related

How should I fix the input size in a Keras model

I want to implement MNIST with an MLP using Keras. To begin with I just use 2 layers, but I get the error: "expected activation_9 to have 3 dimensions, but got array with shape (60000, 10)". How can I fix it?
input_shape = x_train[0].shape
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=input_shape))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
mdl = model.fit(x_train, y_train, epochs=5, batch_size=128)
As your first layer try using:
tf.keras.layers.Flatten()
The Dense layer needs a 1-dimensional array per sample, but the images are 2-D; this layer flattens them to 1-D.
Dense usually expects 2-D data of shape (batch, features). So you need to use Flatten(), or better, use Conv2D layers followed by Flatten(), which are better suited for image classification tasks.
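A minimal sketch of the suggested fix, assuming x_train holds 28x28 MNIST images and y_train is one-hot encoded:
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),  # flatten each 28x28 image to a 784-vector
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=5, batch_size=128)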

Unable to add CNN Layer after Embedding Layer Keras

I have 7 categorical features, and I am trying to add a CNN layer after the Embedding layer.
My first layer is the Input layer, the second is the Embedding layer, and as the third layer I want to add a Conv2D layer.
I've tried input_shape=(7,36,1) in Conv2D, but that didn't work.
input2 = Input(shape=(7,))
embedding2 = Embedding(76474, 36)(input2)
# 76474 is the number of datapoints (rows)
# 36 is the output dim of embedding Layer
cnn1 = Conv2D(64, (3, 3), activation='relu')(embedding2)
flat2 = Flatten()(cnn1)
But I'm getting this error:
Input 0 of layer conv2d is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [None, 7, 36]
The output of an embedding layer is 3D, namely (samples, seq_length, features), where features = 36 is the dimensionality of the embedding space and seq_length = 7 is the sequence length. A Conv2D layer requires an image, which is usually represented as a 4D tensor (samples, width, height, channels).
Only a Conv1D layer would make sense, as it also takes 3D-shaped data, typically (samples, width, channels). You then need to decide whether to convolve across the sequence length or across the features dimension. That's something you need to experiment with; in the end it amounts to deciding which axis is the "spatial dimension" in the output of the embedding.
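A minimal sketch of the Conv1D alternative, convolving across the length-7 sequence axis. Note that the first argument of Embedding should be the vocabulary size (the number of distinct category values), not the number of rows; vocab_size below is a hypothetical placeholder.
from tensorflow.keras.layers import Input, Embedding, Conv1D, Flatten

vocab_size = 1000  # hypothetical: number of distinct category values, not rows

input2 = Input(shape=(7,))
embedding2 = Embedding(vocab_size, 36)(input2)        # (None, 7, 36)
cnn1 = Conv1D(64, 3, activation='relu')(embedding2)   # convolve across the length-7 axis -> (None, 5, 64)
flat2 = Flatten()(cnn1)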

Keras LSTM, expected 3 but got array with shape []

I am trying to find the label associated with each word in annotated text. I am using a bidirectional LSTM. I have X_train of shape (1676, 39) and Y_train with the same shape (1676, 39).
input = Input(shape=(sequence_length,))
model = Embedding(input_dim=n_words, output_dim=20,
                  input_length=sequence_length, mask_zero=True)(input)
model = Bidirectional(LSTM(units=50, return_sequences=True,
                           recurrent_dropout=0.1))(model)
out_model = TimeDistributed(Dense(50, activation="softmax"))(model)
model = Model(input, out_model)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, Y_train, batch_size=32, epochs=10,
          validation_split=0.1)
While executing this, I am getting the error:
ValueError: Error when checking target: expected time_distributed_5 to have 3 dimensions, but got array with shape (1676, 39).
I am not able to find out how to feed proper dimension which is needed by the Keras LSTM model.
In the LSTM you set return_sequences=True; as a result, the output of the Bidirectional LSTM layer is a tensor of shape [batch_size, 39, 100] (50 units per direction, concatenated). You then pass this tensor to the TimeDistributed layer, which applies the Dense layer at each time step, so its output is [batch_size, 39, 50]. As you can see, the model produces a 3-dimensional tensor for prediction, while your ground truth is 2-dimensional, (1676, 39).
How to fix the issue?
1) Remove return_sequences=True from the LSTM arguments.
2) Remove the TimeDistributed layer and apply the Dense layer directly.
inps = keras.layers.Input(shape=(39,))
embedding = keras.layers.Embedding(vocab_size, 16)(inps)           # (batch, 39, 16)
rnn = keras.layers.LSTM(50)(embedding)                             # no return_sequences -> (batch, 50)
dense = keras.layers.Dense(50, activation="softmax")(rnn)          # (batch, 50)
prediction = keras.layers.Dense(39, activation='softmax')(dense)   # (batch, 39), matches Y_train

Keras: How to use weights of a layer in loss function?

I am implementing a custom loss function in Keras. The model is an autoencoder. The first layer is an Embedding layer, which embeds an input of size (batch_size, sentence_length) into (batch_size, sentence_length, embedding_dimension). Then the model compresses the embedding into a vector of a certain dimension, and finally must reconstruct the embedding (batch_size, sentence_length, embedding_dimension).
But the embedding layer is trainable, and the loss must use the weights of the embedding layer (I have to sum over all word embeddings of my vocabulary).
For example, suppose I want to train on the toy example "the cat". The sentence_length is 2, the embedding_dimension is 10, and the vocabulary size is 50, so the embedding matrix has shape (50, 10). The Embedding layer's output X has shape (1, 2, 10). It then passes through the model and the output X_hat also has shape (1, 2, 10). The model must be trained to maximize the probability that the vector X_hat[0] representing 'the' is the most similar to the vector X[0] representing 'the' in the Embedding layer, and the same for 'cat'. But the loss is such that I have to compute the cosine similarity between X and X_hat, normalized by the sum of the cosine similarities between X_hat and every embedding in the vocabulary (50, since the vocabulary size is 50), i.e. the rows of the embedding layer's weight matrix.
But how can I access the weights of the embedding layer at each iteration of the training process?
Thank you!
It seems a bit crazy, but it seems to work: instead of creating a custom loss function that I would pass to model.compile, the network computes the loss (Eq. 1 from arxiv.org/pdf/1708.04729.pdf) in a function that I call with a Lambda layer:
loss = Lambda(lambda x: similarity(x[0], x[1], x[2]))([X_hat, X, embedding_matrix])
And the network has two outputs, X_hat and loss, but I give X_hat a weight of 0 and loss all the weight:
model = Model(input_sequence, [X_hat, loss])
model.compile(loss=mean_squared_error,
              optimizer=optimizer,
              loss_weights=[0., 1.])
When I train the model:
for i in range(epochs):
    for j in range(num_data):
        input_embedding = model.layers[1].get_weights()[0][[data[j:j+1]]]
        y = [input_embedding, 0]  # the embedding of the input
        model.fit(data[j:j+1], y, batch_size=1, ...)
That way, the model is trained to drive the loss toward 0, and when I want to use the trained model's predictions I use the first output, which is the reconstruction X_hat.
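Regarding the original question of accessing the embedding weights at every training step: the Embedding layer exposes its trainable matrix as the embeddings attribute, which can be read symbolically inside the Lambda instead of calling get_weights(). Below is a minimal, self-contained sketch of that wiring; the compress/reconstruct body and the similarity function are simplified placeholders (stand-ins for the real autoencoder and Eq. 1 of the paper), and the sizes are the toy values from the question.
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size, sentence_length, embedding_dim = 50, 2, 10  # toy sizes from the question

input_sequence = layers.Input(shape=(sentence_length,), dtype='int32')
emb_layer = layers.Embedding(vocab_size, embedding_dim)
X = emb_layer(input_sequence)                                     # (batch, 2, 10)

# Placeholder "compress then reconstruct" body standing in for the real autoencoder
code = layers.Dense(5, activation='relu')(layers.Flatten()(X))
X_hat = layers.Reshape((sentence_length, embedding_dim))(
    layers.Dense(sentence_length * embedding_dim)(code))          # (batch, 2, 10)

def similarity(args):
    # Placeholder for Eq. 1: cosine similarity of X_hat with X, normalized by
    # the similarities of X_hat with every row of the (trainable) embedding matrix.
    x_hat, x = args
    emb = emb_layer.embeddings                                    # (vocab_size, embedding_dim), live trainable weights
    x_hat_n = tf.math.l2_normalize(x_hat, axis=-1)
    x_n = tf.math.l2_normalize(x, axis=-1)
    emb_n = tf.math.l2_normalize(emb, axis=-1)
    num = tf.exp(tf.reduce_sum(x_hat_n * x_n, axis=-1))                              # (batch, 2)
    den = tf.reduce_sum(tf.exp(tf.einsum('bte,ve->btv', x_hat_n, emb_n)), axis=-1)   # (batch, 2)
    return -tf.reduce_mean(tf.math.log(num / den), axis=-1, keepdims=True)           # (batch, 1)

loss = layers.Lambda(similarity)([X_hat, X])
model = Model(input_sequence, [X_hat, loss])
model.compile(loss='mean_squared_error', optimizer='adam', loss_weights=[0., 1.])
Because the Lambda reads emb_layer.embeddings rather than a copy from get_weights(), the loss always sees the embedding matrix as it is at the current training step.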

Euclidean distance loss function for RNN (keras)

I want to set Euclidean distance as the loss function for an LSTM or RNN.
What output should such a function have: a float, (batch_size), or (batch_size, timesteps)?
Model input X_train is (n_samples, timesteps, data_dim).
Y_train has the same dimensions.
Example code:
def euc_dist_keras(x, y):
    return K.sqrt(K.sum(K.square(x - y), axis=-1, keepdims=True))

model = Sequential()
model.add(SimpleRNN(n_units, activation='relu', input_shape=(timesteps, data_dim), return_sequences=True))
model.add(Dense(n_output, activation='linear'))
model.compile(loss=euc_dist_keras, optimizer='adagrad')
model.fit(x_train, y_train, batch_size=512, epochs=10)
So, should I average the loss over the timesteps dimension and/or the batch_size?
A loss function takes the predicted and true labels and outputs a scalar; in Keras:
from keras import backend as K

def euc_dist_keras(y_true, y_pred):
    return K.sqrt(K.sum(K.square(y_true - y_pred), axis=-1, keepdims=True))
Note that it will not take X_train as an input. The loss calculation follows the forward propagation step, and its value measures how good the predicted labels are compared to the true labels.
What output should such a function have: a float, (batch_size), or (batch_size, timesteps)?
The loss function should have scalar output.
So, should I average the loss over the timesteps dimension and/or the batch_size?
This would not be required to be able to use Euclidean distance as a loss function.
Side note: In your case, I think the problem might be with the neural network architecture, not the loss. Given input of shape (batch_size, timesteps, data_dim), the output of the SimpleRNN with return_sequences=True will be (batch_size, timesteps, n_units), and the Dense layer then produces (batch_size, timesteps, n_output). Thus, since your Y_train has the shape (batch_size, timesteps, data_dim), you would likely want to wrap the Dense layer in a TimeDistributed wrapper (applying Dense to every temporal slice) and set the number of units in that fully connected layer to data_dim.
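A minimal sketch of that adjustment, reusing the euc_dist_keras loss defined above (all sizes are hypothetical):
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, TimeDistributed, Dense

timesteps, data_dim, n_units = 20, 8, 32  # hypothetical sizes

def euc_dist_keras(y_true, y_pred):
    return K.sqrt(K.sum(K.square(y_true - y_pred), axis=-1, keepdims=True))

model = Sequential()
model.add(SimpleRNN(n_units, activation='relu',
                    input_shape=(timesteps, data_dim), return_sequences=True))
# one linear output per timestep, matching Y_train of shape (batch, timesteps, data_dim)
model.add(TimeDistributed(Dense(data_dim, activation='linear')))
model.compile(loss=euc_dist_keras, optimizer='adagrad')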
