My question
I'm using Keras to build a convolutional neural network. I ran across the following:
model = tf.keras.Sequential()
model.add(layers.Dense(10*10*256, use_bias=False, input_shape=(100,)))
I'm curious - what exactly mathematically is going on here?
My best guess
My guess is that for input of size [100,N], the network will be evaluated N times, once for each training example. The Dense layer created by layers.Dense contains (10*10*256) * (100) parameters that will be updated during backpropagation.
Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Note: If the input to the layer has a rank greater than 2, then it is
flattened prior to the initial dot product with kernel.
Example:
# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))
# now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32)
# after the first layer, you don't need to specify
# the size of the input anymore:
model.add(Dense(32))
Arguments:
> units: Positive integer, dimensionality of the output space.
> activation: Activation function to use. If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
> use_bias: Boolean, whether the layer uses a bias vector.
> kernel_initializer: Initializer for the kernel weights matrix.
> bias_initializer: Initializer for the bias vector.
> kernel_regularizer: Regularizer function applied to the kernel weights matrix.
> bias_regularizer: Regularizer function applied to the bias vector.
> activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
> kernel_constraint: Constraint function applied to the kernel weights matrix.
> bias_constraint: Constraint function applied to the bias vector.
Input shape:
N-D tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).
Output shape:
N-D tensor with shape: (batch_size, ..., units). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, units).
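To make the arithmetic concrete for the layer in the question, here is a minimal NumPy sketch of what Dense(10*10*256, use_bias=False, input_shape=(100,)) computes. It is not the Keras implementation; the batch size of 4 and the random values are assumptions for illustration. Note that Keras puts the batch axis first, so the input is (N, 100) rather than (100, N), and the whole batch goes through a single matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size = 4            # hypothetical N; any batch size works
input_dim = 100           # from input_shape=(100,)
units = 10 * 10 * 256     # 25600 output units

# Inputs are (batch_size, input_dim): one row per example.
x = rng.standard_normal((batch_size, input_dim))

# With use_bias=False the kernel is the layer's only trainable tensor.
kernel = rng.standard_normal((input_dim, units)) * 0.01

# No activation was passed, so the layer is linear: output = dot(input, kernel).
output = x @ kernel

print(output.shape)   # (4, 25600)
print(kernel.size)    # 2560000 trainable parameters
```

So the parameter count guessed in the question, (10*10*256) * 100, matches the size of the kernel.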
Related
I am using a GPT2 model that outputs logits (before softmax) in the shape (batch_size, num_input_ids, vocab_size) and I need to compare it with the labels that are of shape (batch_size, num_input_ids) to calculate BCELoss. How do I calculate it?
logits = output.logits #--of shape (32, 56, 592)
logits = torch.nn.Softmax()(logits)
labels = labels #---------of shape (32, 56)
torch.nn.BCELoss()(logits, labels)
But the dimensions do not match, so how do I contract the logits to the labels' shape or expand the labels to the logits' shape?
Binary cross-entropy is used when the final classification layer is a sigmoid layer, i.e., for each output dimension, only a true/false output is possible. You can imagine it as assigning some tags to the input. This also means that the labels need to have the same dimension as the logits, having 0/1 for each logit. Statistically speaking, for 592 output dimensions, you predict 592 Bernoulli (= binary) distributions. The expected shape is 32 × 56 × 592.
When using the softmax layer, you assume only one target class is possible; you predict a single categorical distribution over 592 possible output classes. However, in this case, the correct loss function is not binary cross-entropy but categorical cross-entropy, implemented by the CrossEntropyLoss class in PyTorch. Note that it takes the logits directly before the softmax normalization and does the normalization internally. The expected shape is 32 × 56, as in the code snippet.
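As a rough illustration of the categorical case, the NumPy sketch below computes what a cross-entropy loss evaluates for logits of shape (32, 56, 592) and integer labels of shape (32, 56). It is a framework-free sketch of the math, not PyTorch code; the random logits and labels are stand-ins for output.logits and the real labels.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, seq_len, vocab_size = 32, 56, 592
logits = rng.standard_normal((batch_size, seq_len, vocab_size))    # stand-in for output.logits
labels = rng.integers(0, vocab_size, size=(batch_size, seq_len))   # one class index per position

# Numerically stable log-softmax over the class (last) axis.
shifted = logits - logits.max(axis=-1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

# Pick the log-probability of the correct class at every position.
picked = np.take_along_axis(log_probs, labels[..., None], axis=-1).squeeze(-1)

# Mean negative log-likelihood over all batch_size * seq_len positions.
loss = -picked.mean()

print(picked.shape)  # (32, 56)
```

In PyTorch itself, CrossEntropyLoss expects the class dimension in second position, so with these shapes one option is to pass logits.permute(0, 2, 1) together with integer labels of shape (32, 56). Flattening both to (32*56, 592) and (32*56,) is another.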
I have 7 categorical features,
and I am trying to add a CNN layer after the embedding layer.
My first layer is an Input layer.
My second layer is an Embedding layer.
For the third layer, I want to add a Conv2D layer.
I've tried input_shape=(7,36,1) in Conv2D, but that didn't work.
input2 = Input(shape=(7,))
embedding2 = Embedding(76474, 36)(input2)
# 76474 is the number of datapoints (rows)
# 36 is the output dim of embedding Layer
cnn1 = Conv2D(64, (3, 3), activation='relu')(embedding2)
flat2 = Flatten()(cnn1)
But I'm getting this error:
Input 0 of layer conv2d is incompatible with the layer: expected
ndim=4, found ndim=3. Full shape received: [None, 7, 36]
The output of an embedding layer is 3D, namely (samples, seq_length, features), where features = 36 is the dimensionality of the embedding space and seq_length = 7 is the sequence length. A Conv2D layer requires an image, which is usually represented as a 4D tensor (samples, height, width, channels).
Only a Conv1D layer would make sense, as it also takes 3D-shaped data, typically (samples, width, channels). You then need to decide whether to convolve across the sequence length or across the features dimension. That is something you need to experiment with; in the end it comes down to deciding which axis is the "spatial dimension" in the output of the embedding.
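To show the shapes involved, here is a small NumPy sketch of a 1D convolution sliding along the sequence axis of an embedding-shaped tensor. It mimics what a Conv1D layer with 64 filters, kernel size 3, "valid" padding and ReLU would compute, but it is only an illustration with random weights, not Keras code; the batch of 2 samples is an assumption, and the bias term is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

samples, seq_length, features = 2, 7, 36   # samples=2 is a hypothetical batch size
kernel_size, filters = 3, 64

# Stand-in for the embedding output: (samples, seq_length, features).
embedded = rng.standard_normal((samples, seq_length, features))

# One weight tensor shared by every window: (kernel_size, features, filters).
weights = rng.standard_normal((kernel_size, features, filters)) * 0.1

# Slide a window of 3 positions along the sequence axis ("valid" padding).
windows = [
    np.einsum("bkc,kcf->bf", embedded[:, t:t + kernel_size, :], weights)
    for t in range(seq_length - kernel_size + 1)
]
conv_out = np.maximum(np.stack(windows, axis=1), 0.0)   # ReLU

print(conv_out.shape)  # (2, 5, 64)
```

To convolve across the features dimension instead, the last two axes can be swapped first (embedded.transpose(0, 2, 1)), which makes the 36 embedding dimensions the axis the window slides over.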
I am implementing a custom loss function in Keras. The model is an autoencoder. The first layer is an Embedding layer, which embeds an input of size (batch_size, sentence_length) into (batch_size, sentence_length, embedding_dimension). Then the model compresses the embedding into a vector of a certain dimension, and finally must reconstruct the embedding (batch_size, sentence_length, embedding_dimension).
But the embedding layer is trainable, and the loss must use the weights of the embedding layer (I have to sum over all word embeddings of my vocabulary).
For example, suppose I want to train on the toy example "the cat". The sentence_length is 2; suppose embedding_dimension is 10 and the vocabulary size is 50, so the embedding matrix has shape (50,10). The Embedding layer's output X is of shape (1,2,10). It then passes through the model, and the output X_hat is also of shape (1,2,10). The model must be trained to maximize the probability that the vector X_hat[0, 0] representing 'the' is the most similar to the vector X[0, 0] representing 'the' in the Embedding layer, and likewise for 'cat'. But the loss is such that I have to compute the cosine similarity between X and X_hat, normalized by the sum of the cosine similarities between X_hat and every embedding in the embedding matrix (50 of them, since the vocabulary size is 50); these embeddings are the rows of the embedding layer's weights.
How can I access the weights of the embedding layer at each iteration of the training process?
Thank you!
It seems a bit crazy, but it seems to work: instead of creating a custom loss function to pass to model.compile, the network computes the loss (Eq. 1 from arxiv.org/pdf/1708.04729.pdf) in a function that I call with Lambda:
loss = Lambda(lambda x: similarity(x[0], x[1], x[2]))([X_hat, X, embedding_matrix])
The network has two outputs, X_hat and loss, but I give X_hat a weight of 0 and loss all the weight:
model = Model(input_sequence, [X_hat, loss])
model.compile(loss=mean_squared_error,
              optimizer=optimizer,
              loss_weights=[0., 1.])
When I train the model:
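for i in range(epochs):
    for j in range(num_data):
        # Look up the current embedding of the input tokens (single index, shape (1, sentence_length, dim))
        input_embedding = model.layers[1].get_weights()[0][data[j:j+1]]
        y = [input_embedding, 0]  # The embedding of the input, and a target of 0 for the loss output
        model.fit(data[j:j+1], y, batch_size=1, ...)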
That way, the model is trained to drive loss toward 0, and when I want to use the trained model's prediction, I use the first output, which is the reconstruction X_hat.
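For readers who want to see the kind of quantity such a similarity function might compute, here is a NumPy sketch. It is not the author's code and I have not reproduced Eq. 1 of the paper exactly: the softmax (exponential) normalisation, the temperature tau, the function names and the token ids are all assumptions made for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between every row of a (n, d) and every row of b (m, d)."""
    a_unit = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a_unit @ b_unit.T                        # shape (n, m)

def similarity_loss(x_hat, token_ids, embedding_matrix, tau=1.0):
    """Hypothetical stand-in for the `similarity` function in the answer.

    Assumes a softmax over cosine similarities with temperature tau.
    """
    sims = cosine_similarity(x_hat, embedding_matrix) / tau   # (sentence_length, vocab_size)
    sims = sims - sims.max(axis=-1, keepdims=True)            # for numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=-1, keepdims=True))
    # Negative log-probability of the true word at each position, averaged.
    return -log_probs[np.arange(len(token_ids)), token_ids].mean()

rng = np.random.default_rng(0)
embedding_matrix = rng.standard_normal((50, 10))   # vocabulary of 50, dimension 10
token_ids = np.array([3, 17])                      # hypothetical ids for "the", "cat"

x = embedding_matrix[token_ids]                    # X for one sentence: (2, 10)
perfect_loss = similarity_loss(x, token_ids, embedding_matrix)    # X_hat == X
worst_loss = similarity_loss(-x, token_ids, embedding_matrix)     # X_hat points the opposite way

print(perfect_loss < worst_loss)  # True
```

A perfect reconstruction gives each true word the highest similarity in the vocabulary, so its loss falls below log(50), while a reconstruction pointing in the opposite direction ends up above it.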
What is the meaning of the two Dense in this code?
self.model.add(Flatten())
self.model.add(Dense(512))
self.model.add(Activation('relu'))
self.model.add(Dropout(0.5))
self.model.add(Dense(10))
self.model.add(Activation('softmax'))
self.model.summary()
Dense is the only actual network layer in that model.
A Dense layer feeds all outputs from the previous layer to all its neurons, each neuron providing one output to the next layer.
It's the most basic layer in neural networks.
A Dense(10) has ten neurons. A Dense(512) has 512 neurons.
Furthermore, a dense layer applies a non-linear transform:
f(W.X + b)
As to the effect: when W is a 2D weight matrix and X is an input vector, W.X + b is a vector, and f is an element-wise non-linearity such as tanh, so the result is simply a vector with one entry per neuron.
From the keras docs:
Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
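As a rough illustration of how the two Dense layers in the question fit together, the NumPy sketch below runs one forward pass through Flatten, Dense(512), ReLU, Dense(10) and softmax. The incoming feature shape (7, 7, 64), the batch of 4 and the random weights are assumptions; the question does not say what precedes Flatten.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, kernel, bias):
    """Linear part of a Dense layer: dot(input, kernel) + bias."""
    return x @ kernel + bias

# Hypothetical output of the preceding layers: 4 samples of shape (7, 7, 64).
features = rng.standard_normal((4, 7, 7, 64))

flat = features.reshape(len(features), -1)            # Flatten -> (4, 3136)

w1, b1 = rng.standard_normal((flat.shape[1], 512)) * 0.01, np.zeros(512)
w2, b2 = rng.standard_normal((512, 10)) * 0.01, np.zeros(10)

hidden = np.maximum(dense(flat, w1, b1), 0.0)         # Dense(512) + Activation('relu')
# Dropout(0.5) only acts during training, so it is skipped in this forward pass.
logits = dense(hidden, w2, b2)                        # Dense(10)
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)          # Activation('softmax')

print(hidden.shape, probs.shape)  # (4, 512) (4, 10)
```

The first Dense layer maps the flattened features to 512 hidden values, and the second maps those to 10 scores, one per class, which softmax turns into probabilities.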
I was just modifying an LSTM network I had written to print out the test error. The issue, I realized, is that the model I had defined depends on the batch size.
Specifically, the input is a tensor of shape [batch_size, time_steps, features]. The input enters the LSTM cell, and I turn the output into a list of time_steps 2D tensors, each of shape [batch_size, hidden_units]. Each 2D tensor is then multiplied by a weight vector of shape [hidden_units] to yield a vector of shape [batch_size], to which a bias vector of shape [batch_size] is added.
In words, I give the model N sequences, and I expect it to output a scalar for each time step of each sequence. That is, the output is a list of N vectors, one per sequence, each holding one value per time step.
For training, I give the model batches of size 13. For the test data, I feed the entire data set, which consists of over 400 examples. Thus, an error is raised, since the bias has fixed shape batch_size.
I haven't found a way to make its shape variable without raising an error.
I can add complete code if requested. Added code anyways.
Thanks.
def basic_lstm(inputs, number_steps, number_features, number_hidden_units, batch_size):
    weights = {
        'out': tf.Variable(tf.random_normal([number_hidden_units, 1]))
    }
    biases = {
        'out': tf.Variable(tf.constant(0.1, shape=[batch_size, 1]))
    }
    lstm_cell = rnn.BasicLSTMCell(number_hidden_units)
    init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
    hidden_layer_outputs, states = tf.nn.dynamic_rnn(lstm_cell, inputs,
                                                     initial_state=init_state, dtype=tf.float32)
    results = tf.squeeze(tf.stack([tf.matmul(output, weights['out'])
                                   + biases['out'] for output
                                   in tf.unstack(tf.transpose(hidden_layer_outputs, (1, 0, 2)))], axis=1))
    return results
You want the biases to have shape (batch_size,).
For example (using zeros instead of tf.constant but similar problem), I was able to specify the shape as a single integer:
biases = tf.Variable(tf.zeros(10,dtype=tf.float32))
print(biases.shape)
prints:
(10,)
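A different option, not stated in the answer above, is to stop tying the bias to the batch size at all: the output layer has a single unit, so a bias of shape (1,) can be broadcast over any number of sequences. The NumPy sketch below illustrates only that broadcasting behaviour. The hidden size of 8 and the random inputs are assumptions, and it is not the TensorFlow code from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_units = 8                                   # hypothetical size
weights = rng.standard_normal((hidden_units, 1))
bias = np.full((1,), 0.1)                          # one bias for the single output unit

def output_layer(hidden_state):
    """Map (batch_size, hidden_units) to (batch_size, 1); the bias broadcasts over the batch."""
    return hidden_state @ weights + bias

train_out = output_layer(rng.standard_normal((13, hidden_units)))    # training batch
test_out = output_layer(rng.standard_normal((400, hidden_units)))    # larger test set

print(train_out.shape, test_out.shape)  # (13, 1) (400, 1)
```

Note that init_state in the question is also built from batch_size, so it would need similar treatment before the same graph can accept both batch sizes.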