How to handle variable shape bias in TensorFlow? - python-3.x

I was just modifying some an LSTM network I had written to print out the test error. The issues, I realized, is that the model I had defined depends on the batch size.
Specifically, the input is a tensor of shape [batch_size, time_steps, features]. The input enters the LSTM cell and the output, which I turn into a list of time_steps 2D tensors, with each 2D tensor having shape [batch_size, hidden_units]. Each 2D tensor is then multiplied by a weight vector of shape [hidden_units] to yield a vector of shape [batch_size] which has added to it a bias vector of shape [batch_size].
In words, I give the model N sequences, and I expect it to output a scalar for each time step for each sequence. That is, the output is a list of N vectors, one for each time step.
For training, I give the model batches of size 13. For the test data, I feed the entire data set, which consists of over 400 examples. Thus, an error is raised, since the bias has fixed shape batch_size.
I haven't found a way to make it's shape variable without raising an error.
I can add complete code if requested. Added code anyways.
Thanks.
def basic_lstm(inputs, number_steps, number_features, number_hidden_units, batch_size):
weights = {
'out': tf.Variable(tf.random_normal([number_hidden_units, 1]))
}
biases = {
'out': tf.Variable(tf.constant(0.1, shape=[batch_size, 1]))
}
lstm_cell = rnn.BasicLSTMCell(number_hidden_units)
init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
hidden_layer_outputs, states = tf.nn.dynamic_rnn(lstm_cell, inputs,
initial_state=init_state, dtype=tf.float32)
results = tf.squeeze(tf.stack([tf.matmul(output, weights['out'])
+ biases['out'] for output
in tf.unstack(tf.transpose(hidden_layer_outputs, (1, 0, 2)))], axis=1))
return results

You want the biases to be a shape of (batch_size, )
For example (using zeros instead of tf.constant but similar problem), I was able to specify the shape as a single integer:
biases = tf.Variable(tf.zeros(10,dtype=tf.float32))
print(biases.shape)
prints:
(10,)

Related

BCELoss between logits and labels not working

I am using a GPT2 model that outputs logits (before softmax) in the shape (batch_size, num_input_ids, vocab_size) and I need to compare it with the labels that are of shape (batch_size, num_input_ids) to calculate BCELoss. How do I calculate it?
logits = output.logits #--of shape (32, 56, 592)
logits = torch.nn.Softmax()(logits)
labels = labels #---------of shape (32, 56)
torch.nn.BCELoss()(logits, labels)
but the dimensions do not match, so how do I contract logits to labels shape or expand labels to logits shape?
Binary cross-entropy is used when the final classification layer is a sigmoid layer, i.e., for each output dimension, only a true/false output is possible. You can imagine it as assigning some tags to the input. This also means that the labels need to have the same dimension as the logits, having 0/1 for each logit. Statistically speaking, for 592 output dimensions, you predict 592 Bernoulli (= binary) distributions. The expected shape is 32 × 56 × 592.
When using the softmax layer, you assume only one target class is possible; you predict a single categorical distribution over 592 possible output classes. However, in this case, the correct loss function is not binary cross-entropy but categorical cross-entropy, implemented by the CrossEntropyLoss class in PyTorch. Note that it takes the logits directly before the softmax normalization and does the normalization internally. The expected shape is 32 × 56, as in the code snippet.

What exactly does tf.keras.layers.Dense do?

My question
I'm using the Keras to build a convolutional neural network. I ran across the following:
model = tf.keras.Sequential()
model.add(layers.Dense(10*10*256, use_bias=False, input_shape=(100,)))
I'm curious - what exactly mathematically is going on here?
My best guess
My guess is that for input of size [100,N], the network will be evaluated N times, once for each training example. The Dense layer created by layers.Dense contains (10*10*256) * (100) parameters that will be updated during backpropagation.
Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Note: If the input to the layer has a rank greater than 2, then it is
flattened prior to the initial dot product with kernel.
Example:
# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))
# now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32)
# after the first layer, you don't need to specify
# the size of the input anymore:
model.add(Dense(32))
Arguments :
> units: Positive integer, dimensionality of the output space.
> activation: Activation function to use. If you don't specify anything,
> no activation is applied (ie. "linear" activation: a(x) = x).
> use_bias: Boolean, whether the layer uses a bias vector.
> kernel_initializer: Initializer for the kernel weights matrix.
> bias_initializer: Initializer for the bias vector.
>kernel_regularizer:Regularizer function applied to the kernel weights matrix.
> bias_regularizer: Regularizer function applied to the bias vector.
> activity_regularizer: Regularizer function applied to the output of the layer (its "activation")..
>kernel_constraint: Constraint function applied to the kernel weights matrix.
>bias_constraint: Constraint function applied to the bias vector.
Input shape:
N-D tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).
Output shape:
N-D tensor with shape: (batch_size, ..., units). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, units).

fix/freeze individual kernel weights in a convolutional operation

I'm trying out a workaround for fixing individual kernel weights in a convolutional operation in TensorFlow using Python 3.7. I do it by creating
a trainable variable,
an identical non-trainable variable and
a "mask" tensor consisting of 1s and 0s with the same shape as the created variables in step 1 and 2 above.
A 1 in the "mask" tensor indicates that I want to fix/freeze that specific weight during training, i.e. not update it in the backward pass.
Now, this workaround works perfectly fine when applied to a fully connected layer but fails when applied to a convolutional layer and I can't figure out why or how to make it work.
Something seems to be happening in the tf.nn.conv2d() function call (see code example below) and according to the documentation this is what they do:
Given an input tensor of shape [batch, in_height, in_width, in_channels]
and a filter / kernel tensor of shape
[filter_height, filter_width, in_channels, out_channels], this op
performs the following:
1. Flattens the filter to a 2-D matrix with shape
[filter_height * filter_width * in_channels, output_channels].
2. Extracts image patches from the input tensor to form a virtual
tensor of shape [batch, out_height, out_width,<br>
filter_height * filter_width * in_channels].
3. For each patch, right-multiplies the filter matrix and the image patch
vector.
But since I use weights_frozen which is a tensor and depends on the trainable variable, non-trainable variable and mask_weights it should get zero-valued gradients on the positions where I have a 1 in the mask_weights tensor.
def conv(input_, layer_name...):
weights = tf.get_variable(shape=[filter_height, filter_width, in_channels, out_channels], dtype=tf.float32, initializer=tf.glorot_uniform_initializer(), trainable=True)
weights_fixed = tf.Variable(tf.identity(weights), trainable=False)
mask_weights = tf.placeholder(tf.float32, weights.shape)
weights_frozen = tf.add(tf.multiply(mask_weights, weights_fixed), tf.multiply((1 - mask_weights), weights))
out_conv = tf.nn.conv2d(input=input_, filter=weights_frozen, strides=strides_, padding='SAME')
out_add = tf.nn.bias_add(value=out_conv, bias=biases_frozen)
out = tf.nn.relu(features=out_add)
return out
As mentioned, I expect to get zero-valued gradients on the positions where I have a 1 in the mask_weights tensor, but instead they are non-zero and therefore those weights are being trained, which is not the behavior I'm trying to achieve.

Resnet with Custom Data

I am trying to modify Resnet50 with my custom data as follows:
X = [[1.85, 0.460,... -0.606] ... [0.229, 0.543,... 1.342]]
y = [2, 4, 0, ... 4, 2, 2]
X is a feature vector of length 2000 for 784 images. y is an array of size 784 containing the binary representation of labels.
Here is the code:
def __classifyRenet(self, X, y):
image_input = Input(shape=(2000,1))
num_classes = 5
model = ResNet50(weights='imagenet',include_top=False)
model.summary()
last_layer = model.output
# add a global spatial average pooling layer
x = GlobalAveragePooling2D()(last_layer)
# add fully-connected & dropout layers
x = Dense(512, activation='relu',name='fc-1')(x)
x = Dropout(0.5)(x)
x = Dense(256, activation='relu',name='fc-2')(x)
x = Dropout(0.5)(x)
# a softmax layer for 5 classes
out = Dense(num_classes, activation='softmax',name='output_layer')(x)
# this is the model we will train
custom_resnet_model2 = Model(inputs=model.input, outputs=out)
custom_resnet_model2.summary()
for layer in custom_resnet_model2.layers[:-6]:
layer.trainable = False
custom_resnet_model2.layers[-1].trainable
custom_resnet_model2.compile(loss='categorical_crossentropy',
optimizer='adam',metrics=['accuracy'])
clf = custom_resnet_model2.fit(X, y,
batch_size=32, epochs=32, verbose=1,
validation_data=(X, y))
return clf
I am calling to function as:
clf = self.__classifyRenet(X_train, y_train)
It is giving an error:
ValueError: Error when checking input: expected input_24 to have 4 dimensions, but got array with shape (785, 2000)
Please help. Thank you!
1. First, understand the error.
Your input does not match the input of ResNet, for ResNet, the input should be (n_sample, 224, 224, 3) but you are having (785, 2000). From your question, you have 784 images with array of size 2000, which doesn't really align with the original ResNet50 input shape of (224 x 224) no matter how you reshape it. That means you cannot use the ResNet50 directly with your data. The only thing you did in your code is to take the last layer of ResNet50 and added you output layer to align with your output class size.
2. Then, what you can do.
If you insist to use the ResNet architecture, you will need to change the input layer rather than output layer. Also, you will need to reshape your image data to utilize the convolution layers. That means, you cannot have it in a (2000,) array, but need to be something like (height, width, channel), just like what ResNet and other architectures are doing. Of course you will also need to change the output layer as well just like you did so that you are predicting for your classes. Try something like:
model = ResNet50(input_tensor=image_input_shape, include_top=True,weights='imagenet')
This way, you can specify customized input image shape. You can check the github code for more information (https://github.com/keras-team/keras/blob/master/keras/applications/resnet50.py). Here's part of the docstring:
input_shape: optional shape tuple, only to be specified
if `include_top` is False (otherwise the input shape
has to be `(224, 224, 3)` (with `channels_last` data format)
or `(3, 224, 224)` (with `channels_first` data format).
It should have exactly 3 inputs channels,
and width and height should be no smaller than 197.
E.g. `(200, 200, 3)` would be one valid value.

InvalidArgumentError: logits and labels must have the same first dimension seq2seq Tensorflow

I am getting this error in seq2seq.sequence_loss even though first dim of logits and labels has same dimension, i.e. batchSize
I have created a seq2seq model in TF 1.0 version. My loss function is as follows :
logits = self.decoder_logits_train
targets = self.decoder_train_targets
self.loss = seq2seq.sequence_loss(logits=logits, targets=targets, weights=self.loss_weights)
self.train_op = tf.train.AdamOptimizer().minimize(self.loss)
I am getting following error on running my network while training :
InvalidArgumentError (see above for traceback): logits and labels must have the same first dimension, got logits shape [1280,150000] and labels shape [1536]
[[Node: sequence_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](sequence_loss/Reshape, sequence_loss/Reshape_1)]]
I confirm the shapes of logits and targets tensors as follows :
a,b = sess.run([model.decoder_logits_train, model.decoder_train_targets], feed_dict)
print(np.shape(a)) # (128, 10, 150000) which is (BatchSize, MaxSeqSize, Vocabsize)
print(np.shape(b)) # (128, 12) which is (BatchSize, Max length of seq including padding)
So, since the first dimension of targets and logits are same then why I am getting this error ?
Interestingly, in error u can observe that the dimension of logits is mentioned as (1280, 150000), which is (128 * 10, 150000) [product of first two dimension, vocab_size], and same for targets i.e. (1536), which is (128*12), again product of first two dimension ?
Note : Tensorflow 1.0 CPU version
maybe your way of padding wrong. if you padded _EOS to the end of target seq, then the max_length(real length of target sentence) should add 1 to be [batch, max_len+1]. Since you padded _GO and _EOS, your target sentence length should add 2, which makes it equals 12.
I read some other people's implementation of NMT, they only padded _EOS for target sentence, while _GO for input of decoder. Tell me if I'm wrong.
I had the same error as you and I understood the problem:
The problem:
You run the decoder using this parameters:
targets are the decoder_inputs. They have length max_length because of padding. Shape: [batch_size, max_length]
sequence_length are the non-padded-lengths of all the targets of your current batch. Shape: [batch_size]
Your logits, that are the output tf.contrib.seq2seq.dynamic_decode has shape:
[batch_size, longer_sequence_in_this_batch, n_classes]
Where longer_sequence_in_this_batch is equal to tf.reduce_max(sequence_length)
So, you have a problem when computing the loss because you try to use both:
Your logits with 1st dimension shape longer_sequence_in_this_batch
Your targets with 1st dimension shape max_length
Note that longer_sequence_in_this_batch <= max_length
How to fix it:
You can simply apply some padding to your logits.
logits = self.decoder_logits_train
targets = self.decoder_train_targets
paddings = [[0, 0], [0, max_length-tf.shape(logits)[1]], [0, 0]]
padded_logits = tf.pad(logits, paddings, 'CONSTANT', constant_values=0)
self.loss = seq2seq.sequence_loss(logits=padded_logits, targets=targets,
weights=self.loss_weights)
Using this method,you ensure that your logits will be padded as the targets and will have dimension [batch_size, max_length, n_classes]
For more information about the pad function, visit
Tensorflow's documentation
The error message seems to be a bit misleading, as you actually need first and second dimensions to be the same. This is written here:
logits: A Tensor of shape [batch_size, sequence_length,
num_decoder_symbols] and dtype float. The logits correspond to the
prediction across all classes at each timestep.
targets: A Tensor of shape [batch_size, sequence_length] and dtype
int. The target represents the true class at each timestep.
This also makes sense, as logits are probability vectors, while targets represent the real output, so they need to be of the same length.

Resources