I'm confused about how to apply cross-entropy loss for my time series model, where the output has shape [batch_size, classes, time_steps] and the target has shape [batch_size, time_steps, classes]. I'm trying to make the model determine the confidence of the 16 classes at each time step. With the following approach, I get a large loss and the model doesn't seem to be learning:
batch_size = 256
time_steps = 224
classes = 16
y_est = torch.randn((batch_size, classes, time_steps))
y_true = torch.randn((batch_size, time_steps, classes)).view(batch_size, classes, -1)
loss = torch.nn.functional.cross_entropy(y_est, y_true)
Do you think I've made a mistake here?
PyTorch documentation for CrossEntropyLoss:
Input shape: (N, C, d1, ..., dK)
Target shape: (N, d1, ..., dK)
where N is the batch size and C is the number of classes, with K >= 1 in the case of K-dimensional loss.
So based on the docs, the code should be
batch_size = 256
time_steps = 224
classes = 16
y_est = torch.randn((batch_size, classes, time_steps))
y_true = torch.randint(0, classes, (batch_size, time_steps))  # integer class indices, not random floats
loss = torch.nn.functional.cross_entropy(y_est, y_true)
As #Hatem described, your target tensor should have one dimension fewer than the prediction tensor, because its representation is not a one-hot encoding but a dense encoding (the values represent the class labels themselves), whereas your prediction tensor contains a probability distribution across all possible classes.
So here, since your prediction tensor y_est is shaped (batch_size, classes, time_steps), your target tensor should have a shape of (batch_size, time_steps). If your target is one-hot encoded, you can easily convert it back to the required format by applying torch.argmax over the class dimension (dim=1 if the target is laid out like y_est, or dim=-1 for the (batch_size, time_steps, classes) layout in your question):
loss = F.cross_entropy(y_est, y_true.argmax(dim=1))
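Putting it together for the shapes in the question, a minimal sketch (the one-hot target here is synthetic, just to illustrate the conversion):
import torch
import torch.nn.functional as F

batch_size, time_steps, classes = 256, 224, 16

# model output: raw logits with the class dimension second
y_est = torch.randn(batch_size, classes, time_steps)

# synthetic one-hot target laid out as (batch, time, classes)
target_idx = torch.randint(0, classes, (batch_size, time_steps))
y_true_onehot = F.one_hot(target_idx, num_classes=classes).float()

# collapse the one-hot axis (the last one) back to class indices
loss = F.cross_entropy(y_est, y_true_onehot.argmax(dim=-1))
print(loss)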
I am using a GPT2 model that outputs logits (before softmax) in the shape (batch_size, num_input_ids, vocab_size) and I need to compare it with the labels that are of shape (batch_size, num_input_ids) to calculate BCELoss. How do I calculate it?
logits = output.logits #--of shape (32, 56, 592)
logits = torch.nn.Softmax()(logits)
labels = labels #---------of shape (32, 56)
torch.nn.BCELoss()(logits, labels)
but the dimensions do not match, so how do I contract logits to labels shape or expand labels to logits shape?
Binary cross-entropy is used when the final classification layer is a sigmoid layer, i.e., for each output dimension, only a true/false output is possible. You can imagine it as assigning some tags to the input. This also means that the labels need to have the same dimension as the logits, having 0/1 for each logit. Statistically speaking, for 592 output dimensions, you predict 592 Bernoulli (= binary) distributions. The expected shape is 32 × 56 × 592.
When using the softmax layer, you assume only one target class is possible; you predict a single categorical distribution over 592 possible output classes. However, in this case, the correct loss function is not binary cross-entropy but categorical cross-entropy, implemented by the CrossEntropyLoss class in PyTorch. Note that it takes the logits directly before the softmax normalization and does the normalization internally. The expected shape is 32 × 56, as in the code snippet.
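A minimal sketch of the categorical cross-entropy version for these shapes (the tensors below are random stand-ins for the GPT-2 logits and labels):
import torch

batch_size, seq_len, vocab_size = 32, 56, 592
logits = torch.randn(batch_size, seq_len, vocab_size)         # raw logits, no softmax
labels = torch.randint(0, vocab_size, (batch_size, seq_len))  # token ids

# CrossEntropyLoss expects the class dimension second, so flatten batch and sequence
loss_fn = torch.nn.CrossEntropyLoss()
loss = loss_fn(logits.view(-1, vocab_size), labels.view(-1))
print(loss)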
Let's say I have a layer defined C = nn.Conv2d(1,3,3, bias=False), ie, 1 input channel, 3 output channels and a kernel size of 3x3. The internal weight of this layer is thus a tensor of shape (3,1,3,3); I can access this with C.weight.data. Now suppose that this internal weight is very sparse; it's full of zeros and has only a few nonzero values. I can easily construct a sparse tensor from the weight by:
idx = C.weight.data.nonzero().T
values = C.weight.data[C.weight.data!=0]
sp_T = torch.sparse.FloatTensor(idx, values, C.weight.data.size())
Is it possible to store the conv layer's weights as this sparse tensor somehow? I tried simply doing C.weight.data = sp_T but it throws an error. It would be pretty convenient if we could store all the weights in a model in this sparsified way.
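For reference, here is the same construction as a self-contained sketch using the newer torch.sparse_coo_tensor API; the round-trip check at the end just confirms the sparse copy matches the dense weight:
import torch
import torch.nn as nn

C = nn.Conv2d(1, 3, 3, bias=False)          # weight shape: (3, 1, 3, 3)

w = C.weight.data
idx = w.nonzero().T                         # (4, nnz) indices of the nonzero entries
values = w[w != 0]                          # the nonzero values themselves
sp_T = torch.sparse_coo_tensor(idx, values, w.size())

print(torch.equal(sp_T.to_dense(), w))      # True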
I have a question about the use of the sample_weight parameter in the context of data augmentation in Keras with the ImageDataGenerator. Let's say I have a series of simple images with just one class of objects. So, for each image, I will have a corresponding mask with pixels = 0 for the background and 1 for where the object is labeled.
However, this dataset is unbalanced, because a significant number of these images are empty, meaning their masks contain only 0s.
If I understood well, the 'sample_weight' parameter of the flow method of ImageDataGenerator is there to put the focus on the samples of my dataset that I find more interesting, i.e. where my object is present.
My question is: what is the concrete influence of this sample_weight parameter on the training of my model? Does it influence the data augmentation? If I use the 'validation_split' parameter, does it influence the way validation sets are generated?
Here is the part of my code my question refers to:
data_gen_args = dict(rotation_range=90,
                     width_shift_range=0.4,
                     height_shift_range=0.4,
                     zoom_range=0.4,
                     horizontal_flip=True,
                     fill_mode='reflect',
                     rescale=1. / 255,
                     validation_split=0.2,
                     data_format='channels_last')
image_datagen = ImageDataGenerator(**data_gen_args)
imf = image_datagen.flow(
    x=stacked_images_channel,
    y=stacked_masks_channel,
    batch_size=batch_size,
    shuffle=False,
    seed=seed,
    subset='training',
    sample_weight=sample_weight,
    save_to_dir='traindir',
    save_prefix='train_'
)
valf = image_datagen.flow(
    x=stacked_images_channel,
    y=stacked_masks_channel,
    batch_size=batch_size,
    shuffle=False,
    seed=seed,
    subset='validation',
    sample_weight=sample_weight,
    save_to_dir='valdir',
    save_prefix='val_'
)
STEP_SIZE_TRAIN=imf.n//imf.batch_size
STEP_SIZE_VALID=valf.n//valf.batch_size
model = unet.UNet2(numberOfClasses, imshape, '', learningRate, depth=4)
history = model.fit_generator(generator=imf,
                              steps_per_epoch=STEP_SIZE_TRAIN,
                              epochs=epochs,
                              validation_data=valf,
                              validation_steps=STEP_SIZE_VALID,
                              verbose=2)
Thank you in advance for your attention.
As of Keras 2.2.5, with Preprocessing at 1.1.0, the sample_weight is passed along with the samples and applied during processing. When calling .fit_generator, the model is trained on batches, with each batch using sample weights:
model.train_on_batch(x, y,
                     sample_weight=sample_weight,
                     class_weight=class_weight)
In the source code of .train_on_batch, the documentation states: "sample_weight: Optional array of the same length as x, containing weights to apply to the model's loss for each sample. (...)". The actual application of weights happens when calculating loss on each batch. When compiling a model, Keras generates a "weighted loss" function out of the desired loss function. The weighted computation is stated in the code as:
def weighted(y_true, y_pred, weights, mask=None):
    """Wrapper function.

    # Arguments
        y_true: `y_true` argument of `fn`.
        y_pred: `y_pred` argument of `fn`.
        weights: Weights tensor.
        mask: Mask tensor.

    # Returns
        Scalar tensor.
    """
    # score_array has ndim >= 2
    score_array = fn(y_true, y_pred)
    if mask is not None:
        # Cast the mask to floatX to avoid float64 upcasting in Theano
        mask = K.cast(mask, K.floatx())
        # mask should have the same shape as score_array
        score_array *= mask
        # the loss per batch should be proportional
        # to the number of unmasked samples.
        score_array /= K.mean(mask) + K.epsilon()
    # apply sample weighting
    if weights is not None:
        # reduce score_array to same ndim as weight array
        ndim = K.ndim(score_array)
        weight_ndim = K.ndim(weights)
        score_array = K.mean(score_array,
                             axis=list(range(weight_ndim, ndim)))
        score_array *= weights
        score_array /= K.mean(K.cast(K.not_equal(weights, 0), K.floatx()))
    return K.mean(score_array)
This wrapper shows that it first calculates the desired loss (the call to fn(y_true, y_pred)), then applies weighting if weights were passed (either with sample_weight or class_weight).
With this context in mind:
what is the concrete influence of this sample_weight parameter on the training of my model?
Weights are basically multiplied into the loss (and normalized). So samples with "heavy" weights (more than 1) contribute more loss and therefore larger gradients; "light" weights (less than 1) reduce a sample's importance and lead to smaller gradients.
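As a toy illustration of that multiplication (a sketch mirroring the weighted() wrapper above, not the exact Keras code path):
import numpy as np

per_sample_loss = np.array([0.2, 0.9, 0.4, 0.1])   # loss for each sample in the batch
weights = np.array([1.0, 3.0, 1.0, 0.5])           # sample_weight for the batch

weighted_loss = per_sample_loss * weights
# normalize by the mean of the nonzero-weight indicator, as the wrapper does
batch_loss = weighted_loss.mean() / np.mean(weights != 0)
print(batch_loss)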
Does it influence the data augmentation?
It depends on what you mean. Here is what I can say from experience, where I perform augmentation before feeding a Keras data generator (I do so because of issues in preprocessing which, as far as I know, still exist in Preprocessing 1.1.0):
When feeding already augmented data to the generator, the .flow call will require a sample-weight list as long as the input data. So the influence of weighting on augmentation depends on how the weights are chosen: a data point augmented N times may be given the same weight for each augmented copy, or a weight of 1/N, depending on the intent.
The default behaviour in Keras seems to assign the same weight to each augmentation (transform) performed by Keras. The code looks pretty clear, although I have never relied on it.
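For the scenario in the question, one possible way to build such a weight list (the factor 5.0 is arbitrary, and stacked_masks_channel is assumed to be the stacked mask array from the question) is to give samples whose mask contains the object a larger weight than empty ones:
import numpy as np

# 5.0 for samples whose mask contains the object, 1.0 for empty masks
mask_has_object = stacked_masks_channel.reshape(len(stacked_masks_channel), -1).sum(axis=1) > 0
sample_weight = np.where(mask_has_object, 5.0, 1.0)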
If I use the 'validation_split' parameter, does it influence the way validation sets are generated?
The sample_weight parameter does not seem to interfere with validation_split. I have not looked into the code specifically, but splitting basically takes the input data and keeps a fraction of it for validation, whatever that data is. What sample_weight changes is each data point: without weights, a batch is (x, y); with weights, a batch becomes (x, y, weight).
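In practice, that means a generator created with sample_weight (like imf above) yields three-element batches, roughly:
x_batch, y_batch, w_batch = next(imf)   # with sample_weight: (x, y, weight)
# x_batch, y_batch = next(imf)          # without sample_weight: (x, y)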
I am trying to modify Resnet50 with my custom data as follows:
X = [[1.85, 0.460,... -0.606] ... [0.229, 0.543,... 1.342]]
y = [2, 4, 0, ... 4, 2, 2]
X is a feature vector of length 2000 for 784 images. y is an array of size 784 containing the binary representation of labels.
Here is the code:
def __classifyRenet(self, X, y):
    image_input = Input(shape=(2000, 1))
    num_classes = 5
    model = ResNet50(weights='imagenet', include_top=False)
    model.summary()
    last_layer = model.output
    # add a global spatial average pooling layer
    x = GlobalAveragePooling2D()(last_layer)
    # add fully-connected & dropout layers
    x = Dense(512, activation='relu', name='fc-1')(x)
    x = Dropout(0.5)(x)
    x = Dense(256, activation='relu', name='fc-2')(x)
    x = Dropout(0.5)(x)
    # a softmax layer for 5 classes
    out = Dense(num_classes, activation='softmax', name='output_layer')(x)
    # this is the model we will train
    custom_resnet_model2 = Model(inputs=model.input, outputs=out)
    custom_resnet_model2.summary()
    for layer in custom_resnet_model2.layers[:-6]:
        layer.trainable = False
    custom_resnet_model2.layers[-1].trainable
    custom_resnet_model2.compile(loss='categorical_crossentropy',
                                 optimizer='adam', metrics=['accuracy'])
    clf = custom_resnet_model2.fit(X, y,
                                   batch_size=32, epochs=32, verbose=1,
                                   validation_data=(X, y))
    return clf
I am calling the function as:
clf = self.__classifyRenet(X_train, y_train)
It is giving an error:
ValueError: Error when checking input: expected input_24 to have 4 dimensions, but got array with shape (785, 2000)
Please help. Thank you!
1. First, understand the error.
Your input does not match what ResNet expects. For ResNet50, the input should have shape (n_samples, 224, 224, 3), but you are passing (785, 2000). From your question, you have 784 images, each a feature vector of length 2000, which cannot be made to fit the original ResNet50 input shape of 224 x 224 no matter how you reshape it. That means you cannot use ResNet50 directly with your data. All your code does is take the last layer of ResNet50 and add an output layer matching your number of classes.
2. Then, what you can do.
If you insist on using the ResNet architecture, you will need to change the input layer, not just the output layer. You will also need to reshape your image data so the convolution layers can be used: it cannot stay a (2000,) array, it needs to be something like (height, width, channels), just like ResNet and other convolutional architectures expect. Of course, you will also need to change the output layer, as you already did, so that you are predicting your own classes. Try something like:
model = ResNet50(input_shape=image_input_shape, include_top=False, weights='imagenet')
This way, you can specify customized input image shape. You can check the github code for more information (https://github.com/keras-team/keras/blob/master/keras/applications/resnet50.py). Here's part of the docstring:
input_shape: optional shape tuple, only to be specified
if `include_top` is False (otherwise the input shape
has to be `(224, 224, 3)` (with `channels_last` data format)
or `(3, 224, 224)` (with `channels_first` data format).
It should have exactly 3 inputs channels,
and width and height should be no smaller than 197.
E.g. `(200, 200, 3)` would be one valid value.
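For illustration, a minimal sketch of that approach (the (200, 200, 3) input shape is just the docstring's example; your data would first have to be reshaped into such an image-like format):
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

num_classes = 5
base = ResNet50(input_shape=(200, 200, 3), include_top=False, weights='imagenet')

x = GlobalAveragePooling2D()(base.output)
out = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base.input, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])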
I was just modifying an LSTM network I had written to print out the test error. The issue, I realized, is that the model I had defined depends on the batch size.
Specifically, the input is a tensor of shape [batch_size, time_steps, features]. The input enters the LSTM cell, and I turn the output into a list of time_steps 2D tensors, each of shape [batch_size, hidden_units]. Each 2D tensor is then multiplied by a weight vector of shape [hidden_units] to yield a vector of shape [batch_size], to which a bias vector of shape [batch_size] is added.
In words, I give the model N sequences and expect it to output a scalar for each time step of each sequence. That is, the output is a list of vectors of length N, one for each time step.
For training, I give the model batches of size 13. For the test data, I feed the entire data set, which consists of over 400 examples. Thus, an error is raised, since the bias has fixed shape batch_size.
I haven't found a way to make its shape variable without raising an error.
I can add the complete code if requested. I've added the code below anyway.
Thanks.
def basic_lstm(inputs, number_steps, number_features, number_hidden_units, batch_size):
    weights = {
        'out': tf.Variable(tf.random_normal([number_hidden_units, 1]))
    }
    biases = {
        'out': tf.Variable(tf.constant(0.1, shape=[batch_size, 1]))
    }

    lstm_cell = rnn.BasicLSTMCell(number_hidden_units)
    init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)

    hidden_layer_outputs, states = tf.nn.dynamic_rnn(lstm_cell, inputs,
                                                     initial_state=init_state,
                                                     dtype=tf.float32)

    results = tf.squeeze(tf.stack([tf.matmul(output, weights['out']) + biases['out']
                                   for output
                                   in tf.unstack(tf.transpose(hidden_layer_outputs, (1, 0, 2)))],
                                  axis=1))
    return results
You want the biases to have a shape of (batch_size,).
For example (using tf.zeros instead of tf.constant, but the same idea applies), I was able to specify the shape as a single integer:
biases = tf.Variable(tf.zeros(10, dtype=tf.float32))
print(biases.shape)
prints:
(10,)