a problem when i use cross-entropy loss as a loss function - pytorch

when i using cross-entropy loss as a loss function, i get this Dimension out of range error
Traceback (most recent call last):
File "e:\testcode\cnn.py", line 122, in <module>
loss = loss_func(output, b_y) # cross entropy loss
File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\loss.py", line 916, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 2021, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 1317, in log_softmax
ret = input.log_softmax(dim)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
and the loss function is
loss = loss_func(output, b_y)
the value of output is
tensor([-0.3507, 0.2214, 0.3781, 0.3057], grad_fn=<SelectBackward>)
the value of b_y is
tensor([3])

The first argument passed to the CrossEntropyLoss has to be a 2d tensor with the shape of [batch size x number of classes]. If you are only calculating the loss for a single batch, unsqueeze the logits before passing them to the loss function.
logits = torch.tensor([-0.3507, 0.2214, 0.3781, 0.3057]).unsqueeze(0)
targets = torch.tensor([3])
loss_func = torch.nn.CrossEntropyLoss()
loss_func(logits, targets)

Related

PyTorch RuntimeError: Expected target size [8, 1182], got [8, 256]

I have a PyTorch model composed of a Distilbert and a BiLSTM with the following structure. Its purpose involves performing token classification over a vast amount of categories (num_labels=1182) by attaching the output of the transformer to the input of the BiLSTM.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModelForTokenClassification
import utilities as utils
from global_constants import MAX_DOC_LENGTH
class CustomTorchModel(nn.Module):
def __init__(self, args_model_name_or_path):
id_to_label, label_to_id = utils.unshelve_label_converters()
label_qty = len(list(label_to_id))
self.distilbert_layer = AutoModelForTokenClassification.from_pretrained(
args_model_name_or_path,
id2label=id_to_label,
label2id=label_to_id,
num_labels=label_qty
)
self.bilstm_layer = nn.LSTM(input_size=MAX_DOC_LENGTH,
hidden_size=self.distilbert_layer.config.dim,
num_layers=1,
batch_first=True,
bidirectional=True)
def forward(self, inputs):
print("input_ids size: " + str(inputs[0].size()))
print("attention_mask size: " + str(inputs[1].size()))
distilbert_output = self.distilbert_layer(input_ids=inputs[0], attention_mask=inputs[1])
print("distilbert_output.last_hidden_state size: " + str(distilbert_output.last_hidden_state.size()))
bilstm_output, (last_hidden, last_cell) = self.bilstm_layer(distilbert_output.last_hidden_state)
print("BiLSTM output size: " + str(bilstm_output.size()))
output = self.classification_layer(bilstm_output)
print("output size: " + str(output.size()))
return F.softmax(output)
Output showing the shapes after each layer. Notes: 256 is the value of MAX_DOC_LENGTH, 768 is self.distilbert_layer.config.dim and 1182 is num_labels.
input_ids size: torch.Size([8, 256])
attention_mask size: torch.Size([8, 256])
distilbert_output.last_hidden_state size: torch.Size([8, 256, 768])
BiLSTM output size: torch.Size([8, 256, 1536])
output size: torch.Size([8, 256, 1182])
This custom model is used in a pretty standard Ignite script which leverages to train the model. Since there are multiple categories and this is not binary classification, the loss function should be nn.CrossEntropyLoss:
criterion = nn.CrossEntropyLoss(reduction='mean')
optimizer = AdamW(model.parameters(), lr=1e-5)
lr_scheduler = ExponentialLR(optimizer, gamma=0.90)
trainer = create_supervised_trainer1(model.to(device), optimizer, criterion, device=device)
This is the definition of the methods used above:
def _prepare_batch(batch, device=None, non_blocking=False):
x = [batch["input_ids"], batch["attention_mask"]] # list
y = batch["labels"]
return (convert_tensor(x, device=device, non_blocking=non_blocking),
convert_tensor(y, device=device, non_blocking=non_blocking))
def create_supervised_trainer1(model, optimizer, loss_fn, metrics={}, device=None):
def _update(engine, batch):
model.train()
optimizer.zero_grad()
x, y = _prepare_batch(batch, device=device)
y_pred = model(x)
transposed_y_pred = torch.transpose(y_pred, 1, 2)
loss = loss_fn(transposed_y_pred, y.long())
loss.backward()
optimizer.step()
return loss.item(), transposed_y_pred, y.long()
def _metrics_transform(output):
return output[1], output[2]
engine = Engine(_update)
for name, metric in metrics.items():
metric._output_transform = _metrics_transform
metric.attach(engine, name)
return engine
I know I am missing something, however I'm not being able to figure out what. The execution produces an error related to the shapes (the "y" of the DataLoaders has [8, 256] and the network produces [8, 1182]. This happens even though I rearranged the tensors in the order required by CrossEntropyLoss:
Current run is terminating due to exception: Expected target size [8, 1182], got [8, 256]
Engine run is terminating due to exception: Expected target size [8, 1182], got [8, 256]
Traceback (most recent call last):
File "/home/users/user/august/src/main/ignite_script.py", line 456, in run
trainer.run(train_dataloader, max_epochs=epochs)
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 892, in run
return self._internal_run()
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 935, in _internal_run
return next(self._internal_run_generator)
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 993, in _internal_run_as_gen
self._handle_exception(e)
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 638, in _handle_exception
raise e
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 959, in _internal_run_as_gen
epoch_time_taken += yield from self._run_once_on_dataset_as_gen()
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 1087, in _run_once_on_dataset_as_gen
self._handle_exception(e)
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 638, in _handle_exception
raise e
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 1068, in _run_once_on_dataset_as_gen
self.state.output = self._process_function(self, self.state.batch)
File "/home/users/user/august/src/main/ignite_script.py", line 321, in _update
loss = loss_fn(y_pred, y.float())
File "/home/users/user/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/users/user/.local/lib/python3.9/site-packages/torch/nn/modules/loss.py", line 1163, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/home/users/user/.local/lib/python3.9/site-packages/torch/nn/functional.py", line 2996, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected target size [8, 1182], got [8, 256]
According to the documentation of
nn.CrossEntropyLoss, you can specify the targets in two ways:
class indices for each sample
probabilities of each class for each sample
but you should do any of above mentioned ways by a specific shaped target:
given:
> criterion = nn.CrossEntropyLoss()
> loss = criterion(input, target)
Input:
Shape: (C)(C), (N, C)(N,C) or (N, C, d_1, d_2, ..., d_K)(N,C,d1​,d2​,...,dK) with K≥1 in the case of K-dimensional loss.
Target:
If containing class indices, shape:
()(), (N)(N) or (N, d_1, d_2, ..., d_K)(N,d 1,d2,...,dK) with K≥1 in the case of K-dimensional loss where each value should be between [0, C)[0,C).
If containing class probabilities,
same shape as the input and each value should be between [0, 1][0,1].
Output:
If reduction is ‘none’, shape ()(), (N)(N) or (N, d_1, d_2, ..., d_K)(N,d1,d2,...,dK) with K≥1 in the case of K-dimensional loss, depending on the shape of the input. Otherwise, scalar.

Tensorflow map_fn Error PartialTensorShape: Incompatible ranks during merge

The following code is giving me an error which I cannot find the answer to. I am trying to apply a python function to each element of a tensor, which transforms the element into a vector of shape 3, so I can calculate a custom evaluation metric. It needs to be a Python function as it is used in other places too.
The error (log below) is Invalid argument: PartialTensorShape: Incompatible ranks during merge: 1 vs. 0, and I assume it has to do with the result of map_fn and its shape. However, it only happens at runtime as if I have any other shape then it throws an error with incompatible shapes when I do model.compile(). Have I misundertood how to use map_fn? Any suggestions?
Thanks in advance!
2021-04-09 12:19:31.357542: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at list_kernels.h:101 : Invalid argument: PartialTensorShape: Incompatible ranks during merge: 1 vs. 0
Traceback (most recent call last):
File "test.py", line 93, in <module>
validation_data=(val_input, val_output))
File "/home/user/anaconda3/envs/tf_models/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
return method(self, *args, **kwargs)
File "/home/user/anaconda3/envs/tf_models/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
tmp_logs = train_function(iterator)
File "/home/user/anaconda3/envs/tf_models/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/home/user/anaconda3/envs/tf_models/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
return self._stateless_fn(*args, **kwds)
File "/home/user/anaconda3/envs/tf_models/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/user/anaconda3/envs/tf_models/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
cancellation_manager=cancellation_manager)
File "/home/user/anaconda3/envs/tf_models/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/home/user/anaconda3/envs/tf_models/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 550, in call
ctx=ctx)
File "/home/user/anaconda3/envs/tf_models/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: PartialTensorShape: Incompatible ranks during merge: 1 vs. 0
[[node map/TensorArrayV2Stack/TensorListStack (defined at test.py:27) ]]
[[map_1/while/LoopCond/_50/_64]]
(1) Invalid argument: PartialTensorShape: Incompatible ranks during merge: 1 vs. 0
[[node map/TensorArrayV2Stack/TensorListStack (defined at test.py:27) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_823]
Function call stack:
train_function -> train_function
This is the code to reproduce the issue, using Tensorflow 2.3.1 and Python 3.6.
from typing import List
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Flatten
INPUT_SHAPE = (2, 10, 10)
class CustomMetric(tf.keras.metrics.Metric):
def __init__(self, name='custom_metric', **kwargs):
super().__init__(name=name, **kwargs)
self.mean_custom_metric = self.add_weight(name='mean_custom_metric', initializer='zeros', dtype=float)
def update_state(self, y_true, y_pred, sample_weight=None):
# y_true is a probability distribution (batch, 2*10*10), so find index of most likely position
y_pred = tf.argmax(y_pred, axis=1)
# y_pred and y_true are both tensors with shape (batch, 1)
print(f"y_pred: {y_pred}")
# apply python func to convert each value to a 3D value (single scalar to vector with 3 scalars)
# according to docs: map_fn(fn, elems).shape = [elems.shape[0]] + fn(elems[0]).shape.
# So: elems.shape[0] == batch | fn(elems[0]).shape == 3,
# error happens when trying to do anything with the result of map_fn below
y_true_positions = tf.map_fn(self.wrapper, y_true, fn_output_signature=tf.float32)
y_pred_positions = tf.map_fn(self.wrapper, y_pred, fn_output_signature=tf.float32)
# y_true_positions, y_pred_positions: tensors with shape (batch, 3)
print(f"y_true_positions: {y_true_positions}")
# do something with y_true_positions and y_pred_positions
y_final = y_true_positions
mean = tf.reduce_sum(y_final)
print('---')
self.mean_custom_metric.assign(mean)
def result(self):
return self.mean_custom_metric
def reset_states(self):
self.mean_custom_metric.assign(0.0)
def wrapper(self, x):
# x: tensor with shape (1,)
print(f"x: {x}")
result = tf.py_function(python_function, [int(x)], tf.float32)
# result is a tensor of shape unknown
print(f"result: {result}")
result.set_shape(tf.TensorShape(3))
# result: tensor with shape (3,)
print(f"result: {result}")
return result
def python_function(index: int) -> List[float]:
# dummy function
return [0, 0, 0]
# dummy model
block_positions = Input(shape=(*INPUT_SHAPE, 1), dtype=tf.float32)
block_positions_layer = Flatten()(block_positions)
target_output_layer = Dense(128, activation='relu')(block_positions_layer)
target_output = Dense(np.prod(INPUT_SHAPE), activation='softmax', name='regions')(target_output_layer)
model = tf.keras.models.Model(
inputs=[block_positions],
outputs=(target_output))
custom_metric = CustomMetric()
model.compile(
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
optimizer=tf.optimizers.Adam(learning_rate=0.001),
metrics=['accuracy', custom_metric])
print(model.summary())
# placeholder data
train_input = np.zeros(shape=(100, *INPUT_SHAPE), dtype=np.float32)
train_output = np.zeros(shape=(100, 1), dtype=np.int32)
val_input = np.zeros(shape=(100, *INPUT_SHAPE), dtype=np.float32)
val_output = np.zeros(shape=(100, 1), dtype=np.int32)
history = model.fit(
train_input, train_output, epochs=10, verbose=1,
validation_data=(val_input, val_output))
I found the solution after a while. The wrapper function was returning a tensor of shape (3,), whereas the map_fn was applied over a tensor of shape (batch, 1). I don't fully understand why, but it seems that map_fn requires a return tensor of shape (batch, 1,) and not fn(elems[0]).shape as the documentation suggests.
Changing the line:
result.set_shape(tf.TensorShape(3))
for
result = tf.reshape(tf.concat(result, 1), (1, 3)) in wrapper
so that the return value is (1, 3) instead of (3) fixed the issue. After map_fn, you end up with a tensor of shape (batch, 1, 3), which I reshaped to be (batch, 3).

Error in ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

The input image size is 512*512,in order to suit to the input of resnet.input image
I used
_img = Image.open(self.images[index]).convert('RGB')
in dataloader.
I used resnet50 as my network's backbone without fc.The output shape is
[4,2048,16,16]
then used two (conv bn relu) and a interpolate
def forward(self, input):
x=self.backbone(input)
x = self.conv1(x)
x= self.bn1(x)
x = self.relu(x)
x = self.conv2(x)
x= self.bn2(x)
x = self.relu(x)
x = F.interpolate(x, size=[512,512], mode='bilinear', align_corners=True)
return x
The part of training
self.criterion=nn.CrossEntropyLoss()
if self.args.cuda:
image, target = image.cuda(), target.cuda()
self.scheduler(self.optimizer, i, epoch, self.best_pred)
self.optimizer.zero_grad()
output = self.model(image)
loss = self.criterion(output, target.long())
loss.backward()
But the Error occurs
File "E:/python_workspace/1006/train.py", line 135, in training
loss = self.criterion(output, target.long())
File "E:\python_workspace\1006\utils\loss.py", line 28, in CrossEntropyLoss
loss = criterion(logit, target.long())
File "E:\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "E:\anaconda3\lib\site-packages\torch\nn\modules\loss.py", line 916, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "E:\anaconda3\lib\site-packages\torch\nn\functional.py", line 1995, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "E:\anaconda3\lib\site-packages\torch\nn\functional.py", line 1826, in nll_loss
ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at C:\w\1\s\tmp_conda_3.6_045031\conda\conda-bld\pytorch_1565412750030\work\aten\src\THNN/generic/SpatialClassNLLCriterion.c:111
image.shape is [4, 3, 512, 512],dtype is torch.float32
target.shape is [4, 512, 512],dtype is torch.float32
output.shape is [4, 3, 512, 512],dtype is torch.float32
target image
The target images all only have three different colors.so I set output to 3 channel.And there's Image mode is P
Where may have problems in my code?
Judging by the sizes of your ternsors, your batch_size=4. You are trying to predict one of three labels per pixel, that is n_classes=3.
The error you got:
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.
Means that the target.long() you provide to your loss function has values either negative or larger than n_classes.
Check the way you read the ground truth labels. If it's a P type image, you need to read it as such and not convert it to RGB values.
PS,
Do not use align_corners=True in F.interpolate, it introduces distortions.

Reshape layer after a dense layer in Keras

I am trying to understand why there is a mismatch dimensionality between a Dense Layer and a Reshape Layer. Shouldn't this snippet code be correct? The dimensionality of the Dense Layer output will be image_resize^2 * 128, why is there a conflict in the reshape?
input_shape = (28,28,1)
inputs = Input(shape=input_shape)
image_size = 28
image_resize = image_size // 4
x = Dense(image_resize * image_resize * 128)(inputs)
x = Reshape((image_resize, image_resize, 128))(x)
This is the error that shows up:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/venv/lib/python3.7/site-packages/keras/engine/base_layer.py", line 474, in __call__
output_shape = self.compute_output_shape(input_shape)
File "/Users/venv/lib/python3.7/site-packages/keras/layers/core.py", line 398, in compute_output_shape
input_shape[1:], self.target_shape)
File "/Users/venv/lib/python3.7/site-packages/keras/layers/core.py", line 386, in _fix_unknown_dimension
raise ValueError(msg)
ValueError: total size of new array must be unchanged
Dense layers act on the last dimension of the input data, if you want to give image input to a Dense layer, you should first flatten it:
x = Flatten()(x)
x = Dense(image_resize * image_resize * 128)(x)
x = Reshape((image_resize, image_resize, 128))(x)
Then the Reshape will work.
Your input is 784 and enters Dense layer with 7*7*128
So you will have and output of 784*7*7*128 in Reshape, not 7*7*128

Scikit-learn ValueError: unknown is not supported when using confusion matrix

I'm using the confusion_matrix module to visualize class prediction results compared to actual values.
val= ... #shape (3000,1,30) dtype float32
pred = ... #shape (3000,1,30) dtype float32
cnf_matrix = confusion_matrix(val, pred) #ERROR HERE
I got this error:
Traceback (most recent call last): File "vis.py", line 757, in
cnf_matrix = confusion_matrix(y_test, y_pred) File "C:\Anaconda\envs\nn35\lib\site-packages\sklearn\metrics\classification.py",
line 240, in confusion_matrix
y_type, y_true, y_pred = _check_targets(y_true, y_pred) File "C:\Anaconda\envs\nn35\lib\site-packages\sklearn\metrics\classification.py",
line 89, in _check_targets
raise ValueError("{0} is not supported".format(y_type)) ValueError: unknown is not supported
What did I do wrong?
Problem was the shape of the true value and prediction must be (3000,30) not (3000,1,30). So I reshape it using pred= np.reshape(pred, (pred.shape[0], 30))

Resources