Input dimension for CrossEntropyLoss in PyTorch

For a binary classification problem with batch_size = 1, I have logit and label values with which I need to calculate the loss.
logit: tensor([0.1198, 0.1911], device='cuda:0', grad_fn=<AddBackward0>)
label: tensor([1], device='cuda:0')
# calculate loss
loss_criterion = nn.CrossEntropyLoss()
loss_criterion.cuda()
loss = loss_criterion(b_logits, b_labels)
However, this always results in the following error,
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
What input dimensions is the CrossEntropyLoss actually asking for?

You are passing tensors of the wrong shape.
The shapes should be (from the docs):
Input: (N, C) where C = number of classes
Target: (N) where each value is 0 ≤ targets[i] ≤ C−1
So here, b_logits should have shape ([1, 2]) instead of ([2]); to get the right shape you can use torch.view, e.g. b_logits.view(1, -1).
And b_labels should have shape ([1]).
Ex.:
b_logits = torch.tensor([0.1198, 0.1911], requires_grad=True)
b_labels = torch.tensor([1])
loss_criterion = nn.CrossEntropyLoss()
loss = loss_criterion(b_logits.view(1, -1), b_labels)
loss
tensor(0.6581, grad_fn=<NllLossBackward>)
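
As a side note (not part of the original answer), unsqueeze(0) achieves the same reshape as view(1, -1) by adding a leading batch dimension:
import torch
import torch.nn as nn

b_logits = torch.tensor([0.1198, 0.1911], requires_grad=True)
b_labels = torch.tensor([1])  # shape [1]: one class index for the single sample
loss_criterion = nn.CrossEntropyLoss()
# unsqueeze(0) turns the shape-[2] logits into shape [1, 2], i.e. (N, C)
loss = loss_criterion(b_logits.unsqueeze(0), b_labels)
print(loss)  # tensor(0.6581, grad_fn=<NllLossBackward>)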

Related

Output of the model depends on the shape of the weights tensor

I want to train a model to sum its three inputs, so it is as simple as possible.
First the weights are initialized randomly. This produces a bad error estimate (approx. 0.5).
Then I initialize the weights with zeros. There are two options:
the shape of the weights tensor is [1, 3]
the shape of the weights tensor is [3]
When I choose the 1st option the model still works poorly and can't learn this simple formula.
When I choose the 2nd option it works perfectly, with an error of 10e-12.
Why does the result depend on the shape of the weights? Why do I need to initialize the model with zeros to solve this simple problem?
import torch
from torch.nn import Sequential as Seq, Linear as Lin
from torch.optim.lr_scheduler import ReduceLROnPlateau

X = torch.rand((1024, 3))
y = (X[:, 0] + X[:, 1] + X[:, 2])

m = Seq(Lin(3, 1, bias=False))
# 1st option
m[0].weight = torch.nn.parameter.Parameter(torch.tensor([[0, 0, 0]], dtype=torch.float))
# 2nd option
#m[0].weight = torch.nn.parameter.Parameter(torch.tensor([0, 0, 0], dtype=torch.float))

optim = torch.optim.SGD(m.parameters(), lr=10e-2)
scheduler = ReduceLROnPlateau(optim, 'min', factor=0.5, patience=20, verbose=True)
mse = torch.nn.MSELoss()

for epoch in range(500):
    optim.zero_grad()
    out = m(X)
    loss = mse(out, y)
    loss.backward()
    optim.step()
    if epoch % 20 == 0:
        print(loss.item())
    scheduler.step(loss)
The first option doesn't learn because it fails with broadcasting: out.shape == (1024, 1), while the corresponding target y has shape (1024,). MSELoss, as expected, computes the mean of the tensor (out - y)^2, which in this case has shape (1024, 1024), clearly the wrong objective for this task. With the 2nd option, the tensor (out - y)^2 has shape (1024,) and its mean corresponds to the actual MSE. The default approach, without explicitly changing the weight shape (through options 1 and 2), would work if you set the target shape to (1024, 1), for example by y = y.unsqueeze(-1) after the definition of y.
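
A minimal sketch (shapes taken from the question, values made up) of the broadcasting problem and the unsqueeze fix described above:
import torch

out = torch.randn(1024, 1)  # model output, shape (1024, 1)
y = torch.randn(1024)       # targets, shape (1024,)

# Broadcasting turns (out - y) into a (1024, 1024) matrix: the wrong objective.
print((out - y).shape)      # torch.Size([1024, 1024])

# Matching the target shape restores the intended per-sample difference.
print((out - y.unsqueeze(-1)).shape)  # torch.Size([1024, 1])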

CrossEntropyLoss on sequences

I need to compute torch.nn.CrossEntropyLoss on sequences.
The output tensor y_est has shape [batch_size, sequence_length, embedding_dim]. The values are embedded as one-hot vectors with embedding_dim dimensions (y_est is not binary, however).
The target tensor y has shape [batch_size, sequence_length] and contains the integer index of the correct class in the range [0, embedding_dim).
If I compute the loss on the two inputs, with the shapes described above, I get the error at [1].
What I would like to do is described by the loop at [2]. For each sequence in the batch, I would like the sum of the losses computed on each element in the sequence.
After reading the documentation of torch.nn.CrossEntropyLoss I came up with the solution at [3], which seems to compute exactly what I want: the losses computed at points [2] and [3] are equal.
However, since .permute(.) returns a view of the original tensor, I am afraid it might mess up the backward propagation through the loss. Somewhere (I do not remember where, sorry) I have read that views should not be used in computing the loss.
Is my solution correct?
import torch

batch_size = 5
seq_len = 10
emb_dim = 100

y_est = torch.randn((batch_size, seq_len, emb_dim))
y = torch.randint(0, emb_dim, (batch_size, seq_len))

print("y_est, batch x seq x emb:", y_est.shape)
print("y, batch x seq", y.shape)

loss_fn = torch.nn.CrossEntropyLoss(reduction="none")

# [1]
# loss = loss_fn(y_est, y)
# error:
# RuntimeError: Expected target size [5, 100], got [5, 10]

# [2]
loss = 0
for i in range(y_est.shape[1]):
    loss += loss_fn(y_est[:, i, :], y[:, i]).sum()
print(loss)

# [3]
y_est_2 = torch.permute(y_est, (0, 2, 1))
print("y_est_2", y_est_2.shape)
loss2 = loss_fn(y_est_2, y).sum()
print(loss2)
whose output is:
y_est, batch x seq x emb: torch.Size([5, 10, 100])
y, batch x seq torch.Size([5, 10])
tensor(253.9994)
y_est_2 torch.Size([5, 100, 10])
tensor(253.9994)
Is the solution correct (also for what concerns the backward pass)? Is there a better way?
If y_est are probabilities and you really want to compute the error/loss of a categorical output at each timestep/element of a sequence, then y and y_est have to have the same shape. To do so, the categories/classes of y can be expanded to the same dim as y_est with one-hot encoding:
import torch

batch_size = 5
seq_len = 10
emb_dim = 100

y_est = torch.randn((batch_size, seq_len, emb_dim))
y = torch.randint(0, emb_dim, (batch_size, seq_len))
y = torch.nn.functional.one_hot(y, num_classes=emb_dim).type(torch.float)

loss_fn = torch.nn.CrossEntropyLoss()
loss = loss_fn(y_est, y)
print(loss)
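
For reference, an equivalent formulation not shown in either answer (a sketch under the question's shapes) flattens the batch and sequence dimensions into one (N, C) problem; on the same tensors, its sum matches the loop at [2] and the permute at [3]:
import torch

batch_size, seq_len, emb_dim = 5, 10, 100
y_est = torch.randn((batch_size, seq_len, emb_dim))
y = torch.randint(0, emb_dim, (batch_size, seq_len))

loss_fn = torch.nn.CrossEntropyLoss(reduction="none")
# (batch*seq, emb_dim) logits against (batch*seq,) class indices
loss = loss_fn(y_est.reshape(-1, emb_dim), y.reshape(-1)).sum()
print(loss)  # equals the loop/permute formulations applied to the same tensors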

Error while fitting the data in KNN in python

This is my code
from sklearn.neighbors import KNeighborsClassifier as KNN  # assumed import
from sklearn.metrics import f1_score  # assumed import

clf = KNN(n_neighbors=1)
clf.fit(train_x, train_y)
train_predict = clf.predict(train_x)
k = f1_score(train_predict, train_y)
print("Training F1 Score:", k)
test_predict = clf.predict(test_x)
k = f1_score(test_predict, test_y)
print("Test F1 score:", k)
I am getting the error
Found input variables with inconsistent numbers of samples: [668, 223]
The shapes of the data are
train_x = (668, 24),
train_y = (223, 24)
Please help me. Thanks in advance.
If you look at the documentation of the fit function:
Fit the model using X as training data and y as target values
Parameters:
X : {array-like, sparse matrix, BallTree, KDTree}
Training data. If array or matrix, shape [n_samples, n_features], or [n_samples, n_samples] if metric='precomputed'.
y : {array-like, sparse matrix}
Target values of shape = [n_samples] or [n_samples, n_outputs]
It is clear that the shapes of train_x and train_y don't match the requirements of the fit function. I am guessing that the dimensions of train_x and train_y are flipped in your case.
Therefore try:
clf.fit(train_x.T, train_y.T)
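
For reference, a minimal sketch (synthetic data, made-up label values) of the sample-count consistency that fit, predict, and f1_score expect:
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
train_x = rng.random((668, 24))    # [n_samples, n_features]
train_y = rng.integers(0, 2, 668)  # [n_samples], must match train_x's first dim

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(train_x, train_y)
train_predict = clf.predict(train_x)
print("Training F1 Score:", f1_score(train_y, train_predict))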

CTCBeamSearchDecoder thinks sequence_length of shape (2,) is not a vector

Trying to run a beam search in a Keras model, I get confusing (and conflicting?) error messages. My model has inputs such as
inputs = Input(name='spectrograms',
               shape=(None, hparams["n_spectrogram"]))
input_length = Input(name='len_spectrograms',
                     shape=[1], dtype='int64')
and the CTC loss function requires the [1] shapes in input and label length. As far as I understand, the output should be obtained with something like
# Stick connectionist temporal classification on the end of the core model
paths = K.function(
    [inputs, input_length],
    K.ctc_decode(output, input_length, greedy=False, top_paths=4)[0])
but as-is, that leads to a complaint about the shape of input_length:
ValueError: Shape must be rank 1 but is rank 2 for 'CTCBeamSearchDecoder' (op: 'CTCBeamSearchDecoder') with input shapes: [?,?,44], [?,1].
but if I chop off that dimension
K.ctc_decode(output, input_length[..., 0], greedy=False, top_paths=4)[0])
the model definition runs, but when I run y = paths([x, numpy.array([[30], [30]])]) with x.shape == (2, 30, 513) I suddenly get
tensorflow.python.framework.errors_impl.InvalidArgumentError: sequence_length is not a vector
[[{{node CTCBeamSearchDecoder}} = CTCBeamSearchDecoder[beam_width=100, merge_repeated=true, top_paths=4, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Log, ToInt32)]]
What am I doing wrong?
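
For context (a sketch of the underlying TF 1.x op, not an answer from the thread), the decoder behind K.ctc_decode is tf.nn.ctc_beam_search_decoder, whose sequence_length must be a rank-1 vector with one length per batch item:
import numpy as np
import tensorflow as tf

# inputs: (max_time, batch_size, num_classes); sequence_length: rank 1, shape (batch_size,)
logits = tf.placeholder(tf.float32, [None, None, 44])
seq_len = tf.placeholder(tf.int32, [None])
decoded, log_prob = tf.nn.ctc_beam_search_decoder(logits, seq_len, top_paths=4)

with tf.Session() as sess:
    # feeding numpy.array([30, 30]) (shape (2,)) satisfies the op;
    # [[30], [30]] (shape (2, 1)) triggers "sequence_length is not a vector"
    sess.run(decoded, {logits: np.random.rand(30, 2, 44).astype(np.float32),
                       seq_len: np.array([30, 30], dtype=np.int32)})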

InvalidArgumentError: logits and labels must have the same first dimension seq2seq Tensorflow

I am getting this error in seq2seq.sequence_loss even though the first dims of logits and labels have the same dimension, i.e. batchSize.
I have created a seq2seq model in TF 1.0. My loss function is as follows:
logits = self.decoder_logits_train
targets = self.decoder_train_targets
self.loss = seq2seq.sequence_loss(logits=logits, targets=targets, weights=self.loss_weights)
self.train_op = tf.train.AdamOptimizer().minimize(self.loss)
I am getting following error on running my network while training :
InvalidArgumentError (see above for traceback): logits and labels must have the same first dimension, got logits shape [1280,150000] and labels shape [1536]
[[Node: sequence_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](sequence_loss/Reshape, sequence_loss/Reshape_1)]]
I confirmed the shapes of the logits and targets tensors as follows:
a, b = sess.run([model.decoder_logits_train, model.decoder_train_targets], feed_dict)
print(np.shape(a)) # (128, 10, 150000) which is (BatchSize, MaxSeqSize, VocabSize)
print(np.shape(b)) # (128, 12) which is (BatchSize, max length of seq including padding)
So, since the first dimensions of targets and logits are the same, why am I getting this error?
Interestingly, in the error you can observe that the dimension of logits is mentioned as (1280, 150000), which is (128 * 10, 150000) [product of the first two dimensions, vocab_size], and the same for targets, i.e. (1536), which is (128 * 12), again the product of the first two dimensions.
Note: Tensorflow 1.0, CPU version
Maybe your way of padding is wrong. If you padded _EOS to the end of the target seq, then the max_length (real length of the target sentence) should add 1 to be [batch, max_len+1]. Since you padded both _GO and _EOS, your target sentence length should add 2, which makes it equal 12.
I read some other people's implementations of NMT; they only padded _EOS for the target sentence, while _GO is for the input of the decoder. Tell me if I'm wrong.
I had the same error as you, and I understood the problem:
The problem:
You run the decoder with these parameters:
targets are the decoder_inputs. They have length max_length because of padding. Shape: [batch_size, max_length]
sequence_length are the non-padded lengths of all the targets of your current batch. Shape: [batch_size]
Your logits, the output of tf.contrib.seq2seq.dynamic_decode, have shape:
[batch_size, longer_sequence_in_this_batch, n_classes]
where longer_sequence_in_this_batch is equal to tf.reduce_max(sequence_length).
So, you have a problem when computing the loss because you try to use both:
your logits with 1st dimension shape longer_sequence_in_this_batch
your targets with 1st dimension shape max_length
Note that longer_sequence_in_this_batch <= max_length.
How to fix it:
You can simply apply some padding to your logits.
logits = self.decoder_logits_train
targets = self.decoder_train_targets
paddings = [[0, 0], [0, max_length - tf.shape(logits)[1]], [0, 0]]
padded_logits = tf.pad(logits, paddings, 'CONSTANT', constant_values=0)
self.loss = seq2seq.sequence_loss(logits=padded_logits, targets=targets,
                                  weights=self.loss_weights)
Using this method, you ensure that your logits will be padded like the targets and will have dimension [batch_size, max_length, n_classes].
For more information about the pad function, see Tensorflow's documentation.
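
A runnable miniature of this padding step (a sketch with small made-up dimensions standing in for the question's (128, 10, 150000) and max_length = 12):
import numpy as np
import tensorflow as tf

batch_size, seq_len, n_classes, max_length = 4, 10, 7, 12

logits = tf.placeholder(tf.float32, [batch_size, None, n_classes])
paddings = [[0, 0], [0, max_length - tf.shape(logits)[1]], [0, 0]]
padded_logits = tf.pad(logits, paddings, 'CONSTANT', constant_values=0)

with tf.Session() as sess:
    out = sess.run(padded_logits,
                   {logits: np.zeros((batch_size, seq_len, n_classes), np.float32)})
    print(out.shape)  # (4, 12, 7): the time dimension is padded from 10 up to max_length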
The error message seems to be a bit misleading, as you actually need the first and second dimensions to be the same. This is written here:
logits: A Tensor of shape [batch_size, sequence_length, num_decoder_symbols] and dtype float. The logits correspond to the prediction across all classes at each timestep.
targets: A Tensor of shape [batch_size, sequence_length] and dtype int. The target represents the true class at each timestep.
This also makes sense, as logits are probability vectors, while targets represent the real output, so they need to be of the same length.
