I want to calculate the gradient of a tensor, but it raises this error:
RuntimeError: grad can be implicitly created only for scalar outputs
Here is the code I am trying to run:
x = torch.full((2, 3), 4.0, requires_grad=True)  # float fill value: only floating-point tensors can require gradients
y = (2*x**2+3)
y.backward()
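At this point it throws the error.
The reason is that y is not a scalar: nothing reduces it to a single value (for example with .sum()), so backward() has no implicit starting gradient.
The issue can be fixed by passing that gradient explicitly: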
y.backward(torch.ones_like(x))
which computes a vector-Jacobian product with a tensor of all ones and stores the result in x.grad. For this element-wise function it gives the same gradient as y.sum().backward().
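As a minimal sketch of the two options mentioned above (the names grad_explicit and grad_sum are only illustrative), the following compares passing a tensor of ones to backward() with reducing via .sum() first. Since y = 2*x**2 + 3, the expected gradient is 4*x.

```python
import torch

x = torch.full((2, 3), 4.0, requires_grad=True)

# Option 1: pass an explicit upstream gradient of ones.
y = 2 * x**2 + 3
y.backward(torch.ones_like(y))
grad_explicit = x.grad.clone()

# Option 2: reduce to a scalar first, then call backward() without arguments.
x.grad = None  # clear the accumulated gradient before the second pass
y2 = 2 * x**2 + 3
y2.sum().backward()
grad_sum = x.grad.clone()

print(grad_explicit)  # dy/dx = 4 * x, so every entry is 16
print(torch.equal(grad_explicit, grad_sum))
```

Both routes should agree here because each output element depends only on the matching input element.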
I'm a beginner in PyTorch and I'm probably stuck on a relatively trivial problem, but I can't figure it out at the moment.
When calculating the gradient of a tensor, I get a constant gradient of 1.
Shouldn't the gradient of a constant be 0?
Here is a minimal example:
import torch
x = torch.tensor(50., requires_grad=True)
y = x
y.backward()
print(x.grad)
# Output: tensor(1.)
So why is the output 1 and not 0?
You are not computing the gradient of a constant, but that of the variable x which has a constant value 50. The derivative of x with respect to x is 1.
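To make this concrete, here is a small sketch (the helper grad_at is hypothetical, introduced only for illustration) that evaluates a few functions at x = 50 and reads back x.grad. Only the last function is a true constant with respect to x.

```python
import torch

def grad_at(f, value=50.0):
    """Return df/dx evaluated at x = value, using autograd."""
    x = torch.tensor(value, requires_grad=True)
    f(x).backward()
    return x.grad.item()

g_identity = grad_at(lambda x: x)        # y = x, so dy/dx = 1
g_scaled = grad_at(lambda x: 3 * x)      # y = 3x, so dy/dx = 3
g_const = grad_at(lambda x: 0 * x + 50)  # y does not vary with x, so dy/dx = 0

print(g_identity, g_scaled, g_const)
```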
I have a 3D tensor of size, say, 100x5x2, and the mean of that tensor across axis=1, which has shape 100x2.
Here 100 is the batch size. Without the batch dimension, dividing a tensor of shape 5x2 by one of shape 2 works perfectly, but with the batched 3D tensor I receive an error.
a = torch.rand(5,2)
b = torch.rand(2)
z=a/b
gives me the expected answer.
a = torch.rand(100,5,2)
b = torch.rand(100,2)
z=a/b
Gives me the following error.
The size of tensor a (5) must match the size of tensor b (100) at non-singleton dimension 1.
How can I divide these tensors so that my output has shape 100x5x2? Is there something like bmm for division?
Simply do:
z = a / b.unsqueeze(1)
This adds an extra dimension to b, giving it shape (100, 1, 2), which broadcasts against a's shape (100, 5, 2).
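A quick sketch to check the broadcasting, assuming b is the mean of a over dim 1 as described in the question. The names z_keepdim and z_loop are only illustrative: the first uses keepdim=True so no unsqueeze is needed, the second is an explicit per-sample loop for comparison.

```python
import torch

torch.manual_seed(0)
a = torch.rand(100, 5, 2)
b = a.mean(dim=1)                      # shape (100, 2)

z = a / b.unsqueeze(1)                 # (100, 5, 2) / (100, 1, 2)

# Alternative: keep the reduced dimension so the shapes already line up.
z_keepdim = a / a.mean(dim=1, keepdim=True)

# Reference: divide sample by sample, as in the unbatched case.
z_loop = torch.stack([a[i] / b[i] for i in range(a.shape[0])])

print(z.shape)
print(torch.allclose(z, z_loop), torch.allclose(z, z_keepdim))
```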
I apologize that this is probably a simple question that has been answered before, but I could not find the answer. I’m attempting to use a CNN to extract features and then feed them into an FC network that outputs 2 variables. I’m attempting to use the functional linear layer as a way to dynamically handle the flattened features. self.cnn is a Sequential container whose last layer is nn.Flatten(). When I print the size of x after the CNN I see it is 15x152064, so I’m unclear why the F.linear call fails with the error below. Any help would be appreciated.
RuntimeError: size mismatch, get 15, 15x152064,2
x = self.cnn(x)
batch_size, channels = x.size()
x = F.linear(x, torch.Tensor([256,channels]))
y_hat = self.FC(x)
torch.Tensor([256, channels]) does not create a tensor of size (256, channels); it creates a 1D tensor containing the two values 256 and channels. I don't know how you want to initialize your weights, but there are a couple of options:
# All-ones weights (each output is the sum of the input features):
x = F.linear(x, torch.ones(256,channels))
# Random weights:
x = F.linear(x, torch.randn(256,channels))
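The difference is easy to check on small shapes. This is only a sketch: channels = 8 stands in for the 152064 features in the question, and the names w_values and w_shape are illustrative.

```python
import torch
import torch.nn.functional as F

batch_size, channels = 15, 8            # hypothetical small sizes for illustration
x = torch.randn(batch_size, channels)

w_values = torch.Tensor([256, channels])  # 1-D tensor holding the values 256. and 8.
w_shape = torch.randn(256, channels)      # 2-D weight of shape (256, channels)

print(w_values.shape)   # one dimension of length 2
print(w_shape.shape)

out = F.linear(x, w_shape)                # computes x @ w_shape.T
print(out.shape)
```

Note that a weight tensor created inline like this is not registered as a module parameter, so it would not be updated during training. If the flattened size is known up front, an nn.Linear(channels, 256) defined in __init__ is the more usual choice.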
I am trying to multiply each layer of a tensor with the first layer of the tensor.
x1 = bert_model_1([x1_in, x2_in])
x1_begin = Lambda(lambda x: x[:,0])(x1) #obtain the first layer of the bert tensor
x1_begin = Lambda(keras.layers.multiply(x11, x1_begin) for x11 in x1)([x1, x1_begin])
When I run the code above, I keep getting the following error:
<generator object build_corrector. . at 0x00000249DB234C48> is not a callable object.
The error seems to happen in the last line. How do I iterate over each layer in the tensor?
In this blog post, the author implements the triplet loss outside the Keras layers. He gets anchor_out, pos_out and neg_out from the network and then passes them to the triplet_loss() function he defined.
I wonder if I can calculate the triplet loss within the Keras layers by defining my own Lambda layers.
Here's my network design:
margin=1
anchor_input = Input((600, ), name='anchor')
positive_input = Input((600, ), name='positive_input')
negative_input = Input((600, ), name='negative_input')
# Shared embedding layer for positive and negative items
Shared_DNN = Dense(300)
encoded_anchor = Shared_DNN(anchor_input)
encoded_positive = Shared_DNN(positive_input)
encoded_negative = Shared_DNN(negative_input)
DAP = Lambda(lambda tensors:K.sum(K.square(tensors[0] - tensors[1]),axis=1,keepdims=True),name='DAP_loss') #Distance for Anchor-Positive pair
DAN = Lambda(lambda tensors:K.sum(K.square(tensors[0] - tensors[1]),axis=1,keepdims=True),name='DAN_loss') #Distance for Anchor-Negative pair
Triplet_loss = Lambda(lambda loss:K.max([(loss[0] - loss[1] + margin),0],axis=0),name='Triplet_loss') #Hinge on the distance difference; this is the line that fails
DAP_loss = DAP([encoded_anchor,encoded_positive])
DAN_loss = DAN([encoded_anchor,encoded_negative])
#call this layer on list of two input tensors.
Final_loss = Triplet_loss([DAP_loss,DAN_loss])
model = Model(inputs=[anchor_input,positive_input, negative_input], outputs=Final_loss)
However, it gives me the error:
Tried to convert 'input' to a tensor and failed. Error: Shapes must be equal rank, but are 2 and 0
From merging shape 0 with other shapes. for 'Triplet_loss_4/Max/packed' (op: 'Pack') with input shapes: [?,1], []
The error comes from the Triplet_loss layer. In the K.max() call, the first value, loss[0] - loss[1] + margin, has shape (None,1), while the second value, 0, is a plain scalar with shape (). The two values do not have the same rank, and therefore K.max() raises an error.
My question is: how do I solve this error?
I have tried replacing the 0 with K.constant(0,shape=(1,)) and K.constant(0,shape=(None,1)), but neither works.
Does this work?
Triplet_loss = Lambda(lambda loss: K.maximum(loss[0] - loss[1] + margin, 0.0),
name='Triplet_loss')
I think the issue with this line
Triplet_loss = Lambda(lambda loss:K.max([(loss[0] - loss[1] + margin), 0],
axis=0),name='Triplet_loss')
is that you are putting the loss[0]-loss[1]+margin tensor and 0 in a list, which Keras tries to pack (stack) into a single tensor before taking the max. This fails because of the rank mismatch: 0 is a scalar with rank 0, while the first element is a 2D tensor of shape (None, 1). This is what the error means.
To compare a tensor against a single value element-wise, use K.maximum, which broadcasts automatically when one of the arguments is a scalar.
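The element-wise behaviour can be illustrated without building a model. The sketch below uses NumPy as a stand-in, on the assumption that np.maximum broadcasts a scalar the same way K.maximum does; the distance values are made up for illustration.

```python
import numpy as np

margin = 1.0
# Hypothetical per-sample squared distances, shape (3, 1) like the (None, 1) layer outputs.
dap = np.array([[0.5], [2.0], [1.0]])   # anchor-positive
dan = np.array([[2.0], [0.5], [2.0]])   # anchor-negative

# Element-wise maximum against a scalar: the 0.0 is broadcast to shape (3, 1).
loss = np.maximum(dap - dan + margin, 0.0)

print(loss.shape)
print(loss)
```

Only the second sample violates the margin here, so it is the only one with a non-zero loss.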