PyTorch: Randomly subsample loss tensors using `torch.randperm`

I'm trying to randomly subsample the prediction and target array for my loss calculation.
idx = torch.randperm(target.shape[0])
target = target.index_select(0, idx[:sample_size])
However, I'm getting this error message:
index_select(): argument 'index' (position 2) must be Variable, not torch.LongTensor
Does anyone know how to fix this?
Edit:
I got one step closer. It seems like torch.randperm does not return a torch variable, so one has to explicitly convert the output:
idx = torch.randperm(target.shape[0])
idx = Variable(idx).cuda()
target = target.index_select(0, idx[:sample_size])
The only problem now is that backpropagation fails. It seems that the random subsampling is causing an issue with the dimensions.
However the dimensions seem to be fine when calculating the loss:
loss = F.nll_loss(prediction, target.view(-1)) # prediction shape is [Nx12] and target shape is N
Unfortunately when calling loss.backward() I get this error message:
RuntimeError: The expanded size of the tensor (12) must match the existing size (217456) at non-singleton dimension 1
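For reference, in PyTorch 0.4 and later Variable and Tensor are merged, so the output of torch.randperm can be used as an index directly. Below is a minimal sketch of the subsampling itself, not a diagnosis of the backward error above. The sizes, the random stand-in data, and the names logits, sub_prediction and sub_target are assumptions for illustration. The key point is that prediction and target are indexed with the same rows.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

N, num_classes, sample_size = 100, 12, 32  # hypothetical sizes

# Stand-ins for the model output and the labels from the question.
logits = torch.randn(N, num_classes, requires_grad=True)
prediction = F.log_softmax(logits, dim=1)      # nll_loss expects log-probabilities
target = torch.randint(0, num_classes, (N,))

# Draw one random permutation and keep its first sample_size entries.
idx = torch.randperm(N, device=target.device)[:sample_size]

# Index both tensors with the same rows so each prediction keeps its label.
sub_prediction = prediction[idx]   # shape [sample_size, num_classes]
sub_target = target[idx]           # shape [sample_size]

loss = F.nll_loss(sub_prediction, sub_target)
loss.backward()
```

After backward(), logits.grad has the full shape [N, num_classes]; rows that were not sampled simply receive a zero gradient.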

Related

Do gradient descent on function with no input using pytorch

What's the correct way to do gradient descent on an arbitrary function with no input using Pytorch?
x = torch.tensor(x_init, requires_grad=True)
opt = torch.optim.Adam([x])
cost_fnx = cost(x)
for iteration_count in range(100):
    opt.zero_grad()
    cost_fnx.backward()
    opt.step()
When I tried the above, I got this error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
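The error occurs because you are trying to backpropagate through the same graph multiple times: cost_fnx is computed once, outside the loop, and the graph's saved intermediate values are freed after the first backward() call. You most likely need to recompute the cost value in every iteration (the cost is like a regularizer function, since it only has the parameters as input) so that each backward() runs on a fresh graph. Something like: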
x = x_init.requires_grad_(True)
opt = torch.optim.Adam([x])
for iteration_count in range(2):
    cost_fnx = cost(x)
    opt.zero_grad()
    cost_fnx.backward()
    opt.step()
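A self-contained sketch of the same pattern is shown here. The quadratic cost, the starting point, the learning rate and the iteration count are all hypothetical choices made so the snippet runs on its own; they are not from the question.

```python
import torch

def cost(x):
    # Hypothetical cost: squared distance to the point (3, -1).
    return ((x - torch.tensor([3.0, -1.0])) ** 2).sum()

x = torch.zeros(2, requires_grad=True)   # stand-in for x_init
opt = torch.optim.Adam([x], lr=0.1)

for iteration_count in range(500):
    opt.zero_grad()
    cost_fnx = cost(x)      # rebuild the graph on every iteration
    cost_fnx.backward()     # backward runs once per freshly built graph
    opt.step()

final_cost = cost(x).item()
```

Because the cost is recomputed inside the loop, retain_graph=True is not needed.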

Why torch.dot(a,b) makes requires_grad=False

I have some losses in a loop storing them in a tensor loss. Now I want to multiply a weight tensor to the loss tensor to have final loss, but after torch.dot(), the result scalar, ll_new, has requires_grad=False. The following is my code.
loss_vector = torch.FloatTensor(total_loss_q)
w_norm = F.softmax(loss_vector, dim=0)
ll_new = torch.dot(loss_vector,w_norm)
How can I have requires_grad=True for ll_new after doing the above?
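I think the issue is in the line loss_vector = torch.FloatTensor(total_loss_q), because requires_grad for loss_vector is False (the default value). The torch.FloatTensor constructor does not take a requires_grad argument, so you should create the tensor with torch.tensor instead:
loss_vector = torch.tensor(total_loss_q, dtype=torch.float, requires_grad=True)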
The issue most likely lies within this part:
I have some losses in a loop storing them in a tensor loss
You are most likely losing requires_grad somewhere in the process before torch.dot. For example, this happens if you call something like .item() on the individual losses when constructing the total_loss_q tensor.
What type is your total_loss_q? If it is a list of integers then there is no way your gradients will propagate through that. You need to construct total_loss_q in such a way that it is a tensor which knows how each individual loss was constructed (i.e. can propagate gradients to your trainable weights).
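One way to build such a tensor, sketched under the assumption that each individual loss is a 0-dim tensor still attached to the graph, is to collect the losses in a list and join them with torch.stack. The parameter w and the toy loss are hypothetical stand-ins for the asker's model and loss.

```python
import torch
import torch.nn.functional as F

w = torch.tensor([1.0, 2.0], requires_grad=True)  # hypothetical trainable parameter

# Collect the per-iteration losses as tensors; no .item() or float() conversion.
total_loss_q = []
for k in range(3):
    total_loss_q.append(((w * (k + 1)) ** 2).sum())

# torch.stack joins the 0-dim tensors into a 1-D tensor and keeps the graph.
loss_vector = torch.stack(total_loss_q)
w_norm = F.softmax(loss_vector, dim=0)
ll_new = torch.dot(loss_vector, w_norm)

ll_new.backward()   # gradients flow back to w
```

Here ll_new.requires_grad is True and w.grad is populated after backward(), because no step in the chain detached the losses from w.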

How to convert FloatTensor to ByteTensor with Pytorch?

I'm new to PyTorch and neural network programming, and I've run into an issue that I'm not able to solve on my own. My data are NumPy arrays of 1s and 0s, but when I try to train my net, I get this error:
RuntimeError: Expected object of type torch.ByteTensor but found type torch.FloatTensor for argument #2 'mat2'
The line the error comes from is in the forward method of my net:
x = self.fc1(x)
I've tried the following to convert my tensors, but I still get the error:
x = x.type('torch.ByteTensor')
and
x.byte()
x.byte() returns what you need, but it's not an in-place method: it returns a new tensor and leaves x unchanged. Try doing:
x = x.byte()
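The same out-of-place behaviour applies to the other dtype conversion methods such as .float(). The sketch below uses a hypothetical 0/1 input and a hypothetical layer size. It also rests on an assumption about the asker's setup: nn.Linear weights are float32 by default, so if the error persists after converting to byte, it may be the input that needs converting to float rather than the other way round.

```python
import numpy as np
import torch
import torch.nn as nn

data = np.array([[1, 0, 1, 1]], dtype=np.uint8)  # hypothetical 0/1 input
x = torch.from_numpy(data)       # dtype torch.uint8 (a ByteTensor)

x.float()                        # returns a new tensor; x itself is unchanged
still_byte = x.dtype             # still torch.uint8

x = x.float()                    # reassign to keep the converted tensor
fc1 = nn.Linear(4, 2)            # weights are float32 by default
out = fc1(x)                     # input and weight dtypes now match
```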

TensorFlow: Removing nans in accumulated gradients

For a function approximation problem I'm trying to accumulate gradients, but I find that sometimes some of these gradients are NaN (i.e. undefined) even though the loss is always real. I think this might be due to numerical instabilities, and I'm basically looking for a simple method for removing the NaNs from the computed gradients.
Starting with the solution to this question I tried doing the following:
# Optimizer definition - nothing different from any classical example
opt = tf.train.AdamOptimizer()
## Retrieve all trainable variables you defined in your graph
tvs = tf.trainable_variables()
## Creation of a list of variables with the same shape as the trainable ones
# initialized with 0s
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
## Calls the compute_gradients function of the optimizer to obtain... the list of gradients
gvs_ = opt.compute_gradients(rmse, tvs)
gvs = tf.where(tf.is_nan(gvs_), tf.zeros_like(gvs_), gvs_)
## Adds to each element from the list you initialized earlier with zeros its gradient (works because accum_vars and gvs are in the same order)
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
## Define the training step (part with variable value update)
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])
So basically, the key idea is this line:
gvs = tf.where(tf.is_nan(gvs_), tf.zeros_like(gvs_), gvs_)
But when I apply this idea I obtain the following error:
ValueError: Tried to convert 'x' to a tensor and failed. Error:
Dimension 1 in both shapes must be equal, but are 30 and 9. Shapes are
[2,30] and [2,9]. From merging shape 2 with other shapes. for
'IsNan/packed' (op: 'Pack') with input shapes: [2,9,30], [2,30,9],
[2,30], [2,9].
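compute_gradients returns a list of (gradient, variable) pairs in your case, not a single tensor, so tf.where cannot be applied to the whole list at once; the gradients have different shapes and cannot be packed together. You may want to apply it to each gradient instead:
gvs = [(tf.where(tf.is_nan(grad), tf.zeros_like(grad), grad), val) for grad, val in gvs_]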
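The per-gradient replacement can be tried on small hand-made tensors. This sketch assumes TensorFlow 2.x with eager execution and therefore uses tf.math.is_nan rather than the tf.is_nan name from the question. The variables and gradient values are hypothetical stand-ins for what compute_gradients would return.

```python
import tensorflow as tf

nan = float("nan")

# Hypothetical (gradient, variable) pairs with different shapes.
v1 = tf.Variable(tf.zeros([2, 3]))
v2 = tf.Variable(tf.zeros([3]))
g1 = tf.constant([[1.0, nan, 2.0], [nan, 0.5, 3.0]])
g2 = tf.constant([nan, 4.0, 5.0])
gvs_ = [(g1, v1), (g2, v2)]

# Replace NaN entries with zeros, one gradient tensor at a time.
gvs = [
    (tf.where(tf.math.is_nan(grad), tf.zeros_like(grad), grad), var)
    for grad, var in gvs_
]

cleaned = [grad.numpy().tolist() for grad, _ in gvs]
```

Each gradient keeps its own shape, which is why the list comprehension works where a single tf.where over the whole list does not.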

Graph building fails at tf.scatter_nd due to placeholder shape limitations

I'm using scatter_nd to project an attention distribution onto another distribution, essentially creating a distribution that references a vocabulary.
indices = tf.stack((batch_nums, encoder_batch), axis=2)
shape = [batch_size, vocab_size]
attn_dists_projected = [tf.scatter_nd(indices, copy_distribution, shape) for copy_distribution in attn_dists]
When attempting to run this with placeholders with largely undefined dimensions, I ran into the following error:
ValueError: The inner 0 dimensions of output.shape=[?,?] must match the inner 1
dimensions of updates.shape=[128,128,?]: Shapes must be equal rank, but are 0 and 1
for 'final_distribution/ScatterNd' (op: 'ScatterNd') with input shapes:
[128,?,2], [128,128,?], [2].
This is in the context of seq2seq, so the model placeholders' shapes need to be partially undefined. Additionally, my data batches are not consistent in size, which necessitates variable batch sizes as well.
