I want to implement the following distance loss function in PyTorch. I was following this thread from the PyTorch forum: https://discuss.pytorch.org/t/custom-loss-functions/29387/4
np.linalg.norm(output - target)
# where output.shape = [1, 2] and target.shape = [1, 2]
So I have implemented the loss function like this:
def my_loss(output, target):
    loss = torch.tensor(np.linalg.norm(output.detach().numpy() - target.detach().numpy()))
    return loss
With this loss function, calling backward() raises a runtime error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
My entire code looks like this
model = nn.Linear(2, 2)
x = torch.randn(1, 2)
target = torch.randn(1, 2)
output = model(x)
loss = my_loss(output, target)
loss.backward()  # <----- Error here
print(model.weight.grad)
PS: I am aware of the pairwise loss in PyTorch, but due to some of its limitations I have to implement this myself.
Following the pytorch source code I have tried the following,
class my_function(torch.nn.Module):  # forgot to define backward()
    def forward(self, output, target):
        loss = torch.tensor(np.linalg.norm(output.detach().numpy() - target.detach().numpy()))
        return loss
model = nn.Linear(2, 2)
x = torch.randn(1, 2)
target = torch.randn(1, 2)
output = model(x)
criterion = my_function()
loss = criterion(output, target)
loss.backward()
print(model.weight.grad)
And I get the same runtime error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
How can I implement the loss function correctly?
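This happens because you detach the tensors inside the loss function. You had to detach them in order to call np.linalg.norm, but detaching breaks the computation graph, and wrapping the NumPy result in torch.tensor creates a fresh tensor with no history. The resulting loss therefore has no grad_fn, and backward() fails with the error above.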
You can replace
loss = torch.tensor(np.linalg.norm(output.detach().numpy() - target.detach().numpy()))
by torch operations as
loss = torch.norm(output-target)
This should work fine.
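For completeness, here is a minimal sketch of the question's script with the NumPy call swapped for torch.norm. The seed and the printed check are only illustrative additions; everything else mirrors the code in the question.

```python
import torch
import torch.nn as nn


def my_loss(output, target):
    # Pure torch operations keep the computation graph intact,
    # so the returned scalar carries a grad_fn.
    return torch.norm(output - target)


torch.manual_seed(0)  # illustrative only, for repeatable numbers
model = nn.Linear(2, 2)
x = torch.randn(1, 2)
target = torch.randn(1, 2)

output = model(x)
loss = my_loss(output, target)
loss.backward()  # no RuntimeError: the loss is connected to model.weight

print(loss.grad_fn is not None)
print(model.weight.grad)
```

No custom backward() is needed here: autograd derives it from the torch operations used in the forward computation.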
I built a very simple structure
class classifier (nn.Module):
    def __init__(self):
        super().__init__()
        self.classify = nn.Sequential(
            nn.Linear(166,80),
            nn.Tanh(),
            nn.Linear(80,40),
            nn.Tanh(),
            nn.Linear(40,1),
            nn.Softmax()
        )
    def forward (self, x):
        pred = self.classify(x)
        return pred
model = classifier()
The loss function and optimizer are defined as
criteria = nn.BCEWithLogitsLoss()
iteration = 1000
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
and here is the training and evaluation section
for epoch in range (iteration):
    model.train()
    y_pred = model(x_train)
    loss = criteria(y_pred,y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    model.eval()
    with torch.inference_mode():
        test_pred = model(x_test)
        test_loss = criteria(test_pred, y_test)
    if epoch % 100 == 0:
        print(loss)
        print(test_loss)
I received the same loss values, and by debugging, I found that the weights were not being updated.
The problem is in the network architecture: you are using a Softmax layer on a single-valued output at the end. By the definition of the softmax function, for an output vector x and index i we have:
softmax(x_i) = e^{x_i} / sum_j (e^{x_j})
Here, you only have a single valued output. Due to this, the output of your neural network is always 1, irrespective of the inputs or the weights. To fix this, remove the Softmax layer at the end. An activation function like Sigmoid might be more appropriate, and in fact you are already applying this when using the BCEWithLogitsLoss.
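A small sketch of this effect, using a random tensor as a stand-in for the output of the last Linear(40, 1) layer (the tensor names and targets here are made up for illustration). With the softmax in place the output is constant and no gradient reaches the logits; feeding the raw logits to BCEWithLogitsLoss gives a usable gradient.

```python
import torch

torch.manual_seed(0)
# Hypothetical stand-in for the output of the final Linear(40, 1) layer.
logits = torch.randn(4, 1, requires_grad=True)
targets = torch.tensor([[0.0], [1.0], [0.0], [1.0]])
criterion = torch.nn.BCEWithLogitsLoss()

# Softmax over a single element: e^x / e^x, so every entry is 1.
probs = torch.softmax(logits, dim=1)
criterion(probs, targets).backward()
grad_with_softmax = logits.grad.clone()
print(probs)
print(grad_with_softmax)  # all zeros, so nothing upstream can learn

# Without the softmax, the loss depends on the logits again.
logits.grad = None
criterion(logits, targets).backward()
grad_without_softmax = logits.grad.clone()
print(grad_without_softmax)
```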
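Another thing to look at is this part of the loop:
y_pred = model(x_train)
loss = criteria(y_pred,y_train)
optimizer.zero_grad()
loss.backward()
optimizer.step()
Here optimizer.zero_grad() is called after the loss is calculated. Gradients are only written by loss.backward(), so clearing them at this point does not discard anything, and this ordering by itself does not stop the weights from updating. The more conventional ordering clears the gradients first: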
optimizer.zero_grad()
y_pred = model(x_train)
loss = criteria(y_pred,y_train)
loss.backward()
optimizer.step()
I want to develop a lifelong learning system, so I need to prevent important parameters from changing. I read the related paper 'Memory Aware Synapses: Learning what (not) to forget', which describes a method for this. I need to calculate the gradient of each parameter corresponding to each input image, so how should I write my code in PyTorch?
You can do it using standard optimization procedure and .backward() method on your loss function.
First, scaling as defined in your link:
class Scaler:
    def __init__(self, parameters, delta):
        # model.parameters() is a generator; store a list so step() can be called repeatedly
        self.parameters = list(parameters)
        self.delta = delta
    def step(self):
        """Multiplies gradients in place."""
        for param in self.parameters:
            if param.grad is None:
                raise ValueError("backward() has to be called before running scaler")
            param.grad *= self.delta
It can be used just like optimizer.step() (see the comments in the code):
model = torch.nn.Sequential(
    torch.nn.Linear(10, 100), torch.nn.ReLU(), torch.nn.Linear(100, 1)
)
scaler = Scaler(model.parameters(), delta=0.001)
optimizer = torch.optim.Adam(model.parameters())
criterion = torch.nn.MSELoss()
X, y = torch.randn(64, 10), torch.randn(64, 1)  # target shape matches the (64, 1) output
# Optimization loop
EPOCHS = 10
for _ in range(EPOCHS):
    output = model(X)
    loss = criterion(output, y)
    loss.backward()  # Now model has the gradients
    optimizer.step()  # Optimize model's parameters
    print(next(model.parameters()).grad)
    scaler.step()  # Scale gradients
    optimizer.zero_grad()  # Zero gradient before next step
After scaler.step(), the scaled gradients are available in param.grad for each parameter (just as they are accessed within Scaler's step method), so you can do whatever you want with them.
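The question asks for gradients per input rather than per batch. One way to get them is to run backward() once per sample and accumulate the result. The sketch below does that under an assumption taken from my reading of the Memory Aware Synapses paper: importance is the average absolute gradient of the squared L2 norm of the model output. The model, data and the `importance` list are placeholders, not code from the paper.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(10, 100), torch.nn.ReLU(), torch.nn.Linear(100, 1)
)
X = torch.randn(8, 10)  # placeholder inputs; real images would be flattened or fed to a CNN

# One accumulator per parameter tensor (hypothetical name).
importance = [torch.zeros_like(p) for p in model.parameters()]

for x in X:
    model.zero_grad()
    out = model(x.unsqueeze(0))     # forward pass for a single input
    out.pow(2).sum().backward()     # assumed objective: squared L2 norm of the output
    for imp, p in zip(importance, model.parameters()):
        imp += p.grad.abs() / len(X)  # running mean of |gradient| for this input

for (name, _), imp in zip(model.named_parameters(), importance):
    print(name, tuple(imp.shape), imp.mean().item())
```

Looping over samples is slow for large datasets but keeps the per-input gradients explicit, which is the quantity the question asks about.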
I have defined the following custom model and training loop in Keras:
class CustomModel(keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
And I am using the following code to train the model on a simple toy data set:
inputs = keras.layers.Input(shape=(1,))
hidden = keras.layers.Dense(1, activation='tanh')(inputs)
outputs = keras.layers.Dense(1)(hidden)
x = np.arange(0, 2*np.pi, 2*np.pi/100)
y = np.sin(x)
nnmodel = CustomModel(inputs, outputs)
nnmodel.compile(optimizer=keras.optimizers.SGD(lr=0.1), loss="mse", metrics=["mae"])
nnmodel.fit(x, y, batch_size=100, epochs=2000)
I want to be able to see the values of the gradient and the trainable_vars variables in the train_step function for each training loop, and I am not sure how to do this.
I tried setting a breakpoint inside the train_step function in my Python IDE, expecting execution to stop there for each epoch of training after I call model.fit(), but this didn't happen. I also tried to print the values to the log after each epoch, but I am not sure how to achieve this.
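One possible approach, sketched under assumptions: train_step is normally traced into a graph, so breakpoints and plain print calls inside it do not fire on every step. Passing run_eagerly=True to compile() should make it run as ordinary Python. Alternatively, the same steps can be written as an explicit loop outside fit(), where every value is inspectable. The loop below is that second variant; it reuses the toy data from the question, and the optimizer and loss objects are my own stand-ins for what compile() would create.

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(1,))
hidden = tf.keras.layers.Dense(1, activation="tanh")(inputs)
outputs = tf.keras.layers.Dense(1)(hidden)
model = tf.keras.Model(inputs, outputs)

x = np.arange(0, 2 * np.pi, 2 * np.pi / 100, dtype="float32").reshape(-1, 1)
y = np.sin(x)

# Stand-ins for compile(optimizer=..., loss="mse")
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.MeanSquaredError()

for epoch in range(3):  # a few steps only, to keep the output short
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)  # forward pass
        loss = loss_fn(y, y_pred)
    trainable_vars = model.trainable_variables
    gradients = tape.gradient(loss, trainable_vars)
    optimizer.apply_gradients(zip(gradients, trainable_vars))
    # Runs eagerly, so values can be printed or inspected in a debugger.
    for var, grad in zip(trainable_vars, gradients):
        print(epoch, var.name, var.numpy().ravel(), grad.numpy().ravel())
```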
Suppose a model as in:
model = Model(inputs=[A, B], outputs=C)
With custom loss:
def actor_loss(y_true, y_pred):
    log_lik = y_true * K.log(y_pred)
    loss = -K.sum(log_lik * K.stop_gradient(B))
    return loss
Now I'm trying to define a function that returns the gradients of the loss with respect to the weights for a given pair of input and target output, and to expose it as such.
Here is an idea of what I mean in pseudocode
def _get_grads(inputs, targets):
    loss = model.loss(targets, model.output)
    weights = model.trainable_weights
    grads = K.gradients(loss, weights)
    model.input[0] (aka 'A') <----inputs[0]
    model.input[1] (aka 'B') <----inputs[1]
    return K.function(model.input, grads)
self.get_grads = _get_grads
My question is how do I feed inputs argument to the graph inside said function.
(So far I've only worked with .fit and not with .gradients and I can't find any decent documentation with custom loss or multiple inputs)
If you call K.function, you get an actual callable function, so you should just call it with some parameter values. The format is exactly the same as for model.fit; in your case it should be a list of two arrays of values, including the batch dimension:
self.get_grads = _get_grads(inputs, targets)
grad_value = self.get_grads([input1, input2])
Where input1 and input2 are numpy arrays that include the batch dimension.
My understanding of K.function, K.gradients and custom losses was fundamentally wrong. You use the function to construct a mini-graph that computes the gradients of the loss with respect to the weights. There is no need for the function itself to take arguments.
def _get_grads():
    targets = Input(shape=...)
    loss = model.loss(targets, model.output)
    weights = model.trainable_weights
    grads = K.gradients(loss, weights)
    return K.function(model.input + [targets], grads)
I was under the impression that _get_grads was itself K.function but that was wrong. _get_grads() returns K.function. And then you use that as
f = _get_grads() # constructs the mini-graph that gives gradients
grads = f([inputs, labels])
inputs is fed to model.inputs, labels to targets and it returns grads.
I am trying to create a custom loss function using Keras. I want to compute the loss based on both the input and the predicted output of the neural network.
I tried using a custom loss function in Keras. As I understand it, y_true is the target output that we supply for training and y_pred is the predicted output of the neural network. The loss function below is the same as the "mean_squared_error" loss in Keras.
def customloss(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
I would also like to use the input to the neural network to compute the custom loss, in addition to the mean_squared_error term. Is there a way to pass the network input as an argument to the customloss function?
Thank you.
I have come across 2 solutions to the question you asked.
You can pass your input (scalar only) as an argument to the custom loss wrapper function.
def custom_loss(i):
    def loss(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true), axis=-1)  # + something with i...
    return loss
def baseline_model():
    # create model
    i = Input(shape=(5,))
    x = Dense(5, kernel_initializer='glorot_uniform', activation='linear')(i)
    o = Dense(1, kernel_initializer='normal', activation='linear')(x)
    model = Model(i, o)
    model.compile(loss=custom_loss(i), optimizer=Adam(lr=0.0005))
    return model
This solution is also mentioned in the accepted answer here
You can pad your label with extra data columns from input and write a custom loss. This is helpful if you just want one/few feature column(s) from your input.
def custom_loss(data, y_pred):
    y_true = data[:, 0]
    i = data[:, 1]
    return K.mean(K.square(y_pred - y_true), axis=-1)  # + something with i...
def baseline_model():
    # create model
    i = Input(shape=(5,))
    x = Dense(5, kernel_initializer='glorot_uniform', activation='linear')(i)
    o = Dense(1, kernel_initializer='normal', activation='linear')(x)
    model = Model(i, o)
    model.compile(loss=custom_loss, optimizer=Adam(lr=0.0005))
    return model
model.fit(X, np.append(Y_true, X[:, [0]], axis=1), batch_size=batch_size, epochs=90, shuffle=True, verbose=1)  # X[:, [0]] keeps the column 2-D; assumes Y_true has shape (N, 1)
This solution can be found also here in this thread.
I have only used the 2nd method when I had to use input feature columns in the loss. The first method can be only used with scalar arguments as mentioned in the comments.
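To make the label-padding approach concrete, here is a NumPy-only sketch of how the packed label array is built and then split again inside the loss. The array names and sizes are invented for illustration, and Y_true is assumed to have shape (N, 1). It also shows a shape pitfall: slicing with data[:, 0] gives a 1-D array, which broadcasts against an (N, 1) prediction into an (N, N) result, whereas data[:, 0:1] keeps the column shape.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))       # placeholder inputs, 5 features
Y_true = rng.normal(size=(6, 1))  # placeholder targets, assumed shape (N, 1)

# Pack the target and the first input feature into one label array.
data = np.concatenate([Y_true, X[:, [0]]], axis=1)
print(data.shape)

# Unpack inside the loss; 0:1 and 1:2 keep the (N, 1) column shape.
y_true = data[:, 0:1]
i = data[:, 1:2]

y_pred = np.zeros((6, 1))              # stand-in for the model output
print((y_pred - y_true).shape)         # element-wise difference, as intended
print((y_pred - data[:, 0]).shape)     # 1-D slice broadcasts to a square matrix
```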
You could wrap your custom loss with another function that takes the input tensor as an argument:
def customloss(x):
    def loss(y_true, y_pred):
        # Use x here as you wish
        err = K.mean(K.square(y_pred - y_true), axis=-1)
        return err
    return loss
And then compile your model as follows:
model.compile('sgd', customloss(x))
where x is your input tensor.
NOTE: Not tested.