How to use the lbfgs optimizer with pytorch-lightning? - pytorch

I have a problem in using the LBFGS optimizer from pytorch with lightning.
I use the template from here to start a new project and here is the code that I tried (only the training portion):
def training_step(self, batch, batch_nb):
x, y = batch
x = x.float()
y = y.float()
y_hat = self.forward(x)
return {'loss': F.mse_loss(y_hat, y)}
def configure_optimizers(self):
optimizer = torch.optim.LBFGS(self.parameters())
return optimizer
def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i):
def closure():
optimizer.zero_grad()
l = self.training_step(batch, batch_nb)
loss = l['loss']
loss.backward()
return loss
optimizer.step(closure)
The LBFGS optimizer from pytorch requires a closure function (see here and here), but I don't know how to define it inside the template, specially I don't know how the batch data is passed to the optimizer. I tried to define a custom optimizer_step function but I have some problems to passing the batch inside the closure function.
I will be very thankful of any advise that helps me to solve this problem or points me to the right direction.
Environment:
PyTorch version: 1.2.0+cpu
Lightning version: 0.4.9
Test-tube version: 0.7.1

The support for lbfgs optimizer was added on #310, now it is not required to define a closure function.

Related

Cross Entropy for Soft Labeling in Pytorch

i'm trying to define the loss function of a two-class classification problem. However, the target label is not hard label 0,1, but a float number between 0~1.
torch.nn.CrossEntropy in Pytorch do not support soft label so i'm trying to write a cross entropy function by my self.
My function looks like this
def cross_entropy(self, pred, target):
loss = -torch.mean(torch.sum(target.flatten() * torch.log(pred.flatten())))
return loss
def step(self, batch: Any):
x, y = batch
logits = self.forward(x)
loss = self.criterion(logits, y)
preds = logits
# torch.argmax(logits, dim=1)
return loss, preds, y
however it does not work at all.
Can anyone give me a suggestion is there any mistake in my loss function?
It seems like BCELoss and the robust version BCEWithLogitsLoss are working with fuzzy targets "out of the box". They do not expect target to be binary" any number between zero and one is fine.
Please read the doc.

How to compute the parameter importance in pytorch?

I want to develop a lifelong learning system,so i need to prevent important parameter from changing.I read related paper 'Memory Aware Synapses: Learning what (not) to forget',a method was mentioned,I need to calculate the gradient of each parameter conresponding to each input image,so how should i write my code in pytorch?
'Memory Aware Synapses: Learning what (not) to forget'
You can do it using standard optimization procedure and .backward() method on your loss function.
First, scaling as defined in your link:
class Scaler:
def __init__(self, parameters, delta):
self.parameters = parameters
self.delta = delta
def step(self):
"""Multiplies gradients in place."""
for param in self.parameters:
if param.grad is None:
raise ValueError("backward() has to be called before running scaler")
param.grad *= self.delta
One can use it just like optimizer.step(), see below (see comments):
model = torch.nn.Sequential(
torch.nn.Linear(10, 100), torch.nn.ReLU(), torch.nn.Linear(100, 1)
)
scaler = Scaler(model.parameters(), delta=0.001)
optimizer = torch.optim.Adam(model.parameters())
criterion = torch.nn.MSELoss()
X, y = torch.randn(64, 10), torch.randn(64)
# Optimization loop
EPOCHS = 10
for _ in range(EPOCHS):
output = model(X)
loss = criterion(output, y)
loss.backward() # Now model has the gradients
optimizer.step() # Optimize model's parameters
print(next(model.parameters()).grad)
scaler.step() # Scaler gradients
optimizer.zero_grad() # Zero gradient before next step
After scaler.step() you will have gradient scaled available inside param.grad for each parameter (just like those are accessed within Scaler's step method) so you can do whatever you want with them.

Custom adaptive loss function with additional dynamic argument in Keras

I have to use an adaptive custom loss function that takes an additional dynamic argument (eps) in keras. The argument eps is a scalar but changes from one sample to the other : the loss function should be therefore adapted during training. I use a generator and I can pass this argument through every call of the generator during training (generator_train[2]). Based on answers to similar questions I tried to write the following wrapping:
def custom_loss(eps):
def square_err(y_true, y_pred):
nom = K.sum(K.square(y_pred - y_true), axis=-1)
denom = eps**2
loss = nom/denom
return loss
return square_err
But I am struggling with implementing it since eps is a dynamic variable: I don't know how I should pass this argument to the loss function during training (model.fit). Here is a simple version of my model:
model = keras.Sequential()
model.add(layers.LSTM(units=32, input_shape=(32, 4))
model.add(layers.Dense(units=1))
model.add_loss(custom_loss)
opt = keras.optimizers.Adam()
model.compile(optimizer=opt)
history = model.fit(x=generator_train[0], y=generator_train[1],
steps_per_epoch=100
epochs=50,
validation_data=gen_vl,
validation_steps=n_vl)
Your help would be very appreciated.
Simply pass "sample weights", which will be 1/(eps**2) for each sample.
Your generator should just output x, y, sample_weights and that's all.
Your loss can be:
def loss(y_true, y_pred):
return K.sum(K.square(y_pred - y_true), axis=-1)
In fit, you cannot use indexing in the generator, you will pass just generator_train, no x, no y, just generator_train.

How can I use the LBFGS optimizer with pytorch ignite?

I started using Ignite recently and i found it very interesting.
I would like to train a model using as an optimizer the LBFGS algorithm from the torch.optim module.
This is my code:
from ignite.engine import Events, Engine, create_supervised_trainer, create_supervised_evaluator
from ignite.metrics import RootMeanSquaredError, Loss
from ignite.handlers import EarlyStopping
D_in, H, D_out = 5, 10, 1
model = simpleNN(D_in, H, D_out) # a simple MLP with 1 Hidden Layer
model.double()
train_loader, val_loader = get_data_loaders(i)
optimizer = torch.optim.LBFGS(model.parameters(), lr=1)
loss_func = torch.nn.MSELoss()
#Ignite
trainer = create_supervised_trainer(model, optimizer, loss_func)
evaluator = create_supervised_evaluator(model, metrics={'RMSE': RootMeanSquaredError(),'LOSS': Loss(loss_func)})
#trainer.on(Events.ITERATION_COMPLETED)
def log_training_loss(engine):
print("Epoch[{}] Loss: {:.5f}".format(engine.state.epoch, len(train_loader), engine.state.output))
def score_function(engine):
val_loss = engine.state.metrics['RMSE']
print("VAL_LOSS: {:.5f}".format(val_loss))
return -val_loss
handler = EarlyStopping(patience=10, score_function=score_function, trainer=trainer)
evaluator.add_event_handler(Events.COMPLETED, handler)
trainer.run(train_loader, max_epochs=100)
And the error that raises is:
TypeError: step() missing 1 required positional argument: 'closure'
I know that is required to define a closure for the implementation of LBFGS, so my question is how can I do it using ignite? or is there another approach for doing this?
The way to do it is like this:
from ignite.engine import Engine
model = ...
optimizer = torch.optim.LBFGS(model.parameters(), lr=1)
criterion =
def update_fn(engine, batch):
model.train()
x, y = batch
# pass to device if needed as here: https://github.com/pytorch/ignite/blob/40d815930d7801b21acfecfa21cd2641a5a50249/ignite/engine/__init__.py#L45
def closure():
y_pred = model(x)
loss = criterion(y_pred, y)
optimizer.zero_grad()
loss.backward()
return loss
optimizer.step(closure)
trainer = Engine(update_fn)
# everything else is the same
Source
You need to encapsulate all evaluating step with zero_grad and returning step in
for batch in loader():
def closure():
...
return loss
optim.step(closure)
Pytorch docs for 'closure'

Pytorch: Why loss functions are implemented both in nn.modules.loss and nn.functional module?

Many loss functions in Pytorch are implemented both in nn.modules.loss and nn.functional.
For example, the two lines of the below return same results.
import torch.nn as nn
import torch.functional as F
nn.L1Loss()(x,y)
F.l1_loss(x,y)
Why are there two implementations?
Consistency for other parametric loss functions
Instantiation of loss function brings something good
otherwise
I think of it as of a partial application situation - it's useful to be able to "bundle" many of the configuration variables with the loss function object. In most cases, your loss function has to take prediction and ground_truth as its arguments. This makes for a fairly uniform basic API of loss functions. However, they differ in details. For instance, not every loss function has a reduction parameter. BCEWithLogitsLoss has weight and pos_weight parameters; PoissonNLLLoss has log_input, eps. It's handy to write a function like
def one_epoch(model, dataset, loss_fn, optimizer):
for x, y in dataset:
model.zero_grad()
y_pred = model(x)
loss = loss_fn(y_pred, y)
loss.backward()
optimizer.step()
which can work with instantiated BCEWithLogitsLoss equally well as with PoissonNLLLoss. But it cannot work with their functional counterparts, because of the bookkeeping necessary. You would instead have to first create
loss_fn_packed = functools.partial(F.binary_cross_entropy_with_logits, weight=my_weight, reduction='sum')
and only then you can use it with one_epoch defined above. But this packing is already provided with the object-oriented loss API, along with some bells and whistles (since losses subclass nn.Module, you can use forward and backward hooks, move stuff between cpu and gpu, etc).
There is the code of BCEWithLogistsLoss without doc:
class BCEWithLogitsLoss(_Loss):
def __init__(self, weight: Optional[Tensor] = None, size_average=None, reduce=None, reduction: str = 'mean',
pos_weight: Optional[Tensor] = None) -> None:
super(BCEWithLogitsLoss, self).__init__(size_average, reduce, reduction)
self.register_buffer('weight', weight)
self.register_buffer('pos_weight', pos_weight)
def forward(self, input: Tensor, target: Tensor) -> Tensor:
return F.binary_cross_entropy_with_logits(input, target,
self.weight,
pos_weight=self.pos_weight,
reduction=self.reduction)
If parameter passing isn't considered, class and function implementation are same exactly. However, use class implementation can keep your code more concise and readable, e.g.
use function
loss_func=binary_cross_entropy_with_logits
def train(model, dataloader, loss_fn, optimizer, weight, size_average, reduce, reduction, pos_weight):
for x, y in dataloader:
model.zero_grad()
y_pred = model(x)
loss = loss_fn(y_pred, y, weight, size_average, reduce, reduction, pos_weight)
loss.backward()
optimizer.step()
use class
loss_func = BCEWithLogitsLoss(weight, size_average, reduce, reduction, pos_weight)
def train(model, dataloader, loss_fn, optimizer):
for x, y in dataloader:
model.zero_grad()
y_pred = model(x)
loss = loss_fn(y_pred, y)
loss.backward()
optimizer.step()
If you have several parameters or different loss functions, the class implementation is better.

Resources