How to train a CNN model? - conv-neural-network

When trying to train the CNN model, I came across a code shown below:
def train(n_epochs, loaders, model, optimizer, criterion):
    for epoch in range(1, n_epochs + 1):  # epochs 1..n_epochs inclusive
        train_loss = 0
        valid_loss = 0
        model.train()
        for i, (data,target) in enumerate(loaders['train']):
            # zero the parameter (weight) gradients
            optimizer.zero_grad()
            # forward pass to get outputs
            output = model(data)
            # calculate the loss
            loss = criterion(output, target)
            # backward pass to calculate the parameter gradients
            loss.backward()
            # update the parameters
            optimizer.step()
Can someone please tell me why the second for loop is used?
i.e., for i, (data,target) in enumerate(loaders['train']):
And why are optimizer.zero_grad() and optimizer.step() used?

torch.utils.data.DataLoader comes in handy when you need to prepare data batches (and perhaps shuffle them before every run).
data_train_loader = DataLoader(data_train, batch_size=64, shuffle=True)
In the code above, the first for loop iterates over the epochs, while the second loop iterates over the training dataset, which a DataLoader like the one above has split into batches. For example:
for batch_idx, samples in enumerate(data_train_loader):
    # samples will be a 64 x D dimensional tensor
    # batch_idx is each batch index
See the torch.utils.data.DataLoader documentation to learn more.
optimizer.zero_grad(): Before the backward pass, use the optimizer object to zero the gradients of all the tensors it will update (the learnable weights of the model). This is needed because PyTorch accumulates gradients across backward() calls instead of overwriting them.
optimizer.step(): We generally use optimizer.step() to make the gradient descent step. Calling the step function on an optimizer updates its parameters using their current gradients.
See the torch.optim documentation to learn more about these.
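To make the accumulation behaviour concrete, here is a minimal sketch. The single weight `w` and the toy expression `3 * w` are illustrative assumptions, not part of the code in the question.

```python
import torch

# A single learnable weight (illustrative only)
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# First backward pass: d(3*w)/dw = 3
(3 * w).sum().backward()
grad_after_first = w.grad.clone()

# Second backward pass without zero_grad(): the new gradient is added
# to the one already stored in w.grad
(3 * w).sum().backward()
grad_accumulated = w.grad.clone()

# Reset, compute a fresh gradient, then take one SGD step
optimizer.zero_grad()
(3 * w).sum().backward()
optimizer.step()  # w <- w - lr * grad = 1.0 - 0.1 * 3.0

print(grad_after_first, grad_accumulated, w.data)
```

Without the zero_grad() call, the step would have used the accumulated gradient rather than the gradient of the current batch.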

The optimizer is first created with the model's parameters, like this (missing in your code):
# Adam takes betas rather than a momentum argument, so momentum is not passed here
optimizer = optim.Adam(model.parameters(), lr=0.001)
This code
loss = criterion(output, target)
calculates the loss of a single batch, where target comes from the tuple (data, target) and data is the model input that produced output.
This step:
optimizer.zero_grad()
zeroes the gradients of all the parameters held by the optimizer. This matters at the start of every iteration, because otherwise the gradients of the previous batch would be added to the new ones.
The part
loss.backward()
calculates the gradients, and optimizer.step() updates the model weights and biases (parameters).
In PyTorch you typically use the DataLoader class to load the training and validation sets.
loaders['train']
is probably a DataLoader over the full training set; one complete pass over it is a single epoch.
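Putting the pieces together, a self-contained sketch of the two loops might look like the following. The random toy data, the `nn.Linear` stand-in for a CNN, and the `steps` counter are assumptions made only to keep the example runnable.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: 256 samples, 10 features, 3 classes
data = torch.randn(256, 10)
target = torch.randint(0, 3, (256,))
loaders = {
    "train": DataLoader(TensorDataset(data, target), batch_size=64, shuffle=True)
}

model = nn.Linear(10, 3)  # stand-in for a CNN
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

n_epochs = 3
steps = 0
for epoch in range(1, n_epochs + 1):              # outer loop: one pass per epoch
    model.train()
    for batch_data, batch_target in loaders["train"]:  # inner loop: one step per batch
        optimizer.zero_grad()                     # clear gradients of the previous batch
        output = model(batch_data)                # forward pass
        loss = criterion(output, batch_target)    # loss of this batch
        loss.backward()                           # gradients of this batch
        optimizer.step()                          # parameter update
        steps += 1

print(steps)  # number of parameter updates performed
```

With 256 samples and a batch size of 64, each epoch contributes four parameter updates, which is why the inner loop is needed.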

Related

Pytorch - Repeating Loss

I am new to PyTorch and I found a problem when displaying the loss of my model.
Pytorch Adam Optimizer - Model Loss Figure
Pytorch SGD Optimizer - Model Loss Figure
As you can see, the loss seems to go up and down multiple times, in a recurring pattern (the pattern starts to repeat at the beginning of every epoch).
The full code can be found at: https://github.com/19valentin99/Kaggle/tree/main/Iris%20Flowers
in main_test.py (the # lines are the ones that I used to debug the code and the answer should be below).
When we just take the loss of the last element (or the loss over the
whole epoch) we will see a smooth decrease in loss
The reason your loss is smooth is that you are looking at the loss of the exact same batch on every iteration. Indeed, your train data loader isn't shuffling your instances:
train2 = DataLoader(flowers_data_train, batch_size=BATCH_SIZE)
This means the same batch will appear last on every epoch. That's all there is to it. This doesn't mean the learning is different; it means you are looking at only one part of the loss over the complete dataset.
The difference between "not working" and "working" comes down to when the loss is recorded.
The idea is that, overall, the loss converges, but until it converges it jumps up and down.
While it jumps up and down, we might see a pattern if we sample too often. The pattern comes from the training data, because the same batches are seen in the same order every epoch.
As a result:
For the not-working version: I was recording the loss after every batch of every epoch.
For the working version: I was recording only the last loss of each epoch.
Pytorch Adam Optimizer - Model Loss (working)
Pytorch SGD Optimizer - Model Loss (working)
Furthermore, I will attach the code which generates the non working version:
loss_list = []
for epoch in range(EPOCHS):
    for idx, (x, y) in enumerate(train_load):
        x, y = x.to(device), y.to(device)
        # Compute error
        prediction = model(x)
        # print(prediction, y)
        loss = loss_fn(prediction, y)
        # debugging: record the loss of every batch
        loss_list.append(loss.item())
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
plt.plot(loss_list)
plt.show()
The working code:
loss_list2 = np.zeros((EPOCHS,))
for epoch in range(EPOCHS):
    for batch, (x, y) in enumerate(train_load):
        x = x.to(device=device)
        y = y.to(device=device)
        y_pred = model(x)
        loss = loss_fn(y_pred, y)
        # overwritten on every batch, so only the last batch of the epoch is kept
        loss_list2[epoch] = loss.item()
        # Zero gradients
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
plt.plot(loss_list2)
plt.show()
Finally, I know there are a couple of other threads that suggest ways to solve this problem (for example: clip the gradients, remove the last batch, the model is too simple to capture the data). What I discovered, though, is that it wasn't actually a problem with training; it was a matter of when the loss is recorded.
I hope that this will help other people as well.
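As a possible middle ground between the two versions, one could record the per-batch loss and a sample-weighted mean per epoch side by side. This is only a sketch under assumed toy data; `per_batch`, `per_epoch`, and the tensor sizes are hypothetical names and values, not taken from the linked repository.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in for the Iris data: 120 samples, 4 features, 3 classes
x = torch.randn(120, 4)
y = torch.randint(0, 3, (120,))
train_load = DataLoader(TensorDataset(x, y), batch_size=30)  # shuffle=False by default

model = nn.Linear(4, 3)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

EPOCHS = 5
per_batch = []  # one value per batch: shows the within-epoch pattern
per_epoch = []  # one value per epoch: mean loss over all samples seen
for epoch in range(EPOCHS):
    total, seen = 0.0, 0
    for xb, yb in train_load:
        loss = loss_fn(model(xb), yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        per_batch.append(loss.item())
        total += loss.item() * xb.size(0)  # weight by batch size
        seen += xb.size(0)
    per_epoch.append(total / seen)
```

Plotting `per_epoch` summarises the whole epoch rather than only its last batch, while `per_batch` keeps the fine-grained curve for debugging.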

loss.backward() with minibatch in pytorch

I came across this code online and I was wondering if I interpreted it correctly. Below is part of a gradient descent process. The full code is available at https://jovian.ml/aakashns/03-logistic-regression. My question is as follows: during the training step, I guess the author is trying to minimize the loss for each batch by updating the parameters. However, how can we be sure the total loss over all training samples is minimized if loss.backward() is only applied to the batch loss?
def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history
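One way to look at this question: with a mean-reduced loss and equally sized batches, each batch gradient is an estimate of the full-dataset gradient, and the batch gradients average to it exactly when evaluated at the same parameters. The sketch below checks this on assumed toy regression data; the model, sizes, and variable names are hypothetical and not taken from the linked notebook.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
x = torch.randn(100, 5)
y = torch.randn(100, 1)
model = nn.Linear(5, 1)
loss_fn = nn.MSELoss()  # default reduction is the mean

# Gradient of the loss over the full training set
model.zero_grad()
loss_fn(model(x), y).backward()
full_grad = model.weight.grad.clone()

# Per-batch gradients at the same parameters (no optimizer step in between)
loader = DataLoader(TensorDataset(x, y), batch_size=20)
batch_grads = []
for xb, yb in loader:
    model.zero_grad()
    loss_fn(model(xb), yb).backward()
    batch_grads.append(model.weight.grad.clone())
mean_batch_grad = torch.stack(batch_grads).mean(dim=0)

print(torch.allclose(full_grad, mean_batch_grad, atol=1e-5))
```

In real training the parameters change after every batch, so each step follows a noisy estimate of the full gradient rather than the exact one. This does not guarantee that the total loss decreases at every step.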

Pytorch: Custom Loss involving Norm of End-to-End Jacobian

Cross posting from Pytorch discussion boards
I want to train a network using a modified loss function that has both a typical classification loss (e.g. nn.CrossEntropyLoss) as well as a penalty on the Frobenius norm of the end-to-end Jacobian (i.e. if f(x) is the output of the network, \nabla_x f(x)).
I’ve implemented a model that can successfully learn using nn.CrossEntropyLoss. However, when I try adding the second loss function (by doing two backwards passes), my training loop runs, but the model never learns. Furthermore, if I calculate the end-to-end Jacobian, but don’t include it in the loss function, the model also never learns. At a high level, my code does the following:
Forward pass to get predicted classes, yhat, from inputs x
Call yhat.backward(torch.ones(appropriate shape), retain_graph=True)
Jacobian norm = x.grad.data.norm(2)
Set loss equal to classification loss + scalar coefficient * jacobian norm
Run loss.backward()
I suspect that I’m misunderstanding how backward() works when run twice, but I haven’t been able to find any good resources to clarify this.
Too much is required to produce a working example, so I’ve tried to extract the relevant code:
def train_model(model, train_dataloader, optimizer, loss_fn, device=None):
    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.train()
    train_loss = 0
    correct = 0
    for batch_idx, (batch_input, batch_target) in enumerate(train_dataloader):
        batch_input, batch_target = batch_input.to(device), batch_target.to(device)
        optimizer.zero_grad()
        batch_input.requires_grad_(True)
        model_batch_output = model(batch_input)
        loss = loss_fn(model_output=model_batch_output, model_input=batch_input, model=model, target=batch_target)
        train_loss += loss.item()  # sum up batch loss
        loss.backward()
        optimizer.step()
and
def end_to_end_jacobian_loss(model_output, model_input):
    model_output.backward(
        torch.ones(*model_output.shape),
        retain_graph=True)
    jacobian = model_input.grad.data
    jacobian_norm = jacobian.norm(2)
    return jacobian_norm
Edit 1: I swapped my previous implementation with .backward() to autograd.grad and it apparently works! What's the difference?
def end_to_end_jacobian_loss(model_output, model_input):
    jacobian = autograd.grad(
        outputs=model_output['penultimate_layer'],
        inputs=model_input,
        grad_outputs=torch.ones(*model_output['penultimate_layer'].shape),
        retain_graph=True,
        only_inputs=True)[0]
    jacobian_norm = jacobian.norm(2)
    return jacobian_norm
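One difference that can be checked directly: .backward() accumulates into the .grad of every leaf tensor, including the model parameters the optimizer later reads, whereas autograd.grad only returns the requested gradient and leaves the parameters' .grad alone. The sketch below illustrates this on an assumed toy `nn.Linear` model. The `create_graph=True` flag is an addition that does not appear in the code above; without it, the returned gradient is detached from the graph, so the norm would not contribute gradients to the loss. Note also that passing a tensor of ones as grad_outputs yields a vector-Jacobian product rather than the full Jacobian.

```python
import torch
from torch import autograd, nn

torch.manual_seed(0)
model = nn.Linear(4, 3)  # hypothetical stand-in for the real network
x = torch.randn(8, 4, requires_grad=True)

# Variant 1: .backward() on the output writes into every leaf's .grad,
# including the model parameters.
out = model(x)
out.backward(torch.ones_like(out), retain_graph=True)
weight_grad_after_backward = model.weight.grad.clone()

# Variant 2: autograd.grad returns the gradient w.r.t. x only and
# does not touch the parameters' .grad.
model.zero_grad()
x.grad = None
out = model(x)
(jacobian,) = autograd.grad(
    outputs=out,
    inputs=x,
    grad_outputs=torch.ones_like(out),
    create_graph=True,  # keep the graph so the norm can be part of the loss
)
jacobian_norm = jacobian.norm(2)

w_grad = model.weight.grad
params_untouched = w_grad is None or bool((w_grad == 0).all())
print(params_untouched, jacobian.requires_grad)
```

In the .backward() variant, the gradients left in the parameters are added to those of the later loss.backward() call, which may explain why the model failed to learn even when the Jacobian norm was not included in the loss.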

PyTorch, Gradient calculations

https://colab.research.google.com/github/pytorch/tutorials/blob/gh-pages/_downloads/neural_networks_tutorial.ipynb
Hi, I am trying to understand neural networks with PyTorch.
I have doubts about the gradient calculations.
```
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()  # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()  # Does the update
```
From the above code, I understood that loss.backward() calculates the gradients.
I am not sure how this information is shared with the optimizer so that it can update the parameters.
Can anyone explain this?
Thanks in advance!
When you created the optimizer in this line
optimizer = optim.SGD(net.parameters(), lr=0.01)
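you passed net.parameters(), which yields all the learnable parameters that will be updated based on their gradients.
loss.backward() stores each gradient in the .grad attribute of the corresponding parameter, and optimizer.step() reads it from there.
The model and the optimizer are connected only because they share the same parameters.
PyTorch parameters are tensors. They are not called variables anymore.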
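A small sketch can make the sharing visible. The tiny `nn.Linear` network and the input of ones are assumptions chosen so that the expected update is easy to verify by hand.

```python
import torch
from torch import nn, optim

net = nn.Linear(2, 1)  # hypothetical tiny network
optimizer = optim.SGD(net.parameters(), lr=0.01)

# The optimizer holds references to the very same tensor objects as the model
shared = all(
    p_opt is p_net
    for p_opt, p_net in zip(optimizer.param_groups[0]["params"], net.parameters())
)

before = net.weight.detach().clone()
loss = net(torch.ones(1, 2)).sum()
loss.backward()   # fills net.weight.grad (and net.bias.grad)
optimizer.step()  # reads .grad from the shared tensors and updates them in place

# Plain SGD: new_weight = old_weight - lr * grad
expected = before - 0.01 * net.weight.grad
print(shared, torch.allclose(net.weight.detach(), expected))
```

Because the update happens in place on the shared tensors, the model sees the new weights without any explicit hand-off from the optimizer.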

How to deal with mini-batch loss in Pytorch?

I feed mini-batch data to model, and I just want to know how to deal with the loss. Could I accumulate the loss, then call the backward like:
...
def neg_log_likelihood(self, sentences, tags, length):
    self.batch_size = sentences.size(0)
    logits = self.__get_lstm_features(sentences, length)
    real_path_score = torch.zeros(1)
    total_score = torch.zeros(1)
    if USE_GPU:
        real_path_score = real_path_score.cuda()
        total_score = total_score.cuda()
    for logit, tag, leng in zip(logits, tags, length):
        logit = logit[:leng]
        tag = tag[:leng]
        real_path_score += self.real_path_score(logit, tag)
        total_score += self.total_score(logit, tag)
    return total_score - real_path_score
...
loss = model.neg_log_likelihood(sentences, tags, length)
loss.backward()
optimizer.step()
I wonder whether the accumulation could lead to gradient explosion.
So, should I call backward inside the loop:
for sentence, tag, leng in zip(sentences, tags, length):
    loss = model.neg_log_likelihood(sentence, tag, leng)
    loss.backward()
    optimizer.step()
Or use the mean loss, just like reduce_mean in TensorFlow:
loss = reduce_mean(losses)
loss.backward()
The loss should be averaged over the mini-batch. If you look at the native PyTorch loss functions such as CrossEntropyLoss, there is a separate parameter, reduction, just for this, and the default behaviour ('mean') is to average over the mini-batch.
We usually
1. get the loss from the loss function
2. (if necessary) manipulate the loss, for example by applying class weighting
3. calculate the mean loss of the mini-batch
4. calculate the gradients with loss.backward()
5. (if necessary) manipulate the gradients, for example by clipping them for some RNN models to avoid gradient explosion
6. update the weights using the optimizer.step() function
So in your case, you can first take the mean loss of the mini-batch, then calculate the gradients with loss.backward(), and then call optimizer.step() to update the weights.
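The steps above can be sketched as follows. The LSTM tagger, the tensor shapes, and the `max_norm` value are assumptions for illustration; they stand in for the CRF model in the question rather than reproducing it.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical tagger: LSTM features followed by a linear layer over 5 tags
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 5)
params = list(lstm.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

sentences = torch.randn(4, 10, 8)      # mini-batch of 4 sequences, length 10
tags = torch.randint(0, 5, (4, 10))

optimizer.zero_grad()
features, _ = lstm(sentences)
logits = head(features)                # shape (4, 10, 5)

# Steps 1-3: per-element losses, then the mean over the mini-batch
per_token = nn.functional.cross_entropy(
    logits.reshape(-1, 5), tags.reshape(-1), reduction="none"
)
loss = per_token.mean()

# Step 4: gradients
loss.backward()

# Step 5: optional gradient clipping; returns the norm before clipping
max_norm = 1.0
norm_before = torch.nn.utils.clip_grad_norm_(params, max_norm)

# Step 6: weight update
optimizer.step()
```

Using reduction="none" and averaging by hand leaves room for step 2, for example multiplying `per_token` by class weights before taking the mean.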

Resources