PyTorch, gradient calculations - pytorch
Hi, I am trying to understand neural networks with PyTorch.
I have doubts about the gradient calculations.
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad()  # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()  # compute the gradients
optimizer.step()  # Does the update
From the above code, I understand that loss.backward() calculates the gradients.
I am not sure how this information is shared with the optimizer so that it can apply the update.
Can anyone explain this?
Thanks in advance!

When you created the optimizer in this line
optimizer = optim.SGD(net.parameters(), lr=0.01)
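you passed in net.parameters(), which yields all the learnable parameters that will be updated based on their gradients.
The model and the optimizer are connected only because they share the same parameters: loss.backward() stores each gradient in the .grad attribute of the corresponding parameter, and optimizer.step() reads it from there.
PyTorch parameters are tensors. They are not called variables anymore.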
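To make that connection concrete, here is a minimal sketch, assuming a toy nn.Linear model and random data (both are illustrative and not from the question). It checks that the optimizer holds the very same tensor objects as the model, and that for plain SGD without momentum or weight decay, step() applies p = p - lr * p.grad.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy model and data, assumed purely for illustration.
net = nn.Linear(3, 1)
lr = 0.01
optimizer = optim.SGD(net.parameters(), lr=lr)

# The optimizer keeps references to the same tensor objects the model owns.
shared = all(
    p_opt is p_net
    for p_opt, p_net in zip(optimizer.param_groups[0]["params"], net.parameters())
)

x = torch.randn(4, 3)
target = torch.randn(4, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(net(x), target)
loss.backward()  # writes the gradient of each parameter into p.grad

# Snapshot values and gradients so the update can be checked by hand.
before = [p.detach().clone() for p in net.parameters()]
grads = [p.grad.detach().clone() for p in net.parameters()]

optimizer.step()  # reads p.grad and updates p in place

matches_manual_sgd = all(
    torch.allclose(p.detach(), b - lr * g)
    for p, b, g in zip(net.parameters(), before, grads)
)
print(shared, matches_manual_sgd)
```

Nothing is passed from loss.backward() to optimizer.step() explicitly; the .grad attribute on the shared parameters is the only channel between them.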


Why is testing accuracy so low, could there be a bug in my code?

I've been building an image classification pipeline that runs object detection first and then applies image classification to the detected images. I have 87 custom classes in my data (not ImageNet classes) and just over 7000 images altogether (around 60 images per class). I am happy with my object detection code and I think it works quite well. For classification I have tried AlexNet, ResNet18, ResNet50 and ResNet101, but I am getting very low testing accuracies (around 10%), while my training accuracies are high for all models. I've also attempted regularisation and changing the learning rates, but I am not getting the higher accuracies (>80%) that I require. I wonder if there is a bug in my code, although I haven't been able to find one.
Here is my training code. I have also preprocessed the images in the way that PyTorch pretrained models expect:
import torch
import torch.nn as nn
import torch.optim as optim
from typing import Callable
import numpy as np
resnet = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50')
resnet.fc = nn.Linear(2048, 87)
res_loss = nn.CrossEntropyLoss()
res_optimiser = optim.SGD(resnet.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-5)
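def train_model(model, loss_fn, optimiser, modelsavepath):
    train_acc = 0
    for j in range(EPOCHS):
        running_loss = 0.0
        correct = 0
        total = 0
        for i, data in enumerate(training_generator, 0):
            inputs, labels, paths = data
            total += 1
            optimiser.zero_grad()  # assumed: clear gradients from the previous batch
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            # assumed comparison (the operands are missing above); only valid for batch size 1
            if predicted == labels:
                correct += 1
            loss = loss_fn(outputs, labels)
            loss.backward()  # assumed: backward pass
            optimiser.step()  # assumed: parameter update
            running_loss += loss.item()
        train_loss = running_loss / len(training_generator)
        train_acc = correct / len(training_generator)
        print("Epoch:{}/{} AVG Training Loss:{:.3f} AVG Training Acc {:.2f}% ".format(j + 1, EPOCHS, train_loss, train_acc))
    # assumed: the save call is truncated above; saving the state dict is one common choice
    torch.save(model.state_dict(), modelsavepath)
train_model(resnet, res_loss, res_optimiser, 'resnet.pth')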
Here is the testing code used for a single image, it is part of a class:
outputs = self.model(img[None, ...]) #models expect batches, so give it a singleton batch
scores, predictions = torch.max(outputs, 1)
predictions = predictions.numpy()[0]
possible_scores= np.argmax(scores.detach().numpy())
Is there a bug in my code, either testing or training, or is my model just overfitting? Additionally, is there a better image classification model that I could try?
Your dataset is very small, so you're most likely overfitting. Try:
- Decrease the learning rate (try 0.001, 0.0001, 0.00001).
- Increase weight_decay (try 1e-4, 1e-3, 1e-2).
- If you don't already, use image augmentations (at least the default ones, like random crop and flip).
Watch train/test loss curves when finetuning your model and stop training as soon as you see test accuracy going down while train accuracy goes up.
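The "stop training as soon as test performance gets worse" rule can be sketched as a small helper. This is only an illustration under stated assumptions: it watches validation loss rather than accuracy, and both the function name early_stopping_epoch and the patience parameter are hypothetical, not part of PyTorch.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch with the best validation loss.

    Scanning stops once `patience` consecutive epochs fail to improve on
    the best loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has stopped improving
    return best_epoch


# Hypothetical per-epoch validation losses: improving at first, then overfitting.
val_losses = [2.1, 1.6, 1.3, 1.2, 1.25, 1.4, 1.6, 1.9]
best = early_stopping_epoch(val_losses, patience=3)
print(best)
```

In a real training loop you would save a checkpoint whenever the validation loss improves and restore that checkpoint once the loop stops.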

loss.backward() with minibatch in pytorch

I came across this code online and was wondering whether I interpreted it correctly. Below is part of a gradient descent process (the full code is available through the link). My question is as follows: during the training step, I guess the author is trying to minimize the loss for each batch by updating the parameters. However, how can we be sure that the total loss over all training samples is minimized if loss.backward() is only applied to the batch loss?
def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()  # gradients of the batch loss
            optimizer.step()  # update the parameters
            optimizer.zero_grad()  # reset gradients for the next batch
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history
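One way to think about this: with a mean-reduced loss, each batch gradient is a noisy estimate of the gradient over the whole training set. The sketch below, which assumes a hypothetical one-parameter linear model and made-up data, shows that at a fixed weight the average of the gradients of equal-sized batches equals the full-dataset gradient. In real training the weights change between batches, so this illustrates the idea and does not guarantee that the total loss is minimized.

```python
def mse_gradient(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Made-up data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
w = 0.5

# Gradient computed on the whole dataset at once.
full_gradient = mse_gradient(w, xs, ys)

# Gradients computed on equal-sized mini-batches, all at the same w.
batch_size = 2
batch_gradients = [
    mse_gradient(w, xs[i:i + batch_size], ys[i:i + batch_size])
    for i in range(0, len(xs), batch_size)
]
mean_batch_gradient = sum(batch_gradients) / len(batch_gradients)

print(full_gradient, mean_batch_gradient)
```

Each individual entry of batch_gradients differs from full_gradient, which is the noise that mini-batch SGD accepts in exchange for cheaper and more frequent updates.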

MSE loss in tensorflow 2.0 mistakes y_true for a reduction key

I am using a really simple neural network with TensorFlow 2.0, in a Jupyter notebook running Python 3.7.0.
The NN outputs Xip, a float, which I use as a parameter in my function MainGaussian_1_U, which approximates an image. When I try to compute the loss using MeanSquaredError between the real image img and the approximation mk, I get an error in which the loss function seems to take img as a reduction key. After searching, I still have no idea what this key is supposed to be, and I can't find a way to debug my code:
import numpy as np
import tensorflow as tf
model = tf.keras.models.Sequential()
# Add the layers
model.add(tf.keras.layers.Dense(64, activation="relu"))
model.add(tf.keras.layers.Dense(32, activation="relu"))
model.add(tf.keras.layers.Dense(1, activation="relu"))
# The loss method
loss_object = tf.keras.losses.MeanSquaredError()
# The optimizer
optimizer = tf.keras.optimizers.Adam()
# This metric is used to track the progress of the training loss during training
train_loss = tf.keras.metrics.Mean(name='train_loss')
def train_step(Data, img):
    for _ in range(5):
        with tf.GradientTape() as tape:
            Xip = model((sizeh**-2 * np.ones((sizeh, sizeh))).reshape(-1, 49))
            MainGaussian_1_U()
            print("img=", img)
            loss = tf.keras.losses.MeanSquaredError(img, mk)
            print("loss=", loss)
        gradients = tape.gradient(loss, model.trainable_variables)
        print(gradients)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
train_step(TestFile, TestFile[4])
The error given is:
c:\program files\python37\lib\site-packages\tensorflow_core\python\ops\losses\ FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
if key not in cls.all():
ValueError: Invalid Reduction Key [[21.05224609 20.79420471 34.9659729 ... 48.09233093 68.83874512
[20.93516541 17.0511322 39.00476074 ... 56.74258423 47.75274658
[38.18562317 22.70791626 24.37176514 ... 64.9606781 47.65338135
[85.76565552 79.45443726 73.64129639 ... 73.66456604 47.06422424
[87.14616394 82.38183594 77.00856018 ... 66.21652222 71.32862854
[36.74142456 37.27145386 34.52891541 ... 29.58699036 37.37667847
This is my first question here on Stack Overflow: please let me know if I can make it any clearer!
You correctly create the "loss object", but never use it. Instead, your code tries to create a new "loss object" with the images as constructor arguments (which doesn't work). What you want is to pass the images to the already-created loss object. You just have to change this line
loss = tf.keras.losses.MeanSquaredError(img, mk)
to
loss = loss_object(img, mk)
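As for the "reduction key": tf.keras.losses.MeanSquaredError is a class whose constructor takes configuration arguments (reduction first, then name), not data, so an image passed positionally is read as the reduction setting. A minimal sketch of the intended two-step usage, with small made-up arrays standing in for img and mk:

```python
import numpy as np
import tensorflow as tf

# Step 1: construct the loss object once (configuration only, no data).
loss_object = tf.keras.losses.MeanSquaredError()

# Made-up stand-ins for the real image and its approximation.
y_true = np.array([[0.0, 1.0], [2.0, 3.0]], dtype=np.float32)
y_pred = np.array([[0.0, 2.0], [2.0, 5.0]], dtype=np.float32)

# Step 2: call the object with (y_true, y_pred) to get a scalar loss tensor.
loss = loss_object(y_true, y_pred)
loss_value = float(loss)
print(loss_value)
```

The squared errors here are 0, 1, 0 and 4, so the mean is 1.25.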

How to train a CNN model?

When trying to train the CNN model, I came across a code shown below:
def train(n_epochs, loaders, model, optimizer, criterion):
    for epoch in range(1, n_epochs):
        train_loss = 0
        valid_loss = 0
        for i, (data, target) in enumerate(loaders['train']):
            # zero the parameter (weight) gradients
            optimizer.zero_grad()
            # forward pass to get outputs
            output = model(data)
            # calculate the loss
            loss = criterion(output, target)
            # backward pass to calculate the parameter gradients
            loss.backward()
            # update the parameters
            optimizer.step()
Can someone please tell me why the second for loop is used, i.e.
for i, (data,target) in enumerate(loaders['train']):
And why are optimizer.zero_grad() and optimizer.step() used?
torch.utils.data.DataLoader comes in handy when you need to prepare data batches (and perhaps shuffle them before every run).
data_train_loader = DataLoader(data_train, batch_size=64, shuffle=True)
In the above code, the first for loop iterates over the epochs, while the second loop iterates over the training dataset, which the DataLoader above has split into batches. For example:
for batch_idx, samples in enumerate(data_train_loader):
    # samples will be a 64 x D dimensional tensor
    # batch_idx is the index of each batch
optimizer.zero_grad(): Before the backward pass, use the optimizer object to zero all of the gradients for the tensors it will update (which are the learnable weights of the model). This is needed because, by default, gradients accumulate in the .grad buffers whenever backward() is called, rather than being overwritten.
optimizer.step(): We generally use optimizer.step() to take the gradient descent step. Calling the step function on an optimizer updates its parameters.
The optimizer is first created with the model's parameters, like this (missing in your code):
optimizer = optim.Adam(model.parameters(), lr=0.001)
This code
loss = criterion(output, target)
calculates the loss of a single batch, where target comes from the (data, target) tuple and data is the input to the model that produced output.
This step:
optimizer.zero_grad()
zeroes all the gradients of the parameters held by the optimizer, which is important at the start of every iteration.
The part
loss.backward()
calculates the gradients, and optimizer.step() updates the model's weights and biases (parameters).
In PyTorch you typically use the DataLoader class to load the training and validation sets.
loaders['train']
is probably the full training set; one pass over it represents a single epoch.
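Putting the pieces together, here is a minimal runnable sketch of the two nested loops. It assumes a synthetic regression dataset and a one-layer linear model instead of a CNN, purely to keep the example small; the variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic data (assumed for illustration): y = 2x + 1 plus a little noise.
x = torch.randn(256, 1)
y = 2 * x + 1 + 0.05 * torch.randn(256, 1)
train_loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(20):                    # outer loop: one full pass per epoch
    running_loss = 0.0
    for data, target in train_loader:      # inner loop: one mini-batch at a time
        optimizer.zero_grad()              # clear gradients left from the last batch
        output = model(data)               # forward pass
        loss = criterion(output, target)   # loss for this batch only
        loss.backward()                    # compute gradients into p.grad
        optimizer.step()                   # update the parameters
        running_loss += loss.item() * data.size(0)
    # Average loss per sample over the whole epoch.
    epoch_losses.append(running_loss / len(train_loader.dataset))

print(epoch_losses[0], epoch_losses[-1])
```

The inner loop is what makes this mini-batch training: the parameters are updated four times per epoch here (256 samples in batches of 64), not once per epoch.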

custom loss - keras

The following two models/compilations behave differently:
def custom_loss(y_true, y_pred):
    return keras.losses.binary_crossentropy(y_true, y_pred)
optimizer = Adam(lr=5e-3)
model.compile(loss=custom_loss, optimizer=optimizer, metrics=['accuracy'])
optimizer = Adam(lr=5e-3)
model.compile(loss=keras.losses.binary_crossentropy, optimizer=optimizer, metrics=['accuracy'])
What can be the reason?
If you implement a custom binary cross-entropy loss, you should also specify the right accuracy metric explicitly. When you pass metrics=['accuracy'] together with Keras' own binary cross-entropy, Keras automatically picks which accuracy metric to use (binary or categorical accuracy) based on the loss.
This doesn't happen with a custom loss: Keras can then fall back to categorical accuracy, which is wrong for this problem and produces incorrect accuracy values. To avoid that, name the metric explicitly, for example:
model.compile(loss=custom_loss, optimizer=optimizer, metrics=['binary_accuracy'])
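To see how much the choice of metric can matter, here is a small pure-Python sketch of the two definitions, assuming a multi-label setup with three sigmoid outputs and made-up predictions. The helper functions are simplified stand-ins written for this example, not the Keras implementations.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual outputs whose thresholded prediction matches the label."""
    matches = [
        float(t == (1.0 if p > threshold else 0.0))
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    ]
    return sum(matches) / len(matches)


def categorical_accuracy(y_true, y_pred):
    """Fraction of rows where the argmax of the prediction matches the argmax of the label."""
    matches = [
        float(row_t.index(max(row_t)) == row_p.index(max(row_p)))
        for row_t, row_p in zip(y_true, y_pred)
    ]
    return sum(matches) / len(matches)


# Made-up multi-label targets and sigmoid outputs.
y_true = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
y_pred = [[0.9, 0.2, 0.8], [0.6, 0.7, 0.1]]

bin_acc = binary_accuracy(y_true, y_pred)
cat_acc = categorical_accuracy(y_true, y_pred)
print(bin_acc, cat_acc)
```

On this data the categorical version reports a perfect score even though one of the six individual outputs is wrong, which is the kind of mismatch that makes two otherwise identical compilations appear to behave differently.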
