Loss Not Converging in SGD - pytorch

I am using a linear model in PyTorch and my SGD loss is not converging.
Use case:
I am using the MNIST dataset and implementing an image classifier for the handwritten digits 0 and 1.
I am using a logistic regression model:
model = nn.Linear(input_size,num_classes)
I created a custom loss function.
During training I convert the labels from 0,1 to -1,1 and then compute the loss:
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Reshape images to (batch_size, input_size)
        images = images.reshape(-1, 28*28)
        # Convert labels from 0,1 to -1,1
        labels = Variable(2*(labels.float()-0.5))
        # Forward pass
        outputs = model(images)
        # we need maximum value of two class prediction
        oneout = torch.max(outputs.data,1)[0]
        loss = loss_criteria(oneout, labels)
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
Loss output: 0.8445, 0.6883, 0.7976, 0.8133, 0.8289, 0.7195. As you can see, the loss is not converging.
Expected result:
0.8445, 0.8289, 0.8133, 0.7976, 0.7195, 0.6883, decreasing all the way to zero.
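One thing that may be worth checking in the loop above: torch.max(outputs.data, 1)[0] reads the values through .data, which detaches them from the autograd graph, so gradients of the loss cannot flow back to the model's weights. The sketch below is not the poster's code. It assumes random stand-in data, a single-output nn.Linear, and a hypothetical logistic loss on -1/+1 labels in place of the custom loss_criteria (which is not shown). It only illustrates that the loss decreases once the model output stays attached to the graph.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in data: 64 flattened 28x28 "images" with 0/1 labels.
images = torch.randn(64, 28 * 28)
labels01 = (images[:, 0] > 0).long()
labels = 2 * (labels01.float() - 0.5)          # map {0, 1} -> {-1, +1}

# Values read through .data are cut off from autograd.
detached = torch.max(nn.Linear(28 * 28, 2)(images).data, 1)[0]
print(detached.requires_grad)                   # False

model = nn.Linear(28 * 28, 1)                   # one score per image
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def logistic_loss(scores, targets):
    # Assumed loss: mean(log(1 + exp(-y * f(x)))), written with softplus
    # for numerical stability.
    return nn.functional.softplus(-targets * scores).mean()

losses = []
for step in range(100):
    scores = model(images).squeeze(1)           # still attached to the graph
    loss = logistic_loss(scores, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Note that individual mini-batch losses can still fluctuate from step to step even when training is working, so a trend over many steps is a better signal than six consecutive values.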

Related

Precision,recall, F1 score with Sklearn on Pytorch

I've been looking through samples but am unable to understand how to integrate the precision, recall and f1 metrics for my model. My code is as follows:
for epoch in range(num_epochs):
    #Calculate Accuracy (stack tutorial no n_total)
    n_correct = 0
    n_total = 0
    for i, (words, labels) in enumerate(train_loader):
        words = words.to(device)
        labels = labels.to(dtype=torch.long).to(device)
        # Forward pass
        outputs = model(words)
        loss = criterion(outputs, labels)
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        #feedforward tutorial solution
        _, predicted = torch.max(outputs, 1)
        n_correct += (predicted == labels).sum().item()
        n_total += labels.shape[0]
    accuracy = 100 * n_correct/n_total
    #Push to matplotlib
    train_losses.append(loss.item())
    train_epochs.append(epoch)
    train_acc.append(accuracy)
    #Loss and Accuracy
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.2f}, Acc: {accuracy:.2f}')

Since you have the predicted and the labels variables, you can aggregate them during the epoch loop and convert them to numpy arrays to calculate the required metrics.
At the beginning of each epoch, initialize two empty lists: one for predicted labels and one for ground-truth labels.
for epoch in range(num_epochs):
    predicted_labels, ground_truth_labels = [], []
    ...
Then, keep appending the respective entries to each list during the epoch:
        ...
        _, predicted = torch.max(outputs, 1)
        n_correct += (predicted == labels).sum().item()
        # appending
        predicted_labels.append(predicted.cpu().detach().numpy())
        ground_truth_labels.append(labels.cpu().detach().numpy())
        ...
Then, at the end of the epoch, you can call precision_recall_fscore_support with predicted_labels and ground_truth_labels as inputs.
Notes:
You'll probably have to flatten the above two lists first, since each holds one array per batch.
Read about torch.no_grad() and apply it as good practice when calculating metrics.
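The aggregation described above can be sketched as follows. This is a minimal example under stated assumptions: the small hard-coded arrays are hypothetical stand-ins for the per-batch predicted.cpu().numpy() and labels.cpu().numpy() values, np.concatenate is one possible way to flatten the lists, and average="macro" is just one of the available averaging modes.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical per-batch arrays, as they would be collected during one epoch.
predicted_labels = [np.array([0, 1, 1]), np.array([1, 0])]
ground_truth_labels = [np.array([0, 1, 0]), np.array([1, 1])]

# Flatten the lists of per-batch arrays into one 1-D array each.
y_pred = np.concatenate(predicted_labels)
y_true = np.concatenate(ground_truth_labels)

# Note the argument order: true labels first, predictions second.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With average=None instead, the function returns one value per class, which can be more informative for imbalanced data.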

How to understand a periodicity in the training loss using a pre-trained model of PyTorch?

I'm using a pre-trained model from PyTorch (ResNet 18, 34, 50) to classify images. During training, a weird periodicity appears in the training loss, as you can see in the image below. Has anybody had a similar issue? To deal with overfitting, I'm using data augmentation in the preprocessing.
When using SGD as the optimizer with the following parameters, we obtain this sort of graph:
criterion: NLLLoss()
learning rate: 0.0001
epochs: 40
print every 40 iterations
We also tried Adam and AdaBound as optimizers, but the same periodicity was observed.
Thanks in advance for your answer!
Here is the code :
def train_classifier():
    start=0
    stop=0
    start = timeit.default_timer()
    epochs = 40
    steps = 0
    print_every = 40
    model.to('cuda')
    epo=[]
    train=[]
    valid=[]
    acc_valid=[]
    for e in range(epochs):
        print('Currently running epoch',e,':')
        model.train()
        running_loss = 0
        for images, labels in iter(train_loader):
            steps += 1
            images, labels = images.to('cuda'), labels.to('cuda')
            optimizer.zero_grad()
            output = model.forward(images)
            loss = criterion(output, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            if steps % print_every == 0:
                model.eval()
                # Turn off gradients for validation, saves memory and computations
                with torch.no_grad():
                    validation_loss, accuracy = validation(model, val_loader, criterion)
                print("Epoch: {}/{}.. ".format(e+1, epochs),
                      "Training Loss: {:.3f}.. ".format(running_loss/print_every),
                      "Validation Loss: {:.3f}.. ".format(validation_loss/len(val_loader)),
                      "Validation Accuracy: {:.3f}".format(accuracy/len(val_loader)))
                stop = timeit.default_timer()
                print('Time: ', stop - start)
                acc_valid.append(accuracy/len(val_loader))
                train.append(running_loss/print_every)
                valid.append(validation_loss/len(val_loader))
                epo.append(e+1)
                running_loss = 0
                model.train()
    return train,epo,valid,acc_valid

how to add BatchNormalization with SWA:stochastic weights average?

I am a beginner in deep learning and PyTorch.
I don't understand how to use BatchNormalization together with SWA.
The PyTorch blog post at https://pytorch.org/blog/stochastic-weight-averaging-in-pytorch/ says:
Note that the SWA averages of the weights are never used to make
predictions during training, and so the batch normalization layers do
not have the activation statistics computed after you reset the
weights of your model with opt.swap_swa_sgd()
Does this mean I should add a BatchNormalization layer after using SWA?
# it means, in my idea
# for example
opt = torchcontrib.optim.SWA(base_opt)
for i in range(100):
    opt.zero_grad()
    loss_fn(model(input), target).backward()
    opt.step()
    if i > 10 and i % 5 == 0:
        opt.update_swa()
opt.swap_swa_sgd()
# save model once
torch.save(model,"swa_model.pt")
# model_load
saved_model=torch.load("swa_model.pt")
# it means adding BatchNormalization layer??
model2=saved_model
model2.add_module("Batch1",nn.BatchNorm1d(10))
# decay learning_rate more
learning_rate=0.005
optimizer = torch.optim.SGD(model2.parameters(), lr=learning_rate)
# train model again
for epoch in range(num_epochs):
    loss = train(train_loader)
    val_loss, val_acc = valid(test_loader)
Thank you for your reply.
Following your advice, I tried to write an example that calls optimizer.bn_update():
# add optimizer.bn_update() to model
criterion = nn.CrossEntropyLoss()
learning_rate=0.01
base_opt = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = SWA(base_opt, swa_start=10, swa_freq=5, swa_lr=0.05)
def train(train_loader):
    # mode: train
    model.train()
    running_loss = 0
    for batch_idx, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(images)
        # loss
        loss = criterion(outputs, labels)
        running_loss += loss.item()
        loss.backward()
        optimizer.step()
    optimizer.swap_swa_sgd()
    train_loss = running_loss / len(train_loader)
    return train_loss
def valid(test_loader):
    model.eval()
    running_loss = 0
    correct = 0
    total = 0
    # torch.no_grad
    with torch.no_grad():
        for batch_idx, (images, labels) in enumerate(test_loader):
            outputs = model(images)
            loss = criterion(outputs, labels)
            running_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    val_loss = running_loss / len(test_loader)
    val_acc = float(correct) / total
    return val_loss, val_acc
num_epochs=30
loss_list = []
val_loss_list = []
val_acc_list = []
for epoch in range(num_epochs):
    loss = train(train_loader)
    val_loss, val_acc = valid(test_loader)
    optimizer.bn_update(train_loader, model)
    print('epoch %d, loss: %.4f val_loss: %.4f val_acc: %.4f'
          % (epoch, loss, val_loss, val_acc))
    # logging
    loss_list.append(loss)
    val_loss_list.append(val_loss)
    val_acc_list.append(val_acc)
# optimizer.bn_update()
optimizer.bn_update(train_loader, model)
# go on evaluating model...

What the documentation is telling you is that SWA computes averages of the weights, but those averaged weights are never used for prediction during training, so the batch normalization layers never see activations produced by them. This means the layers have not computed the corresponding statistics (they never had the chance to), which matters because the averaged weights are the ones used for actual prediction (i.e. not during training).
In other words, the documentation assumes you already have batch normalization layers in your model and want to train it using SWA. This is not entirely straightforward for the reasons above.
One approach is given as follows:
To compute the activation statistics you can just make a forward pass on your training data using the SWA model once the training is finished.
Or you can use their helper function:
In the SWA class we provide a helper function opt.bn_update(train_loader, model). It updates the activation statistics for every batch normalization layer in the model by making a forward pass on the train_loader data loader. You only need to call this function once in the end of training.
If you are using PyTorch's DataLoader class, you can simply pass the model (after training) and the training loader to bn_update, which updates all batch normalization statistics for you.
Steps to proceed:
Train your model, which includes batch normalization layers, using SWA
After your model has finished training, call opt.bn_update(train_loader, model) with your training data and your trained model
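The two steps above can be sketched as follows. This sketch deliberately uses a different API from the one in the question: instead of torchcontrib's SWA wrapper and opt.bn_update, it uses AveragedModel and update_bn from torch.optim.swa_utils, which ship with recent PyTorch releases. The tiny model, the random data, and the choice to average once per epoch are all assumptions made for illustration.

```python
import torch
from torch import nn
from torch.optim.swa_utils import AveragedModel, update_bn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy model that already contains a batch normalization layer.
model = nn.Sequential(
    nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 2)
)
swa_model = AveragedModel(model)      # keeps the running average of the weights

# Hypothetical random data: 64 samples, 4 batches of 16.
data = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))
train_loader = DataLoader(data, batch_size=16)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Step 1: train the model (with its BN layers) and average the weights.
for epoch in range(3):
    for x, y in train_loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    swa_model.update_parameters(model)

# Step 2: once training is finished, recompute the BN statistics for the
# averaged weights with a single pass over the training data.
update_bn(train_loader, swa_model)

bn = swa_model.module[1]
print(bn.num_batches_tracked)         # number of batches seen during the pass
```

The key point is the same as in the answer: no new layer is added after training, and the statistics pass is run once at the end.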

I tried to compare results before and after calling optimizer.bn_update() on MNIST data,
as follows:
# using Mnist Data
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# to compare Test Data Accuracy
X_train_=X_train[0:50000]
y_train_=y_train[0:50000]
# to validate for test data
X_train_ToCompare=X_train[50000:60000]
y_train_ToCompare=y_train[50000:60000]
print(X_train_.shape)
print(y_train_.shape)
print(X_train_ToCompare.shape)
print(y_train_ToCompare.shape)
#(50000, 784)
#(50000,)
#(10000, 784)
#(10000,)
# like keras,simple MLP model
from torch import nn
model = nn.Sequential()
model.add_module('fc1', nn.Linear(784, 1000))
model.add_module('relu1', nn.ReLU())
model.add_module('fc2', nn.Linear(1000, 1000))
model.add_module('relu2', nn.ReLU())
model.add_module('fc3', nn.Linear(1000, 10))
print(model)
# using GPU
model.cuda()
criterion = nn.CrossEntropyLoss()
learning_rate=0.01
base_opt = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = SWA(base_opt, swa_start=10, swa_freq=5, swa_lr=0.05)
def train(train_loader):
    model.train()
    running_loss = 0
    for batch_idx, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        running_loss += loss.item()
        loss.backward()
        optimizer.step()
    optimizer.swap_swa_sgd()
    train_loss = running_loss / len(train_loader)
    return train_loss
def valid(test_loader):
    model.eval()
    running_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (images, labels) in enumerate(test_loader):
            outputs = model(images)
            loss = criterion(outputs, labels)
            running_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    val_loss = running_loss / len(test_loader)
    val_acc = float(correct) / total
    return val_loss, val_acc
num_epochs=30
loss_list = []
val_loss_list = []
val_acc_list = []
for epoch in range(num_epochs):
    loss = train(train_loader)
    val_loss, val_acc = valid(test_loader)
    optimizer.bn_update(train_loader, model)
    print('epoch %d, loss: %.4f val_loss: %.4f val_acc: %.4f'
          % (epoch, loss, val_loss, val_acc))
    # logging
    loss_list.append(loss)
    val_loss_list.append(val_loss)
    val_acc_list.append(val_acc)
# output:
# epoch 0, loss: 0.7832 val_loss: 0.5381 val_acc: 0.8866
# ...
# epoch 29, loss: 0.0502 val_loss: 0.0758 val_acc: 0.9772
# evaluate model
# attempt to evaluate model before optimizer.bn_update()
# using X_train_ToCompare for test data
model.eval()
predicted_list=[]
for i in range(len(X_train_ToCompare)):
    temp_predicted=model(torch.cuda.FloatTensor(X_train_ToCompare[i]))
    _,y_predicted=torch.max(temp_predicted,0)
    predicted_list.append(int(y_predicted))
sum(predicted_list==y_train_ToCompare)
# test accuracy 9757/10000
# after using optimizer.bn_update
model.train()
optimizer.bn_update(train_loader, model)
# evaluate model
model.eval()
predicted_list_afterBatchNorm=[]
for i in range(len(X_train_ToCompare)):
    temp_predicted=model(torch.cuda.FloatTensor(X_train_ToCompare[i]))
    _,y_predicted=torch.max(temp_predicted,0)
    predicted_list_afterBatchNorm.append(int(y_predicted))
sum(predicted_list_afterBatchNorm==y_train_ToCompare)
# test accuracy 9778/10000
I don't know if this is the correct way to validate.
With optimizer.bn_update(), I find that test accuracy often improves,
but in some runs it decreases. I think this is because the model is a
simple MLP and it was not trained for enough epochs.
I need to run more tests.
Thank you for your reply.

vgg pytorch is probability distribution supposed to add up to 1?

I've trained a vgg16 model to predict 102 classes of flowers.
It works; however, now that I'm trying to understand one of its predictions, I feel it's not behaving normally.
Model layout:
# Imports here
import os
import numpy as np
import torch
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import json
from pprint import pprint
from scipy import misc
%matplotlib inline
data_dir = 'flower_data'
train_dir = data_dir + '/train'
test_dir = data_dir + '/valid'
json_data=open('cat_to_name.json').read()
main_classes = json.loads(json_data)
main_classes = {int(k):v for k,v in main_classes.items()}
train_transform_2 = transforms.Compose([transforms.RandomResizedCrop(224),
transforms.RandomRotation(30),
transforms.RandomHorizontalFlip(),
transforms.ToTensor()])
test_transform_2= transforms.Compose([transforms.RandomResizedCrop(224),
transforms.ToTensor()])
# TODO: Load the datasets with ImageFolder
train_data = datasets.ImageFolder(train_dir, transform=train_transform_2)
test_data = datasets.ImageFolder(test_dir, transform=test_transform_2)
# define dataloader parameters
batch_size = 20
num_workers=0
# TODO: Using the image datasets and the trainforms, define the dataloaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
num_workers=num_workers, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
num_workers=num_workers, shuffle=True)
vgg16 = models.vgg16(pretrained=True)
# Freeze training for all "features" layers
for param in vgg16.features.parameters():
    param.requires_grad = False
import torch.nn as nn
n_inputs = vgg16.classifier[6].in_features
# add last linear layer (n_inputs -> 102 flower classes)
# new layers automatically have requires_grad = True
last_layer = nn.Linear(n_inputs, len(classes))
vgg16.classifier[6] = last_layer
import torch.optim as optim
# specify loss function (categorical cross-entropy)
criterion = nn.CrossEntropyLoss()
# specify optimizer (stochastic gradient descent) and learning rate = 0.001
optimizer = optim.SGD(vgg16.classifier.parameters(), lr=0.001)
pre_trained_model=torch.load("model.pt")
new=list(pre_trained_model.items())
my_model_kvpair=vgg16.state_dict()
count=0
for key,value in my_model_kvpair.items():
    layer_name, weights = new[count]
    my_model_kvpair[key] = weights
    count+=1
# number of epochs to train the model
n_epochs = 6
# initialize tracker for minimum validation loss
valid_loss_min = np.Inf # set initial "min" to infinity
for epoch in range(1, n_epochs+1):
    # keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0
    ###################
    # train the model #
    ###################
    # model by default is set to train
    vgg16.train()
    for batch_i, (data, target) in enumerate(train_loader):
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = vgg16(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update training loss
        train_loss += loss.item()
        if batch_i % 20 == 19: # print training loss every specified number of mini-batches
            print('Epoch %d, Batch %d loss: %.16f' %
                  (epoch, batch_i + 1, train_loss / 20))
            train_loss = 0.0
    ######################
    # validate the model #
    ######################
    vgg16.eval() # prep model for evaluation
    for data, target in test_loader:
        # forward pass: compute predicted outputs by passing inputs to the model
        output = vgg16(data)
        # calculate the loss
        loss = criterion(output, target)
        # update running validation loss
        valid_loss += loss.item()
    # print training/validation statistics
    # calculate average loss over an epoch
    train_loss = train_loss/len(train_loader.dataset)
    valid_loss = valid_loss/len(test_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch+1,
        train_loss,
        valid_loss
        ))
    # save model if validation loss has decreased
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
            valid_loss_min,
            valid_loss))
        torch.save(vgg16.state_dict(), 'model.pt')
        valid_loss_min = valid_loss
testing on a single image
tensor = torch.from_numpy(test_image)
reshaped = tensor.permute(2, 0, 1).unsqueeze(0)
floatified = reshaped.to(torch.float32) / 255
vgg16(floatified)
>>>
tensor([[ 2.5686, -1.1964, -0.0872, -1.7010, -1.6669, -1.0638, 0.4515, 0.1124,
0.0166, 0.3156, 1.1699, 1.5374, 1.8720, 2.5184, 2.9046, -0.8241,
-1.1949, -0.5700, 0.8692, -1.0485, 0.0390, -1.3783, -3.4632, -0.0143,
1.0986, 0.2667, -1.1127, -0.8515, 0.7759, -0.7528, 1.6366, -0.1170,
-0.4983, -2.6970, 0.7545, 0.0188, 0.1094, 0.5002, 0.8838, -0.0006,
-1.7993, -1.3706, 0.4964, -0.3251, -1.7313, 1.8731, 2.4963, 1.1713,
-1.5726, 1.5476, 3.9576, 0.7388, 0.0228, 0.3947, -1.7237, -1.8350,
-2.0297, 1.4088, -1.3469, 1.6128, -1.0851, 2.0257, 0.5881, 0.7498,
0.0738, 2.0592, 1.8034, -0.5468, 1.9512, 0.4534, 0.7746, -1.0465,
-0.7254, 0.3333, -1.6506, -0.4242, 1.9529, -0.4542, 0.2396, -1.6804,
-2.7987, -0.6367, -0.3599, 1.0102, 2.6319, 0.8305, -1.4333, 3.3043,
-0.4021, -0.4877, 0.9125, 0.0607, -1.0326, 1.3186, -2.5861, 0.1211,
-2.3177, -1.5040, 1.0416, 1.4008, 1.4225, -2.7291]],
grad_fn=<ThAddmmBackward>)
sum([ 2.5686, -1.1964, -0.0872, -1.7010, -1.6669, -1.0638, 0.4515, 0.1124,
0.0166, 0.3156, 1.1699, 1.5374, 1.8720, 2.5184, 2.9046, -0.8241,
-1.1949, -0.5700, 0.8692, -1.0485, 0.0390, -1.3783, -3.4632, -0.0143,
1.0986, 0.2667, -1.1127, -0.8515, 0.7759, -0.7528, 1.6366, -0.1170,
-0.4983, -2.6970, 0.7545, 0.0188, 0.1094, 0.5002, 0.8838, -0.0006,
-1.7993, -1.3706, 0.4964, -0.3251, -1.7313, 1.8731, 2.4963, 1.1713,
-1.5726, 1.5476, 3.9576, 0.7388, 0.0228, 0.3947, -1.7237, -1.8350,
-2.0297, 1.4088, -1.3469, 1.6128, -1.0851, 2.0257, 0.5881, 0.7498,
0.0738, 2.0592, 1.8034, -0.5468, 1.9512, 0.4534, 0.7746, -1.0465,
-0.7254, 0.3333, -1.6506, -0.4242, 1.9529, -0.4542, 0.2396, -1.6804,
-2.7987, -0.6367, -0.3599, 1.0102, 2.6319, 0.8305, -1.4333, 3.3043,
-0.4021, -0.4877, 0.9125, 0.0607, -1.0326, 1.3186, -2.5861, 0.1211,
-2.3177, -1.5040, 1.0416, 1.4008, 1.4225, -2.7291])
>>>
5.325799999999998
This is how I test it on a single image (the model, as usual, is trained and tested on batches). It returns a prediction matrix that doesn't seem to be normalized or to add up to 1.
Is this normal?

I cannot tell with certainty without seeing your training code, but it's most likely your model was trained with cross-entropy loss and as such it outputs logits rather than class probabilities. You can turn them into proper probabilities by applying the softmax function.
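As a small illustration of that last step, the sketch below applies torch.softmax to a row of logits. To keep it short it uses only the first four values from the output shown in the question as a stand-in for the full 102-element row; with the real model the same call would be applied to the result of vgg16(floatified).

```python
import torch

# First four logits from the printed output, standing in for the full row.
logits = torch.tensor([[2.5686, -1.1964, -0.0872, -1.7010]])

# Softmax over the class dimension turns logits into probabilities.
probs = torch.softmax(logits, dim=1)
top_prob, top_class = probs.max(dim=1)

print(probs.sum(dim=1))    # each row now sums to 1, up to float rounding
print(top_class.item())    # index of the most likely class
```

Softmax preserves the ordering of the logits, so the predicted class is the same whether argmax is taken before or after it.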

Adding second hidden layer in Tensorflow breaks loss calculation

I am working on assignment three of the Udacity Deep Learning course. I have a working neural network with one hidden layer. However, when I add a second one, the loss becomes nan.
This is the graph code:
num_nodes_layer_1 = 1024
num_nodes_layer_2 = 128
num_inputs = 28 * 28
num_labels = 10
batch_size = 128
graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, num_inputs))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    # variables
    # hidden layer 1
    hidden_weights_1 = tf.Variable(tf.truncated_normal([num_inputs, num_nodes_layer_1]))
    hidden_biases_1 = tf.Variable(tf.zeros([num_nodes_layer_1]))
    # hidden layer 2
    hidden_weights_2 = tf.Variable(tf.truncated_normal([num_nodes_layer_1, num_nodes_layer_2]))
    hidden_biases_2 = tf.Variable(tf.zeros([num_nodes_layer_2]))
    # linear layer
    weights = tf.Variable(tf.truncated_normal([num_nodes_layer_2, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))
    # Training computation.
    y1 = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights_1) + hidden_biases_1)
    y2 = tf.nn.relu(tf.matmul(y1, hidden_weights_2) + hidden_biases_2)
    logits = tf.matmul(y2, weights) + biases
    # Calc loss
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf_train_labels, logits=logits))
    # Optimizer.
    # We are going to find the minimum of this loss using gradient descent.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
    # Predictions for the training, validation, and test data.
    # These are not part of training, but merely here so that we can report
    # accuracy figures as we train.
    train_prediction = tf.nn.softmax(logits)
    y1_valid = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights_1) + hidden_biases_1)
    y2_valid = tf.nn.relu(tf.matmul(y1_valid, hidden_weights_2) + hidden_biases_2)
    valid_prediction = tf.nn.softmax(tf.matmul(y2_valid, weights) + biases)
    y1_test = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights_1) + hidden_biases_1)
    y2_test = tf.nn.relu(tf.matmul(y1_test, hidden_weights_2) + hidden_biases_2)
    test_prediction = tf.nn.softmax(tf.matmul(y2_test, weights) + biases)
It does not give an error, but after the first step the loss prints as nan and the network doesn't learn.
Initialized
Minibatch loss at step 0: 2133.468750
Minibatch accuracy: 8.6%
Validation accuracy: 10.0%
Minibatch loss at step 400: nan
Minibatch accuracy: 9.4%
Validation accuracy: 10.0%
Minibatch loss at step 800: nan
Minibatch accuracy: 11.7%
Validation accuracy: 10.0%
Minibatch loss at step 1200: nan
Minibatch accuracy: 4.7%
Validation accuracy: 10.0%
Minibatch loss at step 1600: nan
Minibatch accuracy: 7.8%
Validation accuracy: 10.0%
Minibatch loss at step 2000: nan
Minibatch accuracy: 6.2%
Validation accuracy: 10.0%
Test accuracy: 10.0%
When I remove the second layer, it trains and I get an accuracy of about 85%. With a second layer I would expect the score to be between 80% and 90%.
Am I using the wrong optimizer? Is it just something stupid I missed?
This is the session code:
num_steps = 2001
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {
            tf_train_dataset : batch_data,
            tf_train_labels : batch_labels,
        }
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 400 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%%" % accuracy(valid_prediction.eval(), valid_labels))
    acc = accuracy(test_prediction.eval(), test_labels)
    print("Test accuracy: %.1f%%" % acc)

Your learning rate of 0.5 is too high. Set it to 0.05 and it'll converge.
Minibatch loss at step 0: 1506.469238
Minibatch loss at step 400: 7796.088867
Minibatch loss at step 800: 9893.363281
Minibatch loss at step 1200: 5089.553711
Minibatch loss at step 1600: 6148.481445
Minibatch loss at step 2000: 5257.598145
Minibatch loss at step 2400: 1716.116455
Minibatch loss at step 2800: 1600.826538
Minibatch loss at step 3200: 941.884766
Minibatch loss at step 3600: 1033.936768
Minibatch loss at step 4000: 1808.775757
Minibatch loss at step 4400: 113.909866
Minibatch loss at step 4800: 49.800560
Minibatch loss at step 5200: 20.392700
Minibatch loss at step 5600: 6.253595
Minibatch loss at step 6000: 4.372780
Minibatch loss at step 6400: 6.862935
Minibatch loss at step 6800: 6.951239
Minibatch loss at step 7200: 3.528607
Minibatch loss at step 7600: 2.968611
Minibatch loss at step 8000: 3.164592
...
Minibatch loss at step 19200: 2.141401
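To see why a step size that is too large can make gradient descent blow up instead of converge, here is a toy sketch in plain Python. It is not the network from the question: it minimises a one-dimensional quadratic f(w) = 0.5 * curvature * w**2, and the curvature value of 10 is made up purely so that 0.5 falls on the diverging side and 0.05 on the converging side.

```python
def gradient_descent(lr, curvature=10.0, w0=1.0, steps=50):
    """Minimise f(w) = 0.5 * curvature * w**2 and return the final loss."""
    w = w0
    for _ in range(steps):
        grad = curvature * w      # derivative of f at w
        w -= lr * grad            # each step multiplies w by (1 - lr * curvature)
    return 0.5 * curvature * w * w

# |1 - 0.5 * 10| = 4 > 1, so every step makes w four times larger.
loss_high = gradient_descent(lr=0.5)
# |1 - 0.05 * 10| = 0.5 < 1, so every step halves w.
loss_low = gradient_descent(lr=0.05)

print(loss_high, loss_low)
```

In a real network the same overshoot eventually produces values too large for floating point, which is one common way a loss ends up as nan.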
Also, a couple of pointers:
tf_train_dataset and tf_train_labels should be tf.placeholders of shape [None, 784] and [None, 10] respectively. The None dimension allows you to vary the batch size during training, instead of being limited to a fixed size such as 128.
Instead of using tf_valid_dataset and tf_test_dataset as tf.constant, just pass your validation and test datasets in the respective feed_dicts. This will allow you to get rid of the extra ops at the end of your graph for validation and test accuracy.
I'd recommend sampling a separate batch of validation and test data rather than using the same batch of data for each iteration of checking the val/test accuracy.
