I am trying to predict the outcome of a football fixture (using backpropagation) across 3 classes: home team wins, draw or away team wins; they are encoded as 0, 1, and 2, respectively.
Features: home_team, away_team, home_score, away_score, home_adv, match_imp
Target: outcome_final
Training, validation and test tensors:
X_train: torch.Size([25365, 554])
y_train: torch.Size([25365])
X_test: torch.Size([5436, 554])
y_test: torch.Size([5436])
X_val: torch.Size([5436, 554])
y_val: torch.Size([5436])
Network architecture:
(fc1): Linear(in_features=554, out_features=100, bias=True)
(fc2): Linear(in_features=100, out_features=3, bias=True)
(dropout): Dropout(p=0.2, inplace=False)
Weights and biases are initialized first:
fc1.weight: torch.Size([100, 554])
fc1.bias: torch.Size([100])
fc2.weight: torch.Size([3, 100])
fc2.bias: torch.Size([3])
ReLU activation function is used for the hidden layer, and Softmax activation function is used for the output layer.
The following code returns the error below.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.metrics import accuracy_score

# Creating the class for the neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(554, 100)
        self.fc2 = nn.Linear(100, 3)
        self.dropout = nn.Dropout(p = 0.2)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.softmax(self.fc2(x), dim = 1)
        return x

# Initializing model
model = Net().to(device)

# Initializing weights and biases
model.fc1.weight.data.normal_(0, 0.01)
model.fc1.bias.data.normal_(0, 0.01)
model.fc2.weight.data.normal_(0, 0.01)
model.fc2.bias.data.normal_(0, 0.01)

# TRAIN the model
def train_model(model, X_train, y_train, X_val, y_val, epochs = 10, learning_rate = 0.003):
    # Loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr = learning_rate)
    # Losses and accuracies
    train_losses = []
    val_losses = []
    train_accs = []
    val_accs = []
    # Training happens here
    for epoch in range(epochs):
        # Shuffling data
        permutation = torch.randperm(X_train.size()[0]).to(device)
        X_train = X_train[permutation]
        y_train = y_train[permutation]
        # Creating batches
        batch_size = 5
        n_batches = X_train.size()[0] // batch_size
        for i in range(n_batches):
            # Zeroing gradients
            optimizer.zero_grad()
            # Forward pass
            output = model(X_train[i * batch_size : (i + 1) * batch_size])
            loss = criterion(output, y_train[i * batch_size : (i + 1) * batch_size].long())
            # Backward pass
            loss.backward()
            # Updating weights and biases
            optimizer.step()
        # Sending to CPU
        # Training loss and accuracy
        train_loss = criterion(model(X_train), y_train.long())
        train_acc = accuracy_score(y_train, torch.argmax(model(X_train), dim = 1))
        print('Epoch: ', epoch + 1, 'Training Loss: ', train_loss, 'Training Accuracy: ', train_acc)
        # Validation loss and accuracy
        val_loss = criterion(model(X_val), y_val.long())
        val_acc = accuracy_score(y_val, torch.argmax(model(X_val), dim = 1))
        print('Epoch: ', epoch + 1, 'Validation Loss: ', val_loss, 'Validation Accuracy: ', val_acc)
        # Sending back to GPU
    return train_losses, val_losses, train_accs, val_accs

# Let's train the model
model = Net().to(device)
train_losses, val_losses, train_accs, val_accs = train_model(model, X_train, y_train, X_val, y_val)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
I have tried to make sure that all training and validation sets are converted to tensors and sent to the GPU. Yet I am still getting this error.
Am I missing something here? Thanks in advance.
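One way to narrow this down: the error names mat1 of addmm, which in a Linear layer is the input, so the input and the layer's weights were on different devices at that call. Below is a minimal sketch for checking where everything lives just before a forward pass. The helper report_devices and the stand-in Linear model are hypothetical and not part of the code above; only standard torch calls are used.

```python
import torch
import torch.nn as nn

# Hypothetical helper (not from the post): map each name to the device it lives on.
def report_devices(model, **tensors):
    devices = {name: str(t.device) for name, t in tensors.items()}
    devices["model"] = str(next(model.parameters()).device)
    return devices

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(554, 3).to(device)   # small stand-in for Net()
X_batch = torch.randn(5, 554)          # tensors are created on the CPU by default

# On a GPU machine this shows X_batch on 'cpu' and the model on 'cuda:0'.
print(report_devices(model, X_batch=X_batch))

# Move the input to whatever device the model's parameters are on.
X_batch = X_batch.to(next(model.parameters()).device)
output = model(X_batch)
```

Calling such a check right before each model(...) call in the epoch loop (training batch, full X_train, X_val) should show which tensor, or the model itself, ended up on the other device.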


PyTorch: LSTM predicts the same constant value

I want to predict one variable using 7 features with time steps of 4:
# Shape X_train: torch.Size([24433, 4, 7])
# Shape Y_train: torch.Size([24433, 4, 1])
# Shape X_test: torch.Size([6109, 4, 7])
# Shape Y_test: torch.Size([6109, 4, 1])
train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
My (initial) LSTM model:
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x, _ = self.lstm(x)
        x = self.linear(x)
        return x
model = LSTMModel(input_size=7, hidden_size=256, output_size=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
Apply model:
# Loop over the training set
for X, Y in train_loader:
    optimizer.zero_grad()
    Y_pred = model(X)
    loss = loss_fn(Y_pred, Y)
    loss.backward()
    optimizer.step()

# Loop over the test set
for X, Y in test_loader:
    Y_pred = model(X)
    loss = loss_fn(Y_pred, Y)
An example of Y (true data):
[[ 100.],
[ 0.],
[ 0.],
[ 0.]],
# etc.
However, my Y_pred is somewhat like this:
# etc.
I have tried numerous different things:
Changing the model architecture (different batch size, different number of layers)
Adding dropout and decay parameters
Using epochs and changing the number of epochs when looping over training and test data
Different optimizers (Adam, SGD) with different learning rates
Log transforming my input data
Examples of my data in a previous question.
I am fairly new to PyTorch and LSTMs, so I might be doing it wrong, but whatever I change, I keep getting a (near) constant value from the predictions. What am I doing wrong, and what should I be doing instead?
I solved this by normalizing my input data. I now obtain different predictions for every output:
# Calculate the mean and standard deviation of each feature in the training set
X_mean = X_train.mean(dim=0)
X_std = X_train.std(dim=0)
# Standardize the training set
X_train = (X_train - X_mean) / X_std
# Standardize the test set using the mean and standard deviation of the training set
X_test = (X_test - X_mean) / X_std
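Note that with tensors shaped [samples, 4, 7], mean(dim=0) yields one statistic per (time step, feature) pair. A variant that computes one mean and standard deviation per feature is sketched below; this is an alternative under stated assumptions, not the method used above. The random stand-in data, the names feat_mean, feat_std and eps, and the small epsilon guard against a zero standard deviation are all assumptions for illustration.

```python
import torch

torch.manual_seed(0)
# Stand-in data with the same layout as in the question: [samples, time steps, features]
X_train = torch.randn(100, 4, 7) * 1000 + 8000
X_test = torch.randn(20, 4, 7) * 1000 + 8000

# Reduce over samples and time steps, leaving one statistic per feature: shape [1, 1, 7]
feat_mean = X_train.mean(dim=(0, 1), keepdim=True)
feat_std = X_train.std(dim=(0, 1), keepdim=True)

eps = 1e-8  # assumed guard against division by zero for constant features
X_train_norm = (X_train - feat_mean) / (feat_std + eps)
# The test set reuses the training statistics, as in the code above
X_test_norm = (X_test - feat_mean) / (feat_std + eps)

print(feat_mean.shape, X_train_norm.shape)
```

Either way, the statistics come from the training set only, so no information from the test set leaks into preprocessing.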

LSTM: calculating MSELoss in a for loop returns NaN after the backward pass

I am new to LSTMs and ran into a problem. I'm trying to predict a variable using 7 features in time steps of 4. I am working with PyTorch.
From my initial data frame (traindf), I created tensors for every feature and the target (Y) by:
featureX_train = torch.tensor(traindf.featureX[:test].values).view(-1, 4, 1)
Y_train = torch.tensor(traindf.Y[:test].values).view(-1, 4, 1)
featureX_test = torch.tensor(traindf.featureX[test:].values).view(-1, 4, 1)
Y_test = torch.tensor(traindf.Y[test:].values).view(-1, 4, 1)
I concatenated all the feature tensors into one X_train and one X_test. All tensors are float32:
print(X_train.shape, Y_train.shape)
print(X_test.shape, Y_test.shape)
torch.Size([24436, 4, 7]) torch.Size([24436, 4, 1])
torch.Size([6109, 4, 7]) torch.Size([6109, 4, 1])
Eventually, I have a train and test data set:
train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
Preview of my data:
(tensor([[ 7909.0000, 8094.0000, 9119.0000, 8666.0000, 17599.0000, 13657.0000,
[ 7909.0000, 8073.0000, 9119.0000, 8636.0000, 17609.0000, 13975.0000,
[ 7939.5000, 8083.5000, 9166.5000, 8659.5000, 18124.5000, 13971.0000,
[ 7951.0000, 8064.0000, 9201.0000, 8663.0000, 17985.0000, 13967.0000,
10076.0000]]), tensor([[41.],
(tensor([[ 8411.0000, 8530.0000, 9439.0000, 9101.0000, 17368.0000, 14174.0000,
[ 8460.0000, 8651.5000, 9579.5000, 9355.5000, 17402.0000, 14509.0000,
[ 8436.0000, 8617.0000, 9579.0000, 9343.0000, 17318.0000, 14288.0000,
[ 8519.0000, 8655.0000, 9580.0000, 9348.0000, 17566.0000, 14640.0000,
11404.0000]]), tensor([[59.],
Applying LSTM model
My LSTM model:
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x, _ = self.lstm(x)
        # x = self.linear(x[:, -1, :])
        x = self.linear(x)
        return x
model = LSTMModel(input_size=7, hidden_size=32, output_size=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
When I try:
for X, Y in train_loader:
    Y_pred = model(X)
    loss = loss_fn(Y_pred, Y)
I get (correctly I assume) Loss: tensor(1318.9419, grad_fn=<MseLossBackward0>)
However, when I run:
for X, Y in train_loader:
    Y_pred = model(X)
    loss = loss_fn(Y_pred, Y)
    # Now apply backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
I get: tensor(nan, grad_fn=<MseLossBackward0>)
Tried normalizing
I have tried normalizing the data:
mean = X.mean()
std = X.std()
X_normalized = (X - mean) / std
Y_pred = model(X_normalized)
But it yields the same result. Why do I get NaN after applying loss.backward() in such a loop? How can I fix this? Thanks in advance!
My X_train contained a few NaN values. By removing the samples (matrices) that contain NaN values, I solved this issue:
mask = torch.isnan(X_train).any(dim=1).any(dim=1)
X_train = X_train[~mask]
# Do the same for Y_train as it needs to be the same size
Y_train = Y_train[~mask]
# Create the TensorDataset for the training set
train_dataset = TensorDataset(X_train, Y_train)
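For illustration, here is the same idea on a tiny made-up example, extended to also drop samples whose target contains NaN. The toy tensors and the names bad, X_clean and Y_clean are assumptions for the sketch; flatten(1) is just another way of reducing over everything except the sample dimension.

```python
import torch

# Toy data with the same layout: 5 samples, 4 time steps, 7 features / 1 target
X = torch.zeros(5, 4, 7)
Y = torch.zeros(5, 4, 1)
X[1, 2, 3] = float("nan")   # one bad feature value in sample 1
Y[3, 0, 0] = float("nan")   # one bad target value in sample 3

# A sample is bad if any of its X or Y entries is NaN
bad = torch.isnan(X).flatten(1).any(dim=1) | torch.isnan(Y).flatten(1).any(dim=1)
X_clean, Y_clean = X[~bad], Y[~bad]

print(bad.tolist())                   # [False, True, False, True, False]
print(X_clean.shape, Y_clean.shape)   # 3 samples remain in both tensors
```

Running torch.isnan(X_train).any() once before building the DataLoader is a cheap way to rule this cause in or out.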

PyTorch simple ConvNet diverges so easily

So I'm studying PyTorch, coming from a TensorFlow background.
I'm trying to replicate a simple convnet, which I developed successfully in TensorFlow, to classify cat vs. dog images.
In PyTorch I see some strange behavior:
Using a learning rate of 0.001 makes the CNet predict only 0 after the first batch (might be exploding gradients?)
Using a learning rate of 0.0005 gives a smooth learning curve and the CNet converges
Can anyone help me understand what I'm doing wrong? Here is the code:
import pathlib
import torch
import torch.nn.functional as F
import torchvision
from torch.utils.data.dataloader import DataLoader
import numpy as np
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
class CNet(torch.nn.Module):
    def __init__(self):
        super(CNet, self).__init__()  # input is 180x180 image
        self.conv1 = torch.nn.Conv2d(3, 32, 3)  # out -> 178x178x32
        self.conv2 = torch.nn.Conv2d(32, 64, 3)
        self.conv3 = torch.nn.Conv2d(64, 128, 3)
        self.conv4 = torch.nn.Conv2d(128, 256, 3)
        self.conv5 = torch.nn.Conv2d(256, 256, 3)
        self.flatten = torch.nn.Flatten()
        #self.fc = torch.nn.LazyLinear(1)
        self.fc = torch.nn.Linear(7*7*256, 1)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv4(x)), (2, 2))
        x = F.relu(self.conv5(x))
        x = self.flatten(x)
        o = torch.sigmoid(self.fc(x))
        return o
def train(model: CNet, train_data: DataLoader, criterion, optimizer: torch.optim.Optimizer, epochs=10, validation_data: DataLoader = None):
    losses = []
    for epoch in range(epochs):
        epoch_loss = 0.0
        running_loss = 0.0
        for i, data in enumerate(train_data, 0):
            imgs, labels = data
            imgs, labels = imgs.to(device), labels.to(device, dtype=torch.float)
            labels = labels.unsqueeze(-1)
            # run (use the model passed in, not the global net)
            output = model(imgs)
            # zero out accumulated grads
            optimizer.zero_grad()
            loss = criterion(output, labels)
            # backpropagate and update the weights
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            epoch_loss += loss.item()
            #if i % 50 == 49:
            #    print(f'[{epoch+1}, {i:5d}] loss: {running_loss / 50.0:.3f}')
            #    running_loss = 0.0
        losses.append(epoch_loss / len(train_data.dataset))
        print(f'[{epoch+1}, {epochs:5d}] loss: {losses[-1]:.3f}')
    return losses
if __name__ == "__main__":
    transforms = torchvision.transforms.Compose([
        torchvision.transforms.Resize((180, 180)),
        torchvision.transforms.ToTensor(),
    ])
    # raw string so the backslashes in the Windows path are not treated as escapes
    dataset_dir = pathlib.Path(r"E:\Datasets\torch\Cat_Dog\cats_vs_dogs_small")
    train_data = torchvision.datasets.ImageFolder(dataset_dir / "train", transform=transforms)
    validation_data = torchvision.datasets.ImageFolder(dataset_dir / "validation", transform=transforms)
    test_data = torchvision.datasets.ImageFolder(dataset_dir / "test", transform=transforms)
    train_data_loader = DataLoader(train_data, batch_size=32, shuffle=True, num_workers=2, persistent_workers=True, pin_memory=True)
    validation_data_loader = DataLoader(validation_data, batch_size=32, num_workers=2, shuffle=True, pin_memory=True)
    test_data_loader = DataLoader(test_data, batch_size=32, shuffle=True, pin_memory=True, num_workers=2)

    import matplotlib.pyplot as plt
    #for i in range(1, 10):
    #    plt.subplot(3, 3, i)
    #    plt.axis('off')
    #    rand_idx = np.random.random_integers(0, len(train_data))
    #    plt.imshow(np.moveaxis(test_data[rand_idx][0].numpy(), 0, 2))

    net = CNet()
    net = net.to(device)
    criterion = torch.nn.BCELoss()
    optimizer = torch.optim.RMSprop(net.parameters(), 0.001)
    # TODO save best model
    losses = train(net, train_data_loader, criterion, optimizer, epochs=30)
    epochs = range(1, len(losses) + 1)
    plt.plot(epochs, losses, 'bo', label='Training Loss')
    print('Training Finished')

    correct_count, all_count = 0, 0
    for images, labels in test_data_loader:
        images, labels = images.to(device), labels.to(device, dtype=torch.float)
        with torch.no_grad():
            ps = net(images)
        pred_label = (ps > 0.5).to(torch.float)
        true_label = labels.unsqueeze(1)
        correct_count += (pred_label == true_label).sum().item()
        all_count += len(labels)
    print("Number Of Images Tested =", all_count)
    print("\nModel Accuracy =", (correct_count/all_count))
and here are some screenshots of the loss at each point:
LR=0.001 (not converging in PyTorch, converging in TensorFlow)
LR=0.0005 (converging in 30 epochs) [I know that the validation loss is not 0; accuracy is ~70%, but that is expected]
As you can see, the losses in the two experiments are very different in scale. What might cause such weird behavior? I call it 'weird' because I have never seen this happen in TensorFlow.
Is such different behavior typical between those two frameworks, or am I missing something?

MNIST dataset overfitting

I am working with the MNIST dataset and I have created the following network. I want to overfit the training data and I think I am doing that here. My training loss is lower than my validation loss. This is the code that I have come up with. Please look at it and let me know if I am overfitting the training data, if I am not then how do I go about it?
class NN(nn.Module):
def __init__(self):
self.layers = nn.Sequential(
def forward(self,x):
return self.layers(x)
def accuracy_and_loss(model, loss_function, dataloader):
    total_correct = 0
    total_loss = 0
    total_examples = 0
    n_batches = 0
    with torch.no_grad():
        # iterate over the loader passed in, not a global testloader
        for data in dataloader:
            images, labels = data
            outputs = model(images)
            batch_loss = loss_function(outputs, labels)
            n_batches += 1
            total_loss += batch_loss.item()
            _, predicted = torch.max(outputs, dim=1)
            total_examples += labels.size(0)
            total_correct += (predicted == labels).sum().item()
    accuracy = total_correct / total_examples
    mean_loss = total_loss / n_batches
    return (accuracy, mean_loss)
def define_and_train(model, dataset_training, dataset_test):
    # use the dataset passed in as the training set
    trainloader = torch.utils.data.DataLoader(dataset_training, batch_size=500, shuffle=True)
    testloader = torch.utils.data.DataLoader(dataset_test, batch_size=500, shuffle=True)
    values = [1e-8, 1e-7, 1e-6, 1e-5]
    model = NN()
    for params in values:
        optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-7)
        train_acc = []
        val_acc = []
        train_loss = []
        val_loss = []
        for epoch in range(100):
            total_loss = 0
            total_correct = 0
            total_examples = 0
            n_mini_batches = 0
            for i, mini_batch in enumerate(trainloader, 0):
                images, labels = mini_batch
                # reset gradients, forward pass, backward pass, weight update
                optimizer.zero_grad()
                outputs = model(images)
                loss = loss_function(outputs, labels)
                loss.backward()
                optimizer.step()
                n_mini_batches += 1
                total_loss += loss.item()
                _, predicted = torch.max(outputs, dim=1)
                total_examples += labels.size(0)
                total_correct += (predicted == labels).sum().item()
            epoch_training_accuracy = total_correct / total_examples
            epoch_training_loss = total_loss / n_mini_batches
            epoch_val_accuracy, epoch_val_loss = accuracy_and_loss(model, loss_function, testloader)
            print('Params %f Epoch %d loss: %.3f acc: %.3f val_loss: %.3f val_acc: %.3f'
                  % (params, epoch+1, epoch_training_loss, epoch_training_accuracy, epoch_val_loss, epoch_val_accuracy))
            train_loss.append(epoch_training_loss)
            train_acc.append(epoch_training_accuracy)
            val_loss.append(epoch_val_loss)
            val_acc.append(epoch_val_accuracy)
    history = {'train_loss': train_loss,
               'train_acc': train_acc,
               'val_loss': val_loss,
               'val_acc': val_acc}
    return (history, model)

history1, net1 = define_and_train(model, dataset_training, dataset_test)
I am trying to overfit the training data so that I can later apply regularization to reduce the overfitting, which will give me a better understanding of the process.
Although I won't attempt to provide a rigorous definition, the term "overfit" typically means that the training loss continues to decrease whereas the validation loss stays stagnant at a position higher than the training loss, or continues to increase with more iterations.
Therefore, it is difficult to tell whether your network is overfitting based on your code alone. Since dense, fully-connected networks tend to overfit easily in the absence of dropout layers or other regularizers, my hunch is that your network is indeed overfitting, as you intend. However, we would have to see your TensorBoard logs or a loss plot to determine whether the model is actually overfitting.
If you want to overfit your network to the dataset, I suggest that you construct a much larger model with more hidden layers. Overfitting occurs when the dataset is "too easy" for the model and it starts to memorize the training set itself without learning generalizable patterns that can be applied to the validation set.
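As a rough illustration of the informal definition above, the check below looks at two loss curves such as history['train_loss'] and history['val_loss'] from the question. The function name, the patience parameter and the example numbers are all made up for this sketch; it is one simple heuristic, not a rigorous test for overfitting.

```python
def first_overfit_epoch(train_loss, val_loss, patience=3):
    """Return the 1-based epoch with the lowest validation loss if at least
    `patience` later epochs failed to beat it while the training loss still
    went down afterwards; return None otherwise."""
    best = min(range(len(val_loss)), key=lambda i: val_loss[i])
    epochs_after_best = len(val_loss) - 1 - best
    train_kept_improving = train_loss[-1] < train_loss[best]
    if epochs_after_best >= patience and train_kept_improving:
        return best + 1
    return None

# Made-up curves: training loss keeps falling, validation loss turns around at epoch 3
overfit_train = [1.0, 0.6, 0.4, 0.3, 0.2, 0.1]
overfit_val = [1.1, 0.8, 0.7, 0.75, 0.8, 0.9]
print(first_overfit_epoch(overfit_train, overfit_val))   # 3

# Made-up curves: both losses are still improving, so nothing is flagged
healthy_val = [1.1, 0.9, 0.8, 0.7, 0.65, 0.6]
print(first_overfit_epoch(overfit_train, healthy_val))   # None
```

Plotting both curves is still the more reliable way to judge; the function only encodes the "training loss keeps decreasing while validation loss stalls or rises" pattern.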

Function AddmmBackward returned an invalid gradient

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 3)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = NeuralNetwork()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
def UploadData(path, train):
    # set up transforms for train, validation and test datasets
    train_transforms = transforms.Compose([transforms.Grayscale(num_output_channels=1), transforms.Resize(255), transforms.CenterCrop(224), transforms.RandomRotation(30),
                                           transforms.RandomHorizontalFlip(), transforms.ToTensor()])
    valid_transforms = transforms.Compose([transforms.Grayscale(num_output_channels=1), transforms.Resize(255), transforms.CenterCrop(224), transforms.RandomRotation(30),
                                           transforms.RandomHorizontalFlip(), transforms.ToTensor()])
    test_transforms = transforms.Compose([transforms.Grayscale(num_output_channels=1), transforms.Resize(255), transforms.CenterCrop(224), transforms.ToTensor()])
    # set up datasets from Image Folders
    train_dataset = datasets.ImageFolder(path + '/train', transform=train_transforms)
    valid_dataset = datasets.ImageFolder(path + '/validation', transform=valid_transforms)
    test_dataset = datasets.ImageFolder(path + '/test', transform=test_transforms)
    # set up dataloaders with batch size of 32
    trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
    validloader = torch.utils.data.DataLoader(valid_dataset, batch_size=32, shuffle=True)
    testloader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=True)
    return trainloader, validloader, testloader
trainloader, validloader, testloader = UploadData("/home/lns/research/dataset", True)
epochs = 5
min_valid_loss = np.inf
for e in range(epochs):
    train_loss = 0.0
    for data, labels in trainloader:
        # Transfer Data to GPU if available
        if torch.cuda.is_available():
            print("using GPU for data")
            data, labels = data.cuda(), labels.cuda()
        # Clear the gradients
        optimizer.zero_grad()
        # Forward Pass
        target = net(data)
        # Find the Loss
        loss = criterion(target, labels)
        # Calculate gradients
        loss.backward()
        # Update Weights
        optimizer.step()
        # Calculate Loss
        train_loss += loss.item()
    valid_loss = 0.0
    net.eval()  # Optional when not using Model Specific layer
    for data, labels in validloader:
        # Transfer Data to GPU if available
        if torch.cuda.is_available():
            print("using GPU for data")
            data, labels = data.cuda(), labels.cuda()
        # Forward Pass
        target = net(data)
        # Find the Loss
        loss = criterion(target, labels)
        # Calculate Loss
        valid_loss += loss.item()
    print('Epoch ', e+1, '\t\t Training Loss: ', train_loss / len(trainloader), ' \t\t Validation Loss: ', valid_loss / len(validloader))
    if min_valid_loss > valid_loss:
        print("Validation Loss Decreased(", min_valid_loss, "--->", valid_loss, ") \t Saving The Model")
        min_valid_loss = valid_loss
        # Saving State Dict
        torch.save(net.state_dict(), '/home/lns/research/MODEL.pth')
After searching a lot, I am asking for help. Can someone help me
understand why this error occurs during backpropagation?
I followed the PyTorch CNN tutorial and a GeeksforGeeks tutorial.
The dataset consists of X-ray images converted to grayscale and resized to 255.
Is my neural network wrong, or is the data not processed correctly?
This is a size mismatch between the output of your CNN and the number of input features of your first fully-connected layer. Because there is no padding, the number of elements after flattening is 16*4*4, i.e. 256 (and not 16*5*5), for a 28x28 input:
self.fc1 = nn.Linear(256, 120)
Once modified, the model will run correctly:
>>> model = NeuralNetwork()
>>> model(torch.rand(1, 1, 28, 28)).shape
torch.Size([1, 3])
Alternatively, you can use nn.LazyLinear, which infers the in_features argument from the shape of its input during the very first forward pass.
self.fc1 = nn.LazyLinear(120)
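The flattened size can also be worked out by hand from the usual output-size formula, floor((size + 2*padding - kernel) / stride) + 1, applied layer by layer. The helper names below are made up for this sketch; the layer parameters are the ones from NeuralNetwork in the question (two 5x5 convolutions without padding, each followed by a 2x2 max pool with stride 2, and 16 output channels).

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output length along one spatial dimension for a conv or pooling layer
    return (size + 2 * padding - kernel) // stride + 1

def flattened_features(size, channels=16):
    size = conv_out(size, kernel=5)            # conv1: 5x5, no padding
    size = conv_out(size, kernel=2, stride=2)  # 2x2 max pool
    size = conv_out(size, kernel=5)            # conv2: 5x5, no padding
    size = conv_out(size, kernel=2, stride=2)  # 2x2 max pool
    return channels * size * size

print(flattened_features(28))    # 256   = 16 * 4 * 4
print(flattened_features(32))    # 400   = 16 * 5 * 5
print(flattened_features(224))   # 44944 = 16 * 53 * 53
```

So 16*5*5 matches a 32x32 input and 256 matches a 28x28 input. If the images really reach the network at 224x224, as the CenterCrop(224) in the question's transforms suggests, the same arithmetic gives 16*53*53 for fc1, which is another reason nn.LazyLinear is convenient here.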
