PyTorch: no gradient calculated for one variable

I defined two variables (D, ht) to receive gradients. However, after the loss function and backward() calculation, only D has gradients; ht does not have any gradients calculated. Can you please help me understand why this is the case? Thank you.
import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable, Function
x = Variable(torch.from_numpy(np.random.normal(0,1,(10,10))), requires_grad=False) # original
# depends on size of the dictionary, number of atoms.
D = Variable(torch.from_numpy(np.random.normal(0,1,(500,10,10))), requires_grad=True)
# hx sparse representation
ht = Variable(torch.from_numpy(np.random.normal(0,1,(500,1,1))), requires_grad=True)
ld = 0.02
lr = 0.001 # learning rate
# optimizer
optimizer = torch.optim.SGD([ht,D], lr=lr, momentum=0.9)
optimizer.zero_grad()
loss_ht = 0.5*torch.norm((x-(D*ht).sum(dim=0)),p=2)**2
loss_ht.backward() # backpropagation
optimizer.step() # update parameters
D has gradients, see below:
D.grad.data
( 0 ,.,.) =
2.0307e+01 9.9208e+00 4.9194e+00 … -2.3104e+01 -2.6616e+01 -9.8070e-01
3.7742e+01 2.5255e+01 4.5286e+00 … -3.5697e+01 -1.1306e+01 1.9366e+01
ht does not have any gradients, but why? Please help. See below:
ht.grad.data
AttributeError Traceback (most recent call last)
in ()
----> 1 ht.grad.data
AttributeError: 'NoneType' object has no attribute 'data'

Related

Bypass keras dimension check for custom loss

I am trying to implement my custom loss function, but I am having difficulties making it work with Keras.
I am dealing with n-dimensional data, and I would like the loss for each input sample to be computed based on another n-dimensional vector, which we'll call P (put simply, it encodes the reliability of each corresponding input measurement).
My model is an autoencoder with n inputs (X) and n outputs (Y). I started implementing a solution in which I overload my Y matrix by appending P to it: new_Y = concat([Y, P]), and then in my custom loss I split this matrix back apart and compute my loss. The problem is, TensorFlow seems to perform a shape check of my model's output against new_Y before even looking at my loss function.
Code to reproduce the error:
import numpy as np
from tensorflow import keras
import keras.backend as K
from keras import Model
from keras.layers import Input, Dense
from keras.optimizers import Adam
from keras.losses import MeanSquaredError as TFMSE
tf_mse = TFMSE()
input_dim = 2
batch_size = 16
X = np.random.randn(input_dim * batch_size).reshape((batch_size, input_dim, 1))
P = np.random.randn(input_dim * batch_size).reshape((batch_size, input_dim, 1))
Y = X.copy()
new_Y = np.hstack([Y, P])
_input = Input((input_dim, 1), name='input')
elayer = Dense(input_dim, kernel_initializer='normal', activation='relu')(_input)
bottleneck = Dense(1, kernel_initializer='normal', activation='linear')(elayer)
dlayer = Dense(input_dim, kernel_initializer='normal', activation='relu')(bottleneck)
model = Model(inputs=_input, outputs=dlayer)
def adjusted_mse(y_true, y_pred):
# I am first trying to simply apply classic MSE on Y here :
# y_true = [Y, P], so y_true[:, :input_dim] = Y
return tf_mse(y_true[:, :input_dim], y_pred)
model.compile(
loss=adjusted_mse,
optimizer=Adam(),
metrics=['mse']
)
model.fit(X, new_Y, epochs=8, batch_size=4, validation_split=.1, verbose=True)
Error raised:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "***/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/tmp/__autograph_generated_filew2svrbng.py", line 15, in tf__train_function
retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
ValueError: in user code:
...
ValueError: Dimensions must be equal, but are 2 and 4 for '{{node SquaredDifference}} = SquaredDifference[T=DT_FLOAT](model/dense_2/Relu, IteratorGetNext:1)' with input shapes: [?,2,2], [?,4,1].
Does anyone have any idea how I could at least bypass this verification step? Or is there a simpler way to do this? I was thinking about overloading X the same way and passing the P part through an identity layer, then concatenating it back, but that is convoluted.
Thank you

Results from PyTorch tutorial using Google Colab not matching results in PyCharm

I'm following a YouTube tutorial on PyTorch which uses torch.manual_seed to ensure results are the same, or at least in the same ballpark.
Now, admittedly, I'm no expert, but on running the code in chapter 2 of the tutorial, the resulting graph from my model seems way off from what it should be.
I've tried going through the code line by line, but for the last 3 days I can't seem to find any differences between my code and the code used in the tutorial (other than variable names, which I changed for clarity on my part and so I'm not just mindlessly copying).
I work a pretty busy menial job with variable work days, so I don't get a lot of time off, but I've spent 10 'off days' across a month trying to solve this and I just can't see it. Genuinely, any help would be appreciated; even if it's an error on my part, I would be alright with that being stated without saying what it is. I just want to know if I've done anything wrong at all.
Here's a link to the doc file for the tutorial if that helps:
https://www.learnpytorch.io/02_pytorch_classification/#31-going-from-raw-model-outputs-to-predicted-labels-logits-prediction-probabilities-prediction-labels
Here's my code:
from sklearn.datasets import make_circles
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split as tts
import torch
from torch import nn
from helper_functions import plot_predictions, plot_decision_boundary
import os
from pathlib import Path
# Generate 1000 samples
sample_number = 1000
# Create circles
Features, labels = make_circles(sample_number,
noise=0.03, # <- adds a little noise to the dots
random_state=42) # <- sets random state seed for consistency
# View the first 5 values for both parameters
print(f'First 5 Features (X):\n{Features[:5]}\n'
f'First 5 labels (y):\n{labels[:5]}')
# Make a DataFrame of circle data
circles = pd.DataFrame({"inputType1": Features[:, 0], # <- everything in the 0th index is type 1
"inputType2": Features[:, 1], # <- everything in the 1st index is type 2
"output": labels
})
print(f'Created dataframe:\n'
f'{circles.head(10)}')
# Check the different labels
print(f'Number of value per class:\n'
f'{circles.output.value_counts()}')
# Visualise the dataframe
plt.scatter(x=Features[:, 0],
y=Features[:, 1],
c=labels,
cmap=plt.cm.RdYlBu)
# Display plot
# plt.show()
# Check the shapes of the features and labels
# ML deals with numerical representation
# Ensuring the input and output shapes are compatible is crucial
print(f'Circle shapes: {Features.shape, labels.shape}')
# View the first example of features and labels
Features_samples = Features[0]
labels_samples = labels[0]
print(f'Values for one sample of X: {Features_samples} and the same for y: {labels_samples}')
print(f'Shapes for one sample of X: {Features_samples.shape}'
f'\nand the same for y: {labels_samples.shape}')
# ^ Features dataset has 1000 samples with two feature classes
# ^ labels dataset has 1000 samples with no feature classes since it's a scalar
# Turning datasets into tensors
Features = torch.from_numpy(Features).type(torch.float)
labels = torch.from_numpy(labels).type(torch.float)
# View the first five samples
print(f'First 5 Features:\n'
f'{Features[:5]}\n'
f'First 5 labels:\n'
f'{labels[:5]}\n')
# Split data into train and test sets
input_data_train, input_data_test, model_output_train, model_output_test = tts(Features,
labels,
test_size=0.2,
random_state=42)
# Check that splits follow this pattern:
# 80% train, 20% test
print(f'Number of samples for input train:\n'
f'{len(input_data_train)}\n'
f'Number of samples for input test:\n'
f'{len(input_data_test)}\n'
f'Number of samples for output train:\n'
f'{len(model_output_train)}\n'
f'Number of samples for output test:\n'
f'{len(model_output_test)}\n')
# Begin building learning model
# Make device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f'Learning model processing on: {device}\n')
"""# Assign parameters to the device
input_data_train = input_data_train.to(device)
input_data_test = input_data_test.to(device)
model_output_train = model_output_train.to(device)
model_output_test = model_output_test.to(device)"""
# 1. Construct a model class that subclasses nn.Module
class CircleClassificationModel(nn.Module):
def __init__(self):
super().__init__()
# 2. Create 2 nn.Linear layers for handling Feature and labels, input and output shapes
self.layer_1 = nn.Linear(in_features=2, out_features=5) # <- inputs 2 Features, outputs 5
self.layer_2 = nn.Linear(in_features=5, out_features=1) # <- inputs 5 Features, outputs 1 label
# 3. Define a forward method containing the forward pass computation
def forward(self, x):
# Return the output of layer_2, a single feature, which is the same shape as the label
return self.layer_2(self.layer_1(x))
# Computation goes through layer_1 first
# The output of layer_1 goes through layer_2
# 4. Create an instance of the model and send it to target device
classification_model_0 = CircleClassificationModel().to(device)
# Display model parameters
print(f'Model parameters for self defined model:\n'
f'{classification_model_0}\n')
# The above code can be written more succinctly using nn.Sequential
# Implements two layers of nn.Linear()
# Which calls the following equation
# y = ( x * (Weights).transposed ) + bias
classification_model_0 = nn.Sequential(
nn.Linear(in_features=2, out_features=5),
nn.Linear(in_features=5, out_features=1)
).to(device)
# Display model parameters
print(f'Model (nn.Sequential) parameters:\n'
f'{classification_model_0}\n\n')
# Make predictions with the model
untrained_predictions = classification_model_0(input_data_test.to(device))
print(f'Length of predictions: {len(untrained_predictions)}, Shape: {untrained_predictions.shape}')
print(f'Length of test samples: {len(model_output_test)}, Shape: {model_output_test.shape}')
print(f'\nFirst 10 predictions:\n'
f'{untrained_predictions[:10]}')
print(f'\nFirst 10 test labels:\n'
f'{model_output_test[:10]}')
# Create a loss function
# Unlike the regression model, the classification model uses a different loss type
# Binary Cross Entropy will be used for this task
# torch.nn.BCELoss() - measures the BCE between the target(labels) and the input(features)
# Another version may be used:
# torch.nn.BCEWithLogitsLoss() - same, except it has a built-in Sigmoid function
# loss_fn = nn.BCELoss() # <- BCELoss = no sigmoid built-in
loss_function = nn.BCEWithLogitsLoss() # <- BCEWithLogitsLoss = sigmoid built-in
# Create an optimiser
optimiser = torch.optim.SGD(params=classification_model_0.parameters(),
lr=0.1)
# Calculate accuracy (a classification metric)
# This acts as an evaluation metric
# Offers perspective into how the model is going
# The loss function measures how wrong the model is, but
# Evaluation metrics measure how right it is
# Accuracy will be the first metric to be utilised
# Accuracy can be measured by dividing the total number of correct predictions
# By the total number of overall predictions
def accuracy_function(label_actual, label_predicted):
# Calculates where 2 tensors are equal
correct = torch.eq(label_actual, label_predicted).sum().item()
accuracy = (correct / len(label_predicted)) * 100
return accuracy
# View the first 5 results of the forward pass on test data
# labels_logits represents the output of the forward pass method above
# Which utilises two nn.Linear() layers
labels_logits = classification_model_0(input_data_test.to(device))[:5]
print(f'First 5 outputs of the forward pass:\n'
f'{labels_logits}')
# Use the sigmoid function on the model labels_logits
# Turns the output of the forward pass into prediction probabilities
# Measures the odds the model classifies a data point into one class or the other
# In the case of this problem the classes are either 0 or 1
# It uses the logic:
# If labels_prediction_probabilities >= 0.5 then assign the label class (1)
# If labels_prediction_probabilities < 0.5 then assign the label class (0)
labels_prediction_probabilities = torch.sigmoid(labels_logits)
print(f'Output of the sigmoid-ed forward pass:\n'
f'{labels_prediction_probabilities}')
# Find the predicted labels (round the prediction probabilities as well)
label_predictions = torch.round(labels_prediction_probabilities)
# In full
labels_predictions_classes = \
torch.round(torch.sigmoid(classification_model_0(input_data_test.to(device))[:5]))
# Check for equality
print(torch.eq(label_predictions.squeeze(), labels_predictions_classes.squeeze()))
# Get rid of the extra dimensions
label_predictions.squeeze()
# Display model predictions
print(f'Model Predictions:\n'
f'{label_predictions}')
# Display test labels for comparison with model predictions
print(f'\nFirst five test data:\n'
f'{model_output_test[:5]}')
# Building the training loop
torch.manual_seed(42)
# Set the number of epochs
epochs = 100
# Process data on the target devices
input_data_train, model_output_train = input_data_train.to(device),\
model_output_train.to(device)
input_data_test, model_output_test = input_data_test.to(device),\
model_output_test.to(device)
# Build the training and evaluation loop
for epoch in range(epochs):
# Training
classification_model_0.train()
# todo: Do the Forward pass
# Model outputs raw labels_logits
train_labels_logits = classification_model_0(input_data_train).squeeze()
# ^ .squeeze() removes the extra dimensions, won't work if model and data on diff devices
train_label_prediction = torch.round(torch.sigmoid(train_labels_logits))
# ^ turns logits -> prediction probabilities -> prediction label classes
# todo: Calculate the loss
# 2. Calculates loss/accuracy
""" train_loss = loss_function(torch.sigmoid(train_labels_logits),
model_output_train) # <- nn.BCELoss needs torch.sigmoid() """
train_loss = loss_function(train_labels_logits,
model_output_train)
train_accuracy = accuracy_function(label_actual=model_output_train,
label_predicted=train_label_prediction)
# todo: Optimiser zero grad
optimiser.zero_grad()
# todo: Loss backwards
train_loss.backward()
# todo: optimiser step step step
optimiser.step()
# Testing
# todo: evaluate the model
classification_model_0.eval()
with torch.inference_mode():
# todo: Do the forward pass
test_logits = classification_model_0(input_data_test).squeeze()
test_predictions = torch.round(torch.sigmoid(test_logits))
# todo: calculate the loss
test_loss = loss_function(test_logits,
model_output_test)
test_accuracy = accuracy_function(label_actual=model_output_test,
label_predicted=test_predictions)
# todo: print model statistics every 10 epochs
if epoch % 10 == 0:
print(f'Epoch: {epoch} | Loss: {train_loss:.5f}, | Train Accuracy: {train_accuracy:.2f}%'
f'Test Loss: {test_loss:.5f}, | Test accuracy: {test_accuracy:.2f}%')
# Plot decision boundary for training and test sets
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("My Train")
plot_decision_boundary(classification_model_0, input_data_train, model_output_train)
plt.subplot(1, 2, 2)
plt.title("My Test")
plot_decision_boundary(classification_model_0, input_data_test, model_output_test)
plt.show()
AND HERE'S THE TUTORIAL CODE:
from sklearn.datasets import make_circles
# Make 1000 samples
n_samples = 1000
# Create circles
X, y = make_circles(n_samples,
noise=0.03, # a little bit of noise to the dots
random_state=42) # keep random state so we get the same values
print(f"First 5 X features:\n{X[:5]}")
print(f"\nFirst 5 y labels:\n{y[:5]}")
# Make DataFrame of circle data
import pandas as pd
circles = pd.DataFrame({"X1": X[:, 0],
"X2": X[:, 1],
"label": y
})
circles.head(10)
# Check different labels
circles.label.value_counts()
# Visualize with a plot
import matplotlib.pyplot as plt
plt.scatter(x=X[:, 0],
y=X[:, 1],
c=y,
cmap=plt.cm.RdYlBu);
# Check the shapes of our features and labels
X.shape, y.shape
# View the first example of features and labels
X_sample = X[0]
y_sample = y[0]
print(f"Values for one sample of X: {X_sample} and the same for y: {y_sample}")
print(f"Shapes for one sample of X: {X_sample.shape} and the same for y: {y_sample.shape}")
# Turn data into tensors
# Otherwise this causes issues with computations later on
import torch
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)
# View the first five samples
X[:5], y[:5]
# Split data into train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,
y,
test_size=0.2, # 20% test, 80% train
random_state=42) # make the random split reproducible
len(X_train), len(X_test), len(y_train), len(y_test)
# Standard PyTorch imports
import torch
from torch import nn
# Make device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device
# 1. Construct a model class that subclasses nn.Module
class CircleModelV0(nn.Module):
def __init__(self):
super().__init__()
# 2. Create 2 nn.Linear layers capable of handling X and y input and output shapes
self.layer_1 = nn.Linear(in_features=2, out_features=5) # takes in 2 features (X), produces 5 features
self.layer_2 = nn.Linear(in_features=5, out_features=1) # takes in 5 features, produces 1 feature (y)
# 3. Define a forward method containing the forward pass computation
def forward(self, x):
# Return the output of layer_2, a single feature, the same shape as y
return self.layer_2(
self.layer_1(x)) # computation goes through layer_1 first then the output of layer_1 goes through layer_2
# 4. Create an instance of the model and send it to target device
model_0 = CircleModelV0().to(device)
model_0
# Replicate CircleModelV0 with nn.Sequential
model_0 = nn.Sequential(
nn.Linear(in_features=2, out_features=5),
nn.Linear(in_features=5, out_features=1)
).to(device)
model_0
# Make predictions with the model
untrained_preds = model_0(X_test.to(device))
print(f"Length of predictions: {len(untrained_preds)}, Shape: {untrained_preds.shape}")
print(f"Length of test samples: {len(y_test)}, Shape: {y_test.shape}")
print(f"\nFirst 10 predictions:\n{untrained_preds[:10]}")
print(f"\nFirst 10 test labels:\n{y_test[:10]}")
# Create a loss function
# loss_fn = nn.BCELoss() # BCELoss = no sigmoid built-in
loss_fn = nn.BCEWithLogitsLoss() # BCEWithLogitsLoss = sigmoid built-in
# Create an optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(),
lr=0.1)
# Calculate accuracy (a classification metric)
def accuracy_fn(y_true, y_pred):
correct = torch.eq(y_true, y_pred).sum().item() # torch.eq() calculates where two tensors are equal
acc = (correct / len(y_pred)) * 100
return acc
# View the first 5 outputs of the forward pass on the test data
y_logits = model_0(X_test.to(device))[:5]
y_logits
# Use sigmoid on model logits
y_pred_probs = torch.sigmoid(y_logits)
y_pred_probs
# Find the predicted labels (round the prediction probabilities)
y_preds = torch.round(y_pred_probs)
# In full
y_pred_labels = torch.round(torch.sigmoid(model_0(X_test.to(device))[:5]))
# Check for equality
print(torch.eq(y_preds.squeeze(), y_pred_labels.squeeze()))
# Get rid of extra dimension
y_preds.squeeze()
y_test[:5]
torch.manual_seed(42)
# Set the number of epochs
epochs = 100
# Put data to target device
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)
# Build training and evaluation loop
for epoch in range(epochs):
### Training
model_0.train()
# 1. Forward pass (model outputs raw logits)
y_logits = model_0(
X_train).squeeze() # squeeze to remove extra `1` dimensions, this won't work unless model and data are on same device
y_pred = torch.round(torch.sigmoid(y_logits)) # turn logits -> pred probs -> pred labels
# 2. Calculate loss/accuracy
# loss = loss_fn(torch.sigmoid(y_logits), # Using nn.BCELoss you need torch.sigmoid()
# y_train)
loss = loss_fn(y_logits, # Using nn.BCEWithLogitsLoss works with raw logits
y_train)
acc = accuracy_fn(y_true=y_train,
y_pred=y_pred)
# 3. Optimizer zero grad
optimizer.zero_grad()
# 4. Loss backwards
loss.backward()
# 5. Optimizer step
optimizer.step()
### Testing
model_0.eval()
with torch.inference_mode():
# 1. Forward pass
test_logits = model_0(X_test).squeeze()
test_pred = torch.round(torch.sigmoid(test_logits))
# 2. Calculate loss/accuracy
test_loss = loss_fn(test_logits,
y_test)
test_acc = accuracy_fn(y_true=y_test,
y_pred=test_pred)
# Print out what's happening every 10 epochs
if epoch % 10 == 0:
print(
f"Epoch: {epoch} | Loss: {loss:.5f}, Accuracy: {acc:.2f}% | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%")
from helper_functions import plot_predictions, plot_decision_boundary
# Plot decision boundaries for training and test sets
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Tut Train")
plot_decision_boundary(model_0, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Tut Test")
plot_decision_boundary(model_0, X_test, y_test)

The gradient cannot be calculated automatically

I am a beginner in deep learning and am trying to make a discriminator that judges cats/non-cats.
But when I run the following code, a runtime error occurs.
I know that "requires_grad" must be set to True in order to calculate the gradient automatically, but since X_train and Y_train are variables for reading, they are set to False.
I would be grateful if you could modify this code.
X_train = torch.tensor(train_set_x, dtype=dtype,requires_grad=False)
Y_train = torch.tensor(train_set_y, dtype=dtype,requires_grad=False)
def train_model(X_train, Y_train, X_test, Y_test, n_h, num_iterations=10000,learning_rate=0.5, print_cost=False):
"""
Arguments:
X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
n_h -- size of the hidden layer
num_iterations -- number of iterations in gradient descent loop
learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
print_cost -- if True, print the cost every 200 iterations
Returns:
d -- dictionary containing information about the model.
"""
n_x = X.size(1)
n_y = Y.size(1)
# Create model
model = nn.Sequential(
nn.Linear(n_x,n_h),
nn.ReLU(),
nn.Linear(n_h,n_y),
nn.ReLU()
)
# Initialize parameters
for name, param in model.named_parameters():
if name.find('weight') != -1:
torch.nn.init.orthogonal_(param)
elif name.find('bias') != -1:
torch.nn.init.constant_(param, 0)
# Cost function
cost_fn = nn.BCELoss()
# Loop (gradient descent)
for i in range(0, num_iterations):
# Forward propagation: compute predicted labels by passing input data to the model.
Y_predicted = model(X_train)
A2 = (Y_predicted > 0.5).float()
# Cost function. Inputs: predicted and true values. Outputs: "cost".
cost = cost_fn(A2, Y_train)
# Print the cost every 100 iterations
if print_cost and i % 100 == 0:
print("Cost after iteration %i: %f" % (i, cost.item()))
# Zero the gradients before running the backward pass. See hint in problem description
model.zero_grad()
# Backpropagation. Compute gradient of the cost function with respect to all the
# learnable parameters of the model. Use autograd to compute the backward pass.
cost.backward()
# Gradient descent parameter update.
with torch.no_grad():
for param in model.parameters():
# Your code here !!
param -= learning_rate * param.grad
d = {"model": model,
"learning_rate": learning_rate,
"num_iterations": num_iterations}
return d
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I believe your problem is that you are mixing numpy arrays and torch tensors. PyTorch tensors are a bit like numpy arrays, but they are also kept in a computational graph that is responsible for the backward pass.
The description of your received variables X_train, Y_train, X_test, Y_test says they are numpy arrays. You should convert them all to torch tensors:
x = torch.tensor(x)
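For example, a minimal sketch of the conversion (the array names mirror the ones described in your question, but the shapes here are tiny placeholders so the snippet is self-contained):
import numpy as np
import torch

# Tiny stand-ins for the real (num_px * num_px * 3, m_train) arrays
train_set_x = np.random.randn(12, 5).astype(np.float32)
train_set_y = np.random.randint(0, 2, (1, 5)).astype(np.float32)

# Convert everything to torch tensors before building and training the model
X_train = torch.tensor(train_set_x)
Y_train = torch.tensor(train_set_y)
print(X_train.dtype, X_train.shape)  # torch.float32 torch.Size([12, 5])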
I also noticed that you are manually performing gradient updates. Unless that was your intention, I would recommend using one of PyTorch's optimizers.
from torch.optim import SGD
model = nn.Sequential(
nn.Linear(n_x,n_h),
nn.ReLU(),
nn.Linear(n_h,n_y),
nn.Sigmoid() # You are using BCELoss, you should give it an input from 0 to 1
)
optimizer = SGD(model.parameters(), lr=learning_rate)
cost_fn = nn.BCELoss()
optimizer.zero_grad()
y = model(x)
cost = cost_fn(y, target)
cost.backward()
optimizer.step() # << updates the model parameters using the computed gradients
Notice that it is recommended to use torch.nn.BCEWithLogitsLoss instead of BCELoss. The former implements the sigmoid and the binary cross-entropy together, with some math tricks to make it more numerically stable. Your model should look something like:
model = nn.Sequential(
nn.Linear(n_x,n_h),
nn.ReLU(),
nn.Linear(n_h,n_y)
)
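For example, a minimal sketch of how the raw-logit version fits together (the layer sizes and data below are placeholders, not taken from your code):
import torch
import torch.nn as nn
from torch.optim import SGD

n_x, n_h, n_y = 20, 7, 1                        # placeholder sizes
model = nn.Sequential(
    nn.Linear(n_x, n_h),
    nn.ReLU(),
    nn.Linear(n_h, n_y)                          # no Sigmoid at the end
)
cost_fn = nn.BCEWithLogitsLoss()                 # the sigmoid is applied inside the loss
optimizer = SGD(model.parameters(), lr=0.5)

x = torch.randn(8, n_x)                          # dummy batch of 8 examples
target = torch.randint(0, 2, (8, n_y)).float()   # binary labels in {0, 1}

optimizer.zero_grad()
logits = model(x)                                # raw logits go straight into the loss
cost = cost_fn(logits, target)
cost.backward()
optimizer.step()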

Error when fitting linear binary classifier with TensorFlow: ValueError: No gradients provided for any variable, check your graph

I get an error when trying to fit a linear binary classifier using a step function and MSE, instead of softmax and cross-entropy loss. I can't overcome it, probably due to shape inconsistencies. I provide a code sample below. Please help.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification as gen_data
from sklearn.model_selection import train_test_split
rng = np.random
# Setting hyperparameters
n_observations = 100
lr = 0.005
n_iter = 100
# Generate input data
xs, ys = gen_data(n_features=2, n_redundant=0, n_informative=2,
random_state=0, n_clusters_per_class=1)
# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(xs, ys, test_size=.4)
X_train = np.float32(X_train)
X_test = np.float32(X_test)
# Graph
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
W = tf.Variable(np.float32(rng.randn(2)), name="weight")
b = tf.Variable(np.float32(rng.randn()), name="bias")
def step(x):
is_greater = tf.greater(x, 0)
as_float = tf.to_float(is_greater)
doubled = tf.multiply(as_float, 2)
return tf.subtract(doubled, 1)
Y_pred = step(tf.add(tf.multiply(X , W), b))
cost = tf.reduce_mean(tf.squared_difference(Y_pred, Y))
# Using built-in optimization algorithm to train the model:
train_step = tf.train.GradientDescentOptimizer(0.005).minimize(cost)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
for step in range(n_iter):
sess.run(train_step, feed_dict={X:X_train, Y:y_train})
print ("iter: {0}; weight: {1}; bias: {2}".format(step,
sess.run(W),
sess.run(b)))
This is the error:
ValueErrorTraceback (most recent call last)
<ipython-input-17-5a0c4711802c> in <module>()
26
27 # Using built-in optimization algorithm to train the model:
---> 28 train_step = tf.train.GradientDescentOptimizer(0.005).minimize(cost)
29
30 # Using TF differentiation from scratch to implement a step-by-step optimizer
/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.pyc in minimize(self, loss, global_step, var_list, gate_gradients, aggregation_method, colocate_gradients_with_ops, name, grad_loss)
405 "No gradients provided for any variable, check your graph for ops"
406 " that do not support gradients, between variables %s and loss %s." %
--> 407 ([str(v) for _, v in grads_and_vars], loss))
408
409 return self.apply_gradients(grads_and_vars, global_step=global_step,
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'weight:0' shape=(2,) dtype=float64_ref>", "<tf.Variable 'bias:0' shape=() dtype=float32_ref>", "<tf.Variable 'weight_1:0' shape=(2,) dtype=float64_ref>", "<tf.Variable 'bias_1:0' shape=() dtype=float32_ref>",
Your training data isn't changing between training steps. That is, each training step feeds the same values for X and Y:
for step in range(n_iter):
sess.run(train_step, feed_dict={X:X_train, Y:y_train})
If you set different values for X and Y between training steps, the error should go away.
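For instance, a minimal sketch of that idea, reusing the names from your code (X, Y, train_step, sess, X_train, y_train, rng and n_iter are assumed to be defined exactly as above):
batch_size = 10
for step_i in range(n_iter):
    # sample a different mini-batch for every training step
    idx = rng.choice(len(X_train), batch_size, replace=False)
    sess.run(train_step, feed_dict={X: X_train[idx], Y: y_train[idx]})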

Multi-variable linear regression using TensorFlow

I am trying to implement multi-variable linear regression using TensorFlow. I have a CSV file with 200 rows and 3 feature columns, with the last column as the output. Something like this:
I have written the following code:
from __future__ import print_function
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import csv
import pandas
rng = np.random
# Parameters
learning_rate = 0.01
training_epochs = 1000
display_step = 50
I get the data from the file using pandas and store it:
# Training Data
dataframe = pandas.read_csv("Advertising.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
X1,X2,X3,y1 = [],[],[],[]
for i in range(1,len(dataset)):
X = dataset[i][0]
X1.append(np.float32(X.split(",")[1]))
X2.append(np.float32(X.split(",")[2]))
X3.append(np.float32(X.split(",")[3]))
y1.append(np.float32(X.split(",")[4]))
X = np.column_stack((X1,X2))
X = np.column_stack((X,X3))
I define the placeholders, the variables, and the linear regression model:
n_samples = len(X1)
#print(n_samples) = 17
# tf Graph Input
X_1 = tf.placeholder(tf.float32, [3, None])
Y = tf.placeholder(tf.float32, [None])
# Set model weights
W1 = tf.Variable(rng.randn(), [n_samples,3])
b = tf.Variable(rng.randn(), [n_samples])
# Construct a linear model
pred = tf.add(tf.matmul(W1, X_1), b)
# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
sess.run(init)
# Fit all training data
for epoch in range(training_epochs):
for (x1, y) in zip(X, y1):
sess.run(optimizer, feed_dict={X_1: x1, Y: y})
# Display logs per epoch step
if (epoch+1) % display_step == 0:
c = sess.run(cost, feed_dict={X_1: x1, Y: y})
print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
"Weights=", sess.run(W1),"b=", sess.run(b))
I get the following error which I am not able to debug:
ValueError: Shape must be rank 2 but is rank 0 for 'MatMul' (op:
'MatMul') with input shapes: [], [3,?].
Can you help me with how to solve this?
Thanks in advance.
tf.Variable doesn't take arguments the way you are thinking; the second parameter is not the shape. To set the shape of the variable, do so through the initializer (the first parameter). See https://www.tensorflow.org/api_docs/python/tf/Variable
Your code
# Set model weights
W1 = tf.Variable(rng.randn(), [n_samples,3])
b = tf.Variable(rng.randn(), [n_samples])
My suggested change
initial1 = tf.constant(rng.randn(), dtype=tf.float32, shape=[n_samples,3])
initial2 = tf.constant(rng.randn(), dtype=tf.float32, shape=[n_samples,3])
W1 = tf.Variable(initial_value=initial1)
b = tf.Variable(initial_value=initial2)
In answer to the additional issues which arise after fixing the initial question, the following code runs, but there still might be some logical errors you need to think about, such as your "display logs per epoch step" section.
from __future__ import print_function
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import csv
import pandas
rng = np.random
# Parameters
learning_rate = 0.01
training_epochs = 1000
display_step = 50
# Training Data
#Created some fake data
dataframe = [[230.1,37.8,69.2,22.1],[2230.1,32.8,61.2,21.1]] #pandas.read_csv("Advertising.csv", delim_whitespace=True, header=None)
dataset = dataframe
X1,X2,X3,y1 = [],[],[],[]
for i in range(0,len(dataset)):
X = dataset[i][0]
X1.append(np.float32(dataset[i][0]))
X2.append(np.float32(dataset[i][1]))
X3.append(np.float32(dataset[i][2]))
y1.append(np.float32(dataset[i][3]))
#X=np.array([X1,X2,X3])
X = np.column_stack((X1,X2,X3)) ##MYEDIT: This combines all three values. If you find you need to stack in a different way then you will need to ensure the shapes below match this shape.
#X = np.column_stack((X,X3))
n_samples = len(X1)
#print(n_samples) = 17
# tf Graph Input
X_1 = tf.placeholder(tf.float32, [ None,3])##MYEDIT: Changed order
Y = tf.placeholder(tf.float32, [None])
# Set model weights
initial1 = tf.constant(rng.randn(), dtype=tf.float32, shape=[3,1]) ###MYEDIT: change order and you are only giving 1 sample at a time with your method of calling
initial2 = tf.constant(rng.randn(), dtype=tf.float32, shape=[3,1])
W1 = tf.Variable(initial_value=initial1)
b = tf.Variable(initial_value=initial2)
mul=tf.matmul(W1, X_1) ##MYEDIT: remove matmul from pred for clarity and shape checking
# Construct a linear model
pred = tf.add(mul, b)
# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
sess.run(init)
# Fit all training data
for epoch in range(training_epochs):
for (x1, y) in zip(X, y1):
Xformatted=np.array([x1]) #has shape (1,3) #MYEDIT: separated this to demonstrate shapes
yformatted=np.array([y]) #shape (1,) #MYEDIT: separated this to demonstrate shapes
#NB. X_1 shape is (?,3) and Y shape is (?,)
sess.run(optimizer, feed_dict={X_1: Xformatted, Y: yformatted})
# Display logs per epoch step
if (epoch+1) % display_step == 0:
c = sess.run(cost, feed_dict={X_1: Xformatted, Y: yformatted}) #NB. x1 an y are out of scope here - you will only get the last values. Double check if this is what you meant.
print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
"Weights=", sess.run(W1),"b=", sess.run(b))
You need to feed a matrix (a rank-2 tensor) into tf.matmul(W1, X_1). Check the shapes and types of W1 and X_1 in your code.
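As a minimal illustration of that requirement (standalone toy tensors, not the variables from the question), tf.matmul expects two rank-2 operands whose inner dimensions agree:
import numpy as np
import tensorflow as tf

W = tf.constant(np.random.randn(1, 3), dtype=tf.float32)  # rank 2, shape (1, 3)
X = tf.constant(np.random.randn(3, 5), dtype=tf.float32)  # rank 2, shape (3, 5)
out = tf.matmul(W, X)                                      # OK: (1, 3) x (3, 5) -> (1, 5)

# A rank-0 (scalar) operand, like a Variable initialised from rng.randn(),
# is what triggers the "Shape must be rank 2 but is rank 0" error:
# tf.matmul(tf.constant(0.5), X)  # raises an error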
See the question here for more details
