Document classification (PyTorch, BERT): how to change the training/validation loop to work for the multilabel case - pytorch

I am trying to make BertForSequenceClassification.from_pretrained() work for a multilabel task, since the code I found online is for the binary-label case.
I have a document classification problem with 12 labels and am using the BERT language model as a PyTorch model.
What should I do to make it work for multilabel? I get this error when I run it without changing the train/val loop:
ValueError: Target size (torch.Size([32])) must be the same as input size (torch.Size([32, 12]))
I assume I have to change the labels, since the input (the logits) is [32, 12]. But how do I do this?
Edit: Full output
======== Epoch 1 / 4 ========
Training...
torch.Size([32, 64])
tensor([[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
...,
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1]], device='cuda:0')
tensor([ 9., 9., 3., 8., 9., 10., 4., 3., 4., 4., 9., 0., 9., 9.,
11., 3., 9., 9., 3., 4., 4., 7., 8., 9., 10., 6., 4., 0.,
10., 3., 4., 1.], dtype=torch.float64)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-ac7a3b802ac2> in <module>
     90         # Specifically, we'll get the loss (because we provided labels) and the
     91         # "logits"--the model outputs prior to activation.
---> 92         result = model(b_input_ids,
     93                        token_type_ids=None,
     94                        attention_mask=b_input_mask,

4 frames
/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
   3158
   3159     if not (target.size() == input.size()):
-> 3160         raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
   3161
   3162     return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)

ValueError: Target size (torch.Size([32])) must be the same as input size (torch.Size([32, 12]))
the code:
from transformers import BertForSequenceClassification, AdamW, BertConfig

# Load BertForSequenceClassification, the pretrained BERT model with a single
# linear classification layer on top.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",          # Use the 12-layer BERT model, with an uncased vocab.
    num_labels = 2,               # The number of output labels--2 for binary classification.
                                  # You can increase this for multi-class tasks.
    output_attentions = False,    # Whether the model returns attentions weights.
    output_hidden_states = False, # Whether the model returns all hidden-states.
)

# Tell pytorch to run this model on the GPU.
model.cuda()

optimizer = AdamW(model.parameters(),
                  lr = 2e-5,  # args.learning_rate - default is 5e-5, our notebook had 2e-5
                  eps = 1e-8  # args.adam_epsilon - default is 1e-8.
                  )

from transformers import get_linear_schedule_with_warmup

total_steps = len(train_dataloader) * epochs

# Create the learning rate scheduler.
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps = 0,  # Default value in run_glue.py
                                            num_training_steps = total_steps)

import random
import numpy as np

# This training code is based on the `run_glue.py` script here:
# https://github.com/huggingface/transformer/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L128

# Set the seed value all over the place to make this reproducible.
seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

# We'll store a number of quantities such as training and validation loss,
# validation accuracy, and timings.
training_stats = []

# Measure the total training time for the whole run.
total_t0 = time.time()

# For each epoch...
for epoch_i in range(0, epochs):

    # ========================================
    #               Training
    # ========================================

    # Perform one full pass over the training set.
    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
    print('Training...')

    # Measure how long the training epoch takes.
    t0 = time.time()

    # Reset the total loss for this epoch.
    total_train_loss = 0

    # Put the model into training mode. Don't be misled--the call to
    # `train` just changes the *mode*, it doesn't *perform* the training.
    # `dropout` and `batchnorm` layers behave differently during training
    # vs. test (source: https://stackoverflow.com/questions/51433378/what-does-model-train-do-in-pytorch)
    model.train()

    # For each batch of training data...
    for step, batch in enumerate(train_dataloader):

        # Progress update every 40 batches.
        if step % 40 == 0 and not step == 0:
            # Calculate elapsed time in minutes.
            elapsed = format_time(time.time() - t0)
            # Report progress.
            print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))

        # Unpack this training batch from our dataloader.
        #
        # As we unpack the batch, we'll also copy each tensor to the GPU using the
        # `to` method.
        #
        # `batch` contains three pytorch tensors:
        #   [0]: input ids
        #   [1]: attention masks
        #   [2]: labels
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)

        # Always clear any previously calculated gradients before performing a
        # backward pass. PyTorch doesn't do this automatically because
        # accumulating the gradients is "convenient while training RNNs".
        # (source: https://stackoverflow.com/questions/48001598/why-do-we-need-to-call-zero-grad-in-pytorch)
        model.zero_grad()

        # Perform a forward pass (evaluate the model on this training batch).
        # In PyTorch, calling `model` will in turn call the model's `forward`
        # function and pass down the arguments. The `forward` function is
        # documented here:
        # https://huggingface.co/transformers/model_doc/bert.html#bertforsequenceclassification
        # The results are returned in a results object, documented here:
        # https://huggingface.co/transformers/main_classes/output.html#transformers.modeling_outputs.SequenceClassifierOutput
        # Specifically, we'll get the loss (because we provided labels) and the
        # "logits"--the model outputs prior to activation.
        result = model(b_input_ids,
                       token_type_ids=None,
                       attention_mask=b_input_mask,
                       labels=b_labels,
                       return_dict=True)

        loss = result.loss
        logits = result.logits

        # Accumulate the training loss over all of the batches so that we can
        # calculate the average loss at the end. `loss` is a Tensor containing a
        # single value; the `.item()` function just returns the Python value
        # from the tensor.
        total_train_loss += loss.item()

        # Perform a backward pass to calculate the gradients.
        loss.backward()

        # Clip the norm of the gradients to 1.0.
        # This is to help prevent the "exploding gradients" problem.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # Update parameters and take a step using the computed gradient.
        # The optimizer dictates the "update rule"--how the parameters are
        # modified based on their gradients, the learning rate, etc.
        optimizer.step()

        # Update the learning rate.
        scheduler.step()

    # Calculate the average loss over all of the batches.
    avg_train_loss = total_train_loss / len(train_dataloader)

    # Measure how long this epoch took.
    training_time = format_time(time.time() - t0)

    print("")
    print("  Average training loss: {0:.2f}".format(avg_train_loss))
    print("  Training epoch took: {:}".format(training_time))

    # ========================================
    #               Validation
    # ========================================
    # After the completion of each training epoch, measure our performance on
    # our validation set.

    print("")
    print("Running Validation...")

    t0 = time.time()

    # Put the model in evaluation mode--the dropout layers behave differently
    # during evaluation.
    model.eval()

    # Tracking variables
    total_eval_accuracy = 0
    total_eval_loss = 0
    nb_eval_steps = 0

    # Evaluate data for one epoch
    for batch in validation_dataloader:

        # Unpack this training batch from our dataloader.
        #
        # As we unpack the batch, we'll also copy each tensor to the GPU using
        # the `to` method.
        #
        # `batch` contains three pytorch tensors:
        #   [0]: input ids
        #   [1]: attention masks
        #   [2]: labels
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)

        # Tell pytorch not to bother with constructing the compute graph during
        # the forward pass, since this is only needed for backprop (training).
        with torch.no_grad():

            # Forward pass, calculate logit predictions.
            # token_type_ids is the same as the "segment ids", which
            # differentiates sentence 1 and 2 in 2-sentence tasks.
            result = model(b_input_ids,
                           token_type_ids=None,
                           attention_mask=b_input_mask,
                           labels=b_labels,
                           return_dict=True)

        # Get the loss and "logits" output by the model. The "logits" are the
        # output values prior to applying an activation function like the
        # softmax.
        loss = result.loss
        logits = result.logits

        # Accumulate the validation loss.
        total_eval_loss += loss.item()

        # Move logits and labels to CPU
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()

        # Calculate the accuracy for this batch of test sentences, and
        # accumulate it over all batches.
        total_eval_accuracy += flat_accuracy(logits, label_ids)

    # Report the final accuracy for this validation run.
    avg_val_accuracy = total_eval_accuracy / len(validation_dataloader)
    print("  Accuracy: {0:.2f}".format(avg_val_accuracy))

    # Calculate the average loss over all of the batches.
    avg_val_loss = total_eval_loss / len(validation_dataloader)

    # Measure how long the validation run took.
    validation_time = format_time(time.time() - t0)

    print("  Validation Loss: {0:.2f}".format(avg_val_loss))
    print("  Validation took: {:}".format(validation_time))

    # Record all statistics from this epoch.
    training_stats.append(
        {
            'epoch': epoch_i + 1,
            'Training Loss': avg_train_loss,
            'Valid. Loss': avg_val_loss,
            'Valid. Accur.': avg_val_accuracy,
            'Training Time': training_time,
            'Validation Time': validation_time
        }
    )

print("")
print("Training complete!")
print("Total training took {:} (h:mm:ss)".format(format_time(time.time()-total_t0)))

I'm not well-versed in this, but I guess this would help. In the code you have posted, you haven't changed num_labels to 12; it is still 2. If you have 12 classes, then you probably need to change that, right? Let me know if it works. Also, could you share the answer to the previously posted question on calculating the average word embedding with GloVe? I also want to learn how to implement it.
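To make the num_labels point concrete, here is a minimal sketch of a genuinely multilabel setup. It assumes a reasonably recent transformers version that supports the problem_type argument, and label_lists is a made-up placeholder for whatever per-document label ids you actually have; it is not the poster's exact data.

from transformers import BertForSequenceClassification
import torch

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels = 12,
    problem_type = "multi_label_classification",  # uses BCEWithLogitsLoss internally
)

# Hypothetical example of building the targets: e.g. document 0 has labels 9 and 3.
label_lists = [[9, 3], [8], [0, 4, 7]]
labels = torch.zeros(len(label_lists), 12)
for i, ids in enumerate(label_lists):
    labels[i, ids] = 1.0          # float multi-hot rows, shape [batch_size, 12]

If, on the other hand, each document belongs to exactly one of the 12 classes (which is what the single class id per row in the printed b_labels suggests), the task is multi-class rather than multilabel: set num_labels=12 and keep the labels as a LongTensor of class indices, and the existing loop should work unchanged because the model then falls back to CrossEntropyLoss. The float64 labels in the printed output are likely what pushes recent transformers versions onto the BCEWithLogitsLoss path and triggers the shape error.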

Related

Results from PyTorch tutorial using Google Colab not matching results in PyCharm

I'm following a tutorial on YouTube for PyTorch which uses torch.manual_seed to ensure results are the same, or at least in the same ballpark.
Now, admittedly I'm no expert, but on running the code in chapter 2 of the tutorial, the resulting graph from my model seems way off from what it should be.
I've tried going through the code line by line, but for the last 3 days I can't seem to find any differences between my code and the code used in the tutorial (other than variable names, which I changed for clarity and so I'm not just mindlessly copying).
I work a pretty busy menial job with variable work days, so I don't get a lot of time off, but I've spent 10 'off days' across a month trying to solve this and I just can't see it. Genuinely, any help would be appreciated, even if it's just confirming there is an error on my part without saying what it is; I just want to know if I've done anything wrong at all.
Here's a link to the doc file for the tutorial if that helps:
https://www.learnpytorch.io/02_pytorch_classification/#31-going-from-raw-model-outputs-to-predicted-labels-logits-prediction-probabilities-prediction-labels
Here's my code:
from sklearn.datasets import make_circles
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split as tts
import torch
from torch import nn
from helper_functions import plot_predictions, plot_decision_boundary
import os
from pathlib import Path
# Generate 1000 samples
sample_number = 1000
# Create circles
Features, labels = make_circles(sample_number,
noise=0.03, # <- adds a little noise to the dots
random_state=42) # <- sets random state seed for consistency
# View the first 5 values for both parameters
print(f'First 5 Features (X):\n{Features[:5]}\n'
f'First 5 labels (y):\n{labels[:5]}')
# Make a DataFrame of circle data
circles = pd.DataFrame({"inputType1": Features[:, 0], # <- everything in the 0th index is type 1
"inputType2": Features[:, 1], # <- everything in the 1st index is type 2
"output": labels
})
print(f'Created dataframe:\n'
f'{circles.head(10)}')
# Check the different labels
print(f'Number of value per class:\n'
f'{circles.output.value_counts()}')
# Visualise the dataframe
plt.scatter(x=Features[:, 0],
y=Features[:, 1],
c=labels,
cmap=plt.cm.RdYlBu)
# Display plot
# plt.show()
# Check the shapes of the features and labels
# ML deals with numerical representation
# Ensuring the input and output shapes are compatible is crucial
print(f'Circle shapes: {Features.shape, labels.shape}')
# View the first example of features and labels
Features_samples = Features[0]
labels_samples = labels[0]
print(f'Values for one sample of X: {Features_samples} and the same for y: {labels_samples}')
print(f'Shapes for one sample of X: {Features_samples.shape}'
f'\nand the same for y: {labels_samples.shape}')
# ^ Features dataset has 1000 samples with two feature classes
# ^ labels dataset has 1000 samples with no feature classes since it's a scalar
# Turning datasets into tensors
Features = torch.from_numpy(Features).type(torch.float)
labels = torch.from_numpy(labels).type(torch.float)
# View the first five samples
print(f'First 5 Features:\n'
f'{Features[:5]}\n'
f'First 5 labels:\n'
f'{labels[:5]}\n')
# Split data into train and test sets
input_data_train, input_data_test, model_output_train, model_output_test = tts(Features,
labels,
test_size=0.2,
random_state=42)
# Check that splits follow this pattern:
# 80% train, 20% test
print(f'Number of samples for input train:\n'
f'{len(input_data_train)}\n'
f'Number of samples for input test:\n'
f'{len(input_data_test)}\n'
f'Number of samples for output train:\n'
f'{len(model_output_train)}\n'
f'Number of samples for output test:\n'
f'{len(model_output_test)}\n')
# Begin building learning model
# Make device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f'Learning model processing on: {device}\n')
"""# Assign parameters to the device
input_data_train = input_data_train.to(device)
input_data_test = input_data_test.to(device)
model_output_train = model_output_train.to(device)
model_output_test = model_output_test.to(device)"""
# 1. Construct a model class that subclasses nn.Module
class CircleClassificationModel(nn.Module):
def __init__(self):
super().__init__()
# 2. Create 2 nn.Linear layers for handling Feature and labels, input and output shapes
self.layer_1 = nn.Linear(in_features=2, out_features=5) # <- inputs 2 Features, outputs 5
self.layer_2 = nn.Linear(in_features=5, out_features=1) # <- inputs 5 Features, outputs 1 label
# 3. Define a forward method containing the forward pass computation
def forward(self, x):
# Return the output of layer_2, a single feature, which is the same shape as the label
return self.layer_2(self.layer_1(x))
# Computation goes through layer_1 first
# The output of layer_1 goes through layer_2
# 4. Create an instance of the model and send it to target device
classification_model_0 = CircleClassificationModel().to(device)
# Display model parameters
print(f'Model parameters for self defined model:\n'
f'{classification_model_0}\n')
# The above code can be written more succinctly using nn.Sequential
# Implements two layers of nn.Linear()
# Which calls the following equation
# y = ( x * (Weights).transposed ) + bias
classification_model_0 = nn.Sequential(
nn.Linear(in_features=2, out_features=5),
nn.Linear(in_features=5, out_features=1)
).to(device)
# Display model parameters
print(f'Model (nn.Sequential) parameters:\n'
f'{classification_model_0}\n\n')
# Make predictions with the model
untrained_predictions = classification_model_0(input_data_test.to(device))
print(f'Length of predictions: {len(untrained_predictions)}, Shape: {untrained_predictions.shape}')
print(f'Length of test samples: {len(model_output_test)}, Shape: {model_output_test.shape}')
print(f'\nFirst 10 predictions:\n'
f'{untrained_predictions[:10]}')
print(f'\nFirst 10 test labels:\n'
f'{model_output_test[:10]}')
# Create a loss function
# Unlike the regression model, the classification model uses a different loss type
# Binary Cross Entropy will be used for this task
# torch.nn.BCELoss() - measures the BCE between the target(labels) and the input(features)
# Another version may be used:
# torch.nn.BCEWithLogitsLoss() - same, except it has a built-in Sigmoid function
# loss_fn = nn.BCELoss() # <- BCELoss = no sigmoid built-in
loss_function = nn.BCEWithLogitsLoss() # <- BCEWithLogitsLoss = sigmoid built-in
# Create an optimiser
optimiser = torch.optim.SGD(params=classification_model_0.parameters(),
lr=0.1)
# Calculate accuracy (a classification metric)
# This acts as an evaluation metric
# Offers perspective into how the model is going
# The loss function measures how wrong the model but
# Evaluation metrics measure how right it is
# Accuracy will be the first metric to be utilised
# Accuracy can be measured by dividing the total number of correct predictions
# By the total number of overall predictions
def accuracy_function(label_actual, label_predicted):
# Calculates where 2 tensors are equal
correct = torch.eq(label_actual, label_predicted).sum().item()
accuracy = (correct / len(label_predicted)) * 100
return accuracy
# View the first 5 results of the forward pass on test data
# labels_logits represents the output of the forward pass method above
# Which utilises two nn.Linear() layers
labels_logits = classification_model_0(input_data_test.to(device))[:5]
print(f'First 5 outputs of the forward pass:\n'
f'{labels_logits}')
# Use the sigmoid function on the model labels_logits
# Turns the output of the forward pass into prediction probabilities
# Measures the odds the model classifies a data point into one class or the other
# In the case of this problem the classes are either 0 or 1
# It uses the logic:
# If labels_prediction_probabilities >= 0.5 then assign the label class (1)
# If labels_prediction_probabilities < 0.5 then assign the label class (0)
labels_prediction_probabilities = torch.sigmoid(labels_logits)
print(f'Output of the sigmoid-ed forward pass:\n'
f'{labels_prediction_probabilities}')
# Find the predicted labels (round the prediction probabilities as well)
label_predictions = torch.round(labels_prediction_probabilities)
# In full
labels_predictions_classes = \
torch.round(torch.sigmoid(classification_model_0(input_data_test.to(device))[:5]))
# Check for equality
print(torch.eq(label_predictions.squeeze(), labels_predictions_classes.squeeze()))
# Get rid of the extra dimensions
label_predictions.squeeze()
# Display model predictions
print(f'Model Predictions:\n'
f'{label_predictions}')
# Display test labels for comparison with model predictions
print(f'\nFirst five test data:\n'
f'{model_output_test[:5]}')
# Building the training loop
torch.manual_seed(42)
# Set the number of epochs
epochs = 100
# Process data on the target devices
input_data_train, model_output_train = input_data_train.to(device),\
model_output_train.to(device)
input_data_test, model_output_test = input_data_test.to(device),\
model_output_test.to(device)
# Build the training and evaluation loop
for epoch in range(epochs):
# Training
classification_model_0.train()
# todo: Do the Forward pass
# Model outputs raw labels_logits
train_labels_logits = classification_model_0(input_data_train).squeeze()
# ^ .squeeze() removes the extra dimensions, won't work if model and data on diff devices
train_label_prediction = torch.round(torch.sigmoid(train_labels_logits))
# ^ turns logits -> prediction probabilities -> prediction label classes
# todo: Calculate the loss
# 2. Calculates loss/accuracy
""" train_loss = loss_function(torch.sigmoid(train_labels_logits),
model_output_train) # <- nn.BCELoss needs torch.sigmoid() """
train_loss = loss_function(train_labels_logits,
model_output_train)
train_accuracy = accuracy_function(label_actual=model_output_train,
label_predicted=train_label_prediction)
# todo: Optimiser zero grad
optimiser.zero_grad()
# todo: Loss backwards
train_loss.backward()
# todo: optimiser step step step
optimiser.step()
# Testing
# todo: evaluate the model
classification_model_0.eval()
with torch.inference_mode():
# todo: Do the forward pass
test_logits = classification_model_0(input_data_test).squeeze()
test_predictions = torch.round(torch.sigmoid(test_logits))
# todo: calculate the loss
test_loss = loss_function(test_logits,
model_output_test)
test_accuracy = accuracy_function(label_actual=model_output_test,
label_predicted=test_predictions)
# todo: print model statistics every 10 epochs
if epoch % 10 == 0:
print(f'Epoch: {epoch} | Loss: {train_loss:.5f}, | Train Accuracy: {train_accuracy:.2f}%'
f'Test Loss: {test_loss:.5f}, | Test accuracy: {test_accuracy:.2f}%')
# Plot decision boundary for training and test sets
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("My Train")
plot_decision_boundary(classification_model_0, input_data_train, model_output_train)
plt.subplot(1, 2, 2)
plt.title("My Test")
plot_decision_boundary(classification_model_0, input_data_test, model_output_test)
plt.show()
AND HERE'S THE TUTORIAL CODE:
from sklearn.datasets import make_circles
# Make 1000 samples
n_samples = 1000
# Create circles
X, y = make_circles(n_samples,
noise=0.03, # a little bit of noise to the dots
random_state=42) # keep random state so we get the same values
print(f"First 5 X features:\n{X[:5]}")
print(f"\nFirst 5 y labels:\n{y[:5]}")
# Make DataFrame of circle data
import pandas as pd
circles = pd.DataFrame({"X1": X[:, 0],
"X2": X[:, 1],
"label": y
})
circles.head(10)
# Check different labels
circles.label.value_counts()
# Visualize with a plot
import matplotlib.pyplot as plt
plt.scatter(x=X[:, 0],
y=X[:, 1],
c=y,
cmap=plt.cm.RdYlBu);
# Check the shapes of our features and labels
X.shape, y.shape
# View the first example of features and labels
X_sample = X[0]
y_sample = y[0]
print(f"Values for one sample of X: {X_sample} and the same for y: {y_sample}")
print(f"Shapes for one sample of X: {X_sample.shape} and the same for y: {y_sample.shape}")
# Turn data into tensors
# Otherwise this causes issues with computations later on
import torch
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)
# View the first five samples
X[:5], y[:5]
# Split data into train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,
y,
test_size=0.2, # 20% test, 80% train
random_state=42) # make the random split reproducible
len(X_train), len(X_test), len(y_train), len(y_test)
# Standard PyTorch imports
import torch
from torch import nn
# Make device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device
# 1. Construct a model class that subclasses nn.Module
class CircleModelV0(nn.Module):
def __init__(self):
super().__init__()
# 2. Create 2 nn.Linear layers capable of handling X and y input and output shapes
self.layer_1 = nn.Linear(in_features=2, out_features=5) # takes in 2 features (X), produces 5 features
self.layer_2 = nn.Linear(in_features=5, out_features=1) # takes in 5 features, produces 1 feature (y)
# 3. Define a forward method containing the forward pass computation
def forward(self, x):
# Return the output of layer_2, a single feature, the same shape as y
return self.layer_2(
self.layer_1(x)) # computation goes through layer_1 first then the output of layer_1 goes through layer_2
# 4. Create an instance of the model and send it to target device
model_0 = CircleModelV0().to(device)
model_0
# Replicate CircleModelV0 with nn.Sequential
model_0 = nn.Sequential(
nn.Linear(in_features=2, out_features=5),
nn.Linear(in_features=5, out_features=1)
).to(device)
model_0
# Make predictions with the model
untrained_preds = model_0(X_test.to(device))
print(f"Length of predictions: {len(untrained_preds)}, Shape: {untrained_preds.shape}")
print(f"Length of test samples: {len(y_test)}, Shape: {y_test.shape}")
print(f"\nFirst 10 predictions:\n{untrained_preds[:10]}")
print(f"\nFirst 10 test labels:\n{y_test[:10]}")
# Create a loss function
# loss_fn = nn.BCELoss() # BCELoss = no sigmoid built-in
loss_fn = nn.BCEWithLogitsLoss() # BCEWithLogitsLoss = sigmoid built-in
# Create an optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(),
lr=0.1)
# Calculate accuracy (a classification metric)
def accuracy_fn(y_true, y_pred):
correct = torch.eq(y_true, y_pred).sum().item() # torch.eq() calculates where two tensors are equal
acc = (correct / len(y_pred)) * 100
return acc
# View the frist 5 outputs of the forward pass on the test data
y_logits = model_0(X_test.to(device))[:5]
y_logits
# Use sigmoid on model logits
y_pred_probs = torch.sigmoid(y_logits)
y_pred_probs
# Find the predicted labels (round the prediction probabilities)
y_preds = torch.round(y_pred_probs)
# In full
y_pred_labels = torch.round(torch.sigmoid(model_0(X_test.to(device))[:5]))
# Check for equality
print(torch.eq(y_preds.squeeze(), y_pred_labels.squeeze()))
# Get rid of extra dimension
y_preds.squeeze()
y_test[:5]
torch.manual_seed(42)
# Set the number of epochs
epochs = 100
# Put data to target device
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)
# Build training and evaluation loop
for epoch in range(epochs):
### Training
model_0.train()
# 1. Forward pass (model outputs raw logits)
y_logits = model_0(
X_train).squeeze() # squeeze to remove extra `1` dimensions, this won't work unless model and data are on same device
y_pred = torch.round(torch.sigmoid(y_logits)) # turn logits -> pred probs -> pred labls
# 2. Calculate loss/accuracy
# loss = loss_fn(torch.sigmoid(y_logits), # Using nn.BCELoss you need torch.sigmoid()
# y_train)
loss = loss_fn(y_logits, # Using nn.BCEWithLogitsLoss works with raw logits
y_train)
acc = accuracy_fn(y_true=y_train,
y_pred=y_pred)
# 3. Optimizer zero grad
optimizer.zero_grad()
# 4. Loss backwards
loss.backward()
# 5. Optimizer step
optimizer.step()
### Testing
model_0.eval()
with torch.inference_mode():
# 1. Forward pass
test_logits = model_0(X_test).squeeze()
test_pred = torch.round(torch.sigmoid(test_logits))
# 2. Caculate loss/accuracy
test_loss = loss_fn(test_logits,
y_test)
test_acc = accuracy_fn(y_true=y_test,
y_pred=test_pred)
# Print out what's happening every 10 epochs
if epoch % 10 == 0:
print(
f"Epoch: {epoch} | Loss: {loss:.5f}, Accuracy: {acc:.2f}% | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%")
from helper_functions import plot_predictions, plot_decision_boundary
# Plot decision boundaries for training and test sets
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Tut Train")
plot_decision_boundary(model_0, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Tut Test")
plot_decision_boundary(model_0, X_test, y_test)
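One general point about torch.manual_seed, offered as a generic sketch rather than a diagnosis of the specific mismatch above: seeding only pins down the random numbers drawn after the call, so two scripts only produce the same weights and results if the seed is set at the same points relative to every weight initialisation, random split, and forward pass that consumes the generator.

import torch
from torch import nn

torch.manual_seed(42)              # pins the weights drawn by the layer constructors below
model = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1),
)

torch.manual_seed(42)              # re-seed before the training loop, as the tutorial does
torch.cuda.manual_seed(42)         # needed as well if the model and data are on a GPU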

Pytorch VGG16 only returning True after training

I am trying to modify the VGG16 model in PyTorch to do simple yes/no feature detection (to detect whether one particular feature is in an image). To do this I modified the last layer of the VGG network to output 2 values instead of 1000, which I believe is about all that should be necessary to accomplish this. When I test the network with random weights/biases it is around 50% accurate, as you would expect, and when I print the output layer the values vary pretty randomly between -1 and 1. However, after a bit of training the output very quickly shifts so that the second value is positive and the first is negative, until doing a max() just returns 1 (True) every time and the model thinks it has detected the feature in every image.
What am I doing wrong here? I'm very new to PyTorch and machine learning, so I'm not sure what the issue is.
Here's the simplest, reproducible example I can manage. I did not include my training/test loaders because they load images off my local disk, but hopefully this is enough code to figure out what is going wrong:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision

BATCH_SIZE = 16
LEARNING_RATE = 0.001
MOMENTUM = 0.9

def train_loop(dataloader, model, loss_fn, optimizer):
    model.train()
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        prediction = model(X)
        loss = loss_fn(prediction, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(f"Training...{batch * len(X) / size * 100:.01f}%")

def test_loop(dataloader, model):
    model.eval()
    size = len(dataloader.dataset)
    correct = 0
    with torch.no_grad():
        for X, y in dataloader:
            outputs = model(X)
            predictions = outputs.data.max(1, keepdim=True)[1]
            correct += predictions.eq(y.data.view_as(predictions)).sum().item()
    print(f'Test set accuracy: {(correct / size) * 100:.01f}%')

network = torchvision.models.vgg16(pretrained=False)
network.classifier[6] = nn.Linear(4096, 2)
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(network.parameters(), lr=LEARNING_RATE, momentum=MOMENTUM)
Running print(outputs.data.max(1)) in the test_loop function gives the following before/after training (I lowered the batch size to 8 here):
# before
# print(outputs.data)
tensor([
[4.0089e-02, -1.2475e-04],
[3.2431e-02, -2.2334e-03 ],
[3.7739e-02, 5.7708e-03],
[3.7453e-02, 1.9297e-03],
[4.3812e-02, 5.1457e-05],
[3.8975e-02, 6.3827e-03],
[4.3934e-02, 6.7114e-03],
[3.6315e-02, 8.8174e-03]
])
# print(outputs.data.max(1))
torch.return_types.max(
values=tensor([0.0401, 0.0324, 0.0377, 0.0375, 0.0438, 0.0390, 0.0439, 0.0363]),
indices=tensor([0, 0, 0, 0, 0, 0, 0, 0])
)
# after
# print(outputs.data)
tensor([
[-0.4314, 0.4763],
[-0.4296, 0.4799],
[-0.3882, 0.4378],
[-0.4257, 0.4682],
[-0.4330, 0.4682],
[-0.3420, 0.3832],
[-0.4467, 0.5142],
[-0.3902, 0.4175]
])
# print(outputs.data.max(1))
torch.return_types.max(
values=tensor([0.4763, 0.4799, 0.4378, 0.4682, 0.4635, 0.3832, 0.5142, 0.4175]),
indices=tensor([1, 1, 1, 1, 1, 1, 1, 1])
)
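One thing worth ruling out, as an assumption on my part since the data loaders are not shown: if the training set is heavily skewed toward the "feature present" class, a network can lower the loss simply by always predicting class 1. A quick sketch of counting the labels and weighting the loss accordingly (train_loader stands in for the poster's unshown training DataLoader):

import torch
import torch.nn as nn

label_counts = torch.zeros(2)
for _, y in train_loader:                            # y assumed to be int64 class labels
    label_counts += torch.bincount(y, minlength=2)
print(label_counts)                                  # how many 0s vs 1s the model actually sees

weights = label_counts.sum() / (2.0 * label_counts)  # inverse-frequency class weights
loss_function = nn.CrossEntropyLoss(weight=weights)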

How to use a different test batch size for RNN in PyTorch?

I want to train an RNN over 5 training points where each sequence also has a size of 5. At test time, I want to send in a single data point and compute the output.
The task is to predict the next character in a sequence of five characters (all encoded as 1-hot vectors). I have tried duplicating the test data point five times. However, I am sure that this is not the right way to solve this problem.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define the parameters
H = [ 1, 0, 0, 0 ]
E = [ 0, 1, 0, 0 ]
L = [ 0, 0, 1, 0 ]
O = [ 0, 0, 0, 1 ]

# Define the model
net = nn.RNN(input_size=4, hidden_size=4, batch_first=True)

# Generate data
data = [[H,E,L,L,O],
        [E,L,L,O,H],
        [L,L,O,H,E],
        [L,O,H,E,L],
        [O,H,E,L,L]]

inputs = torch.tensor(data).float()
hidden = torch.randn(1,5,4) # Random initialization
correct_outputs = torch.tensor(np.array(data[1:]+[data[0]]).astype(float).tolist(), requires_grad=True)

# Set the loss function
criterion = torch.nn.MSELoss()
# Set the optimizer
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

# Perform gradient descent until convergence
for epoch in range(1000):
    # Forward Propagation
    outputs, hidden = net(inputs, hidden)
    # Compute and print loss
    loss = criterion(nn.functional.softmax(outputs,2), correct_outputs)
    print('epoch: ', epoch, ' loss: ', loss.item())
    # Zero the gradients
    optimizer.zero_grad()
    # Backpropagation
    loss.backward(retain_graph=True)
    # Parameter update
    optimizer.step()

# Predict
net(torch.tensor([[H,E,L,L,O]]).float(), hidden)
I get the following error:
RuntimeError: Expected hidden size (1, 1, 4), got (1, 5, 4)
I understand that torch wants a tensor of size (1,1,4) but I am not sure how I can convert the initial hidden state from (1, 5, 4) to (1, 1, 4). Any help would be highly appreciated!
You are getting the error because you are using:
hidden = torch.randn(1,5,4) # Random initialization
Instead, you should use:
hidden = torch.randn(1,inputs.size(0),4) # Random initialization
so that the hidden state's batch dimension matches the batch size of the inputs. So, do the following:
# Predict
inputs = torch.tensor([[H,E,L,L,O]]).float()
hidden = torch.randn(1,inputs.size(0),4)
net(inputs, hidden)
Suggestion: improve your coding style by following some good examples in PyTorch.
Another option would be to just remove the keyword argument, batch_first=True when you define the model.
# Define the model
net = nn.RNN(input_size=4, hidden_size=4)
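For completeness, the same shape rule written out for the single-sequence test case (this is only a sketch reusing net, H, E, L, O from the question above): the initial hidden state of nn.RNN is (num_layers * num_directions, batch_size, hidden_size), and batch_first only affects the layout of the input and output tensors, not the hidden state.

test_input = torch.tensor([[H, E, L, L, O]]).float()  # (batch=1, seq_len=5, input_size=4)
h0 = torch.zeros(1, test_input.size(0), 4)            # (num_layers=1, batch=1, hidden_size=4)
output, hn = net(test_input, h0)                      # output: (1, 5, 4)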

TF | How to predict from CNN after training is done

I am trying to work with the framework provided in the Stanford cs231n course, given the code below.
I can see the accuracy getting better and the net being trained; however, after the training process and checking the results on the validation set, how would I go about feeding one image into the model and seeing its prediction?
I have searched around and couldn't find a built-in predict function in TensorFlow as there is in Keras.
Initializing the net and its parameters
# clear old variables
tf.reset_default_graph()
# setup input (e.g. the data that changes every batch)
# The first dim is None, and gets sets automatically based on batch size fed in
X = tf.placeholder(tf.float32, [None, 30, 30, 1])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)
def simple_model(X,y):
# define our weights (e.g. init_two_layer_convnet)
# setup variables
Wconv1 = tf.get_variable("Wconv1", shape=[7, 7, 1, 32]) # Filter of size 7x7 with depth of 3. No. of filters is 32
bconv1 = tf.get_variable("bconv1", shape=[32])
W1 = tf.get_variable("W1", shape=[4608, 360]) # 5408 is 13x13x32 where 13x13 is the output of 7x7 filter on 32x32 image with padding of 2.
b1 = tf.get_variable("b1", shape=[360])
# define our graph (e.g. two_layer_convnet)
a1 = tf.nn.conv2d(X, Wconv1, strides=[1,2,2,1], padding='VALID') + bconv1
h1 = tf.nn.relu(a1)
h1_flat = tf.reshape(h1,[-1,4608])
y_out = tf.matmul(h1_flat,W1) + b1
return y_out
y_out = simple_model(X,y)
# define our loss
total_loss = tf.losses.hinge_loss(tf.one_hot(y,360),logits=y_out)
mean_loss = tf.reduce_mean(total_loss)
# define our optimizer
optimizer = tf.train.AdamOptimizer(5e-4) # select optimizer and set learning rate
train_step = optimizer.minimize(mean_loss)
Function for evaluating the model whether for training or validation and plots the results:
def run_model(session, predict, loss_val, Xd, yd,
epochs=1, batch_size=64, print_every=100,
training=None, plot_losses=False):
# Have tensorflow compute accuracy
correct_prediction = tf.equal(tf.argmax(predict,1), y)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# shuffle indicies
train_indicies = np.arange(Xd.shape[0])
np.random.shuffle(train_indicies)
training_now = training is not None
# setting up variables we want to compute and optimize
# if we have a training function, add that to things we compute
variables = [mean_loss,correct_prediction,accuracy]
if training_now:
variables[-1] = training
# counter
iter_cnt = 0
for e in range(epochs):
# keep track of losses and accuracy
correct = 0
losses = []
# make sure we iterate over the dataset once
for i in range(int(math.ceil(Xd.shape[0]/batch_size))):
# generate indicies for the batch
start_idx = (i*batch_size)%Xd.shape[0]
idx = train_indicies[start_idx:start_idx+batch_size]
# create a feed dictionary for this batch
feed_dict = {X: Xd[idx,:],
y: yd[idx],
is_training: training_now }
# get batch size
actual_batch_size = yd[idx].shape[0]
# have tensorflow compute loss and correct predictions
# and (if given) perform a training step
loss, corr, _ = session.run(variables,feed_dict=feed_dict)
# aggregate performance stats
losses.append(loss*actual_batch_size)
correct += np.sum(corr)
# print every now and then
if training_now and (iter_cnt % print_every) == 0:
print("Iteration {0}: with minibatch training loss = {1:.3g} and accuracy of {2:.2g}"\
.format(iter_cnt,loss,np.sum(corr)/actual_batch_size))
iter_cnt += 1
total_correct = correct/Xd.shape[0]
total_loss = np.sum(losses)/Xd.shape[0]
print("Epoch {2}, Overall loss = {0:.3g} and accuracy of {1:.3g}"\
.format(total_loss,total_correct,e+1))
if plot_losses:
plt.plot(losses)
plt.grid(True)
plt.title('Epoch {} Loss'.format(e+1))
plt.xlabel('minibatch number')
plt.ylabel('minibatch loss')
plt.show()
return total_loss,total_correct
The functions calls that trains the model
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
print('Training')
run_model(sess,y_out,mean_loss,x_train,y_train,1,64,100,train_step,True)
print('Validation')
run_model(sess,y_out,mean_loss,x_val,y_val,1,64)
You do not need to go far: you simply pass your new (test) feature matrix X_test into your network and perform a forward pass; the output layer is the prediction. So the code is something like this:
session.run(y_out, feed_dict={X: X_test})
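A slightly fuller sketch of the same idea, under a couple of assumptions: a single 30x30 grayscale image stored in a NumPy array called my_image (a made-up name), and the sess, X, y_out and is_training names from the training code above, run while the session is still open. The forward pass gives the raw class scores and argmax picks the predicted label.

import numpy as np

scores = sess.run(y_out, feed_dict={X: my_image.reshape(1, 30, 30, 1),
                                    is_training: False})
predicted_class = int(np.argmax(scores, axis=1)[0])   # index of the highest score
print(predicted_class)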

How to get a prediction from Tensorflow

I took the dynamic RNN example from aymericdamian: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/dynamic_rnn.py
and modified it a little to fit my data. The data is a list of 7500 data sets of 60 entries.
There are 5 labels as output data.
The code runs perfectly and I get an accuracy of 75%.
Now I want to feed the model with a data set and get a predicted label back, but I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_2' with dtype int32
The code is listed below, and the last two lines are where I want to get the prediction back.
What am I doing wrong?
# ==========
# MODEL
# ==========
# Parameters
learning_rate = 0.01
training_iters = 1000000
batch_size = 128
display_step = 10
# Network Parameters
seq_max_len = 60 # Sequence max length
n_hidden = 64 # hidden layer num of features
n_classes = 5 # large rise, small rise, almost equal, small drop, large drop
trainset = ToySequenceData(n_samples=7500, max_seq_len=seq_max_len)
testset = copy.copy(trainset)
# take 50% of total data to use for training
trainpart = int(0.2 * trainset.data.__len__())
pred_data = testset.data[testset.data.__len__() - 2:testset.labels.__len__() - 1][:]
pred_label = testset.labels[testset.labels.__len__() - 1:][:]
trainset.data = trainset.data[:trainpart][:]
testset.data = testset.data[trainpart:testset.data.__len__() - 2][:]
trainset.labels = trainset.labels[:trainpart][:]
testset.labels = testset.labels[trainpart:testset.labels.__len__() - 2][:]
trainset.seqlen = trainset.seqlen[:trainpart][:]
testset.seqlen = testset.seqlen[trainpart:testset.seqlen.__len__() - 2]
# tf Graph input
x = tf.placeholder("float", [None, seq_max_len, 1])
y = tf.placeholder("float", [None, n_classes])
# A placeholder for indicating each sequence length
seqlen = tf.placeholder(tf.int32, [None])
# Define weights
weights = {
'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
'out': tf.Variable(tf.random_normal([n_classes]))
}
def dynamic_rnn(x, seqlen, weights, biases):
# Prepare data shape to match `rnn` function requirements
# Current data input shape: (batch_size, n_steps, n_input)
# Required shape: 'n_steps' tensors list of shape (batch_size, n_input)
# Permuting batch_size and n_steps
x = tf.transpose(x, [1, 0, 2])
# Reshaping to (n_steps*batch_size, n_input)
x = tf.reshape(x, [-1, 1])
# Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
x = tf.split(0, seq_max_len, x)
# Define a lstm cell with tensorflow
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
# Get lstm cell output, providing 'sequence_length' will perform dynamic
# calculation.
outputs, states = tf.nn.rnn(lstm_cell, x, dtype=tf.float32,
sequence_length=seqlen)
# When performing dynamic calculation, we must retrieve the last
# dynamically computed output, i.e., if a sequence length is 10, we need
# to retrieve the 10th output.
# However TensorFlow doesn't support advanced indexing yet, so we build
# a custom op that for each sample in batch size, get its length and
# get the corresponding relevant output.
# 'outputs' is a list of output at every timestep, we pack them in a Tensor
# and change back dimension to [batch_size, n_step, n_input]
outputs = tf.pack(outputs)
outputs = tf.transpose(outputs, [1, 0, 2])
# Hack to build the indexing and retrieve the right output.
batch_size = tf.shape(outputs)[0]
# Start indices for each sample
index = tf.range(0, batch_size) * seq_max_len + (seqlen - 1)
# Indexing
outputs = tf.gather(tf.reshape(outputs, [-1, n_hidden]), index)
# Linear activation, using outputs computed above
return tf.matmul(outputs, weights['out']) + biases['out']
pred = dynamic_rnn(x, seqlen, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
sess.run(init)
step = 1
# Keep training until reach max iterations
while step * batch_size < training_iters:
batch_x, batch_y, batch_seqlen = trainset.next(batch_size)
# Run optimization op (backprop)
sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
seqlen: batch_seqlen})
if step % display_step == 0:
# Calculate batch accuracy
acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y,
seqlen: batch_seqlen})
# Calculate batch loss
loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y,
seqlen: batch_seqlen})
print("Iter " + str(step*batch_size) + ", Minibatch Loss= " +
"{:.6f}".format(loss) + ", Training Accuracy= " +
"{:.5f}".format(acc))
step += 1
print("Optimization Finished!")
# Calculate accuracy
test_data = testset.data
test_label = testset.labels
test_seqlen = testset.seqlen
print("Testing Accuracy:",
sess.run(accuracy, feed_dict={x: test_data, y: test_label,
seqlen: test_seqlen}))
print(pred.eval(feed_dict={x: pred_data}))
print(pred_label)
In TensorFlow, when you do not provide a name to tf.placeholder, it assumes the default name "Placeholder". The next placeholder created is named "Placeholder_1" and the third one is called "Placeholder_2".
This is done to uniquely identify each placeholder. Now, in your last line, you are trying to get the value of pred.eval(). Looking at your dynamic_rnn code, it seems you need a value for the seqlen placeholder, which is the third placeholder defined (that's why it is "Placeholder_2"). Simply add the following key-value pair to your feed_dict:
print(pred.eval(feed_dict={x: pred_data, seqlen: pred_seqlen}))
Of course, you will need to define pred_seqlen properly, just like you defined the other two seqlen variables.
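Written out as a sketch (pred_seqlen here is an assumption; use the real lengths of the sequences in pred_data if they are shorter than seq_max_len), the feed needs one length per row of pred_data, and argmax turns the returned logits into a label:

import numpy as np

pred_seqlen = [seq_max_len] * len(pred_data)   # one sequence length per row of pred_data
logits = sess.run(pred, feed_dict={x: pred_data, seqlen: pred_seqlen})
print(np.argmax(logits, axis=1))               # predicted class index per sequence
print(pred_label)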
