PyTorch LSTM: Prediction does not change when looping over test data

I am trying to predict a variable using 7 features in time steps of 4.
Data
# Shape X_train: torch.Size([24433, 4, 7])
# Shape Y_train: torch.Size([24433, 4, 1])
# Shape X_test: torch.Size([6109, 4, 7])
# Shape Y_test: torch.Size([6109, 4, 1])
train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
Example of data:
print(train_dataset[0], test_dataset[0])
(tensor([[ 7909.0000, 8094.0000, 9119.0000, 8666.0000, 17599.0000, 13657.0000,
10158.0000],
[ 7909.0000, 8073.0000, 9119.0000, 8636.0000, 17609.0000, 13975.0000,
10109.0000],
[ 7939.5000, 8083.5000, 9166.5000, 8659.5000, 18124.5000, 13971.0000,
10142.0000],
[ 7951.0000, 8064.0000, 9201.0000, 8663.0000, 17985.0000, 13967.0000,
10076.0000]]), tensor([[41.],
[41.],
[41.],
[41.]]))
(tensor([[ 8411.0000, 8530.0000, 9439.0000, 9101.0000, 17368.0000, 14174.0000,
11111.0000],
[ 8460.0000, 8651.5000, 9579.5000, 9355.5000, 17402.0000, 14509.0000,
11474.5000],
[ 8436.0000, 8617.0000, 9579.0000, 9343.0000, 17318.0000, 14288.0000,
11404.0000],
[ 8519.0000, 8655.0000, 9580.0000, 9348.0000, 17566.0000, 14640.0000,
11404.0000]]), tensor([[59.],
[59.],
[59.],
[59.]]))
LSTM model
I have created an LSTM model in PyTorch:
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x, _ = self.lstm(x)
        x = self.linear(x)
        return x
model = LSTMModel(input_size=7, hidden_size=256, output_size=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
I chose hidden_size=256 and this optimizer because they gave me the lowest loss. (But whatever I choose, the problem remains.)
Loop over training and test data
Then I train the model and apply it to the test data (I keep lists so I can check the predictions):
pred_train = []
true_train = []

model.train()
# Loop over the training set
for X, Y in train_loader:
    optimizer.zero_grad()
    Y_pred = model(X)
    pred_train.append(Y_pred)
    true_train.append(Y)
    loss = loss_fn(Y_pred, Y)
    loss.backward()
    optimizer.step()

model.eval()
pred_test = []
true_test = []

# Loop over the test set
for X, Y in test_loader:
    Y_pred = model(X)
    pred_test.append(Y_pred)
    true_test.append(Y)
    loss = loss_fn(Y_pred, Y)
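(Side note: a minimal sketch of running the same test loop under torch.no_grad() — this does not change the predictions, it only avoids keeping the autograd graph attached to the tensors stored in pred_test:)

with torch.no_grad():
    for X, Y in test_loader:
        Y_pred = model(X)
        pred_test.append(Y_pred)
        true_test.append(Y)
        loss = loss_fn(Y_pred, Y)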
Checking predictions
When I check the predictions:
print(true_train[0], pred_train[0]) # or i, goes for every iteration
print(true_test[0], pred_test[0])
I get (shortened):
# True train data (L) & predicted train data (R)
tensor([[[ 3.], tensor([[[ 0.1095],
[ 3.], [ 0.0221],
[ 3.], [ 0.0087],
[ 3.]], [-0.0308]],
[[100.], [[ 0.0922],
[ 0.], [ 0.0395],
[ 0.], [-0.0423],
[ 0.]], [-0.0592]],
[[ 57.], [[ 0.0228],
[ 57.], [-0.0332],
[ 57.], [ 0.0296],
[ 57.]], [ 0.0018]],
... ...
# True test data (L) & predicted test data (R)
tensor([[[ 59.], tensor([[[20.6179],
[ 59.], [20.6179],
[ 59.], [20.6179],
[ 59.]], [20.6179]],
[[ 70.], [[23.4562],
[ 70.], [23.4562],
[ 70.], [23.4562],
[ 70.]], [23.4562]],
[[ 0.], [[23.8913],
[ 0.], [23.8913],
[ 0.], [23.8913],
[ 0.]], [23.8913]],
... ...
[[23.9606],
[23.9606],
[23.9606],
[23.9606]],
Also interesting regarding the training predictions:
print(pred_train[0], pred_train[5], pred_train[10])
Returns:
tensor([[[ 0.1095],
[ 0.0221],
[ 0.0087],
[-0.0308]],
[[ 0.0922],
[ 0.0395],
[-0.0423],
[-0.0592]],
...
tensor([[[18.4983],
[18.4983],
[18.4983],
[18.4983]],
[[20.6157],
[21.0552],
[21.0552],
[21.0552]],
...
tensor([[[25.8706],
[25.8706],
[25.8706],
[25.8706]],
[[29.2633],
[29.2633],
[29.2633],
[29.2633]],
...
The further the training loop progresses, the higher the predictions become.
My question
As you can see, the predictions (output) made in the test loop remain (~) the same. Eventually, they become constant: 23.9606.
But why is the output the same for every iteration in the test loop, and why do the predictions become higher in the training loop? What am I doing wrong/what should I be doing to get correct output?

I somewhat solved this by normalizing my input data. I now obtain different predictions for every output. Whether they are good or not is something I have to figure out!
# Calculate the mean and standard deviation of each feature in the training set
X_mean = X_train.mean(dim=0)
X_std = X_train.std(dim=0)
# Standardize the training set
X_train = (X_train - X_mean) / X_std
# Standardize the test set using the mean and standard deviation of the training set
X_test = (X_test - X_mean) / X_std
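(A small follow-up sketch, assuming the same variable names as above: since standardization creates new X_train and X_test tensors, the TensorDatasets and DataLoaders have to be rebuilt from them before training again.)

# Rebuild the datasets and loaders from the standardized tensors
train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)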

Related

Doc-Classification (Pytorch, Bert), how to change the training/validation loop to work for multilabel case

I am trying to make BertForSequenceClassification.from_pretrained() work for the multilabel case, since the code I found online is for the binary-label case.
I have a document-classification task with 12 labels and use the BERT language model as a PyTorch model.
What should I do to make it work for multilabel? When I initially run it without changing the train/val loop, I get this error:
ValueError: Target size (torch.Size([32])) must be the same as input size (torch.Size([32, 12]))
I assume I have to change the inputs or the labels, since the model output is [32, 12] while the target is [32]. But how do I do this?
Edit: Full output
======== Epoch 1 / 4 ========
Training...
torch.Size([32, 64])
tensor([[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
...,
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1]], device='cuda:0')
tensor([ 9., 9., 3., 8., 9., 10., 4., 3., 4., 4., 9., 0., 9., 9.,
11., 3., 9., 9., 3., 4., 4., 7., 8., 9., 10., 6., 4., 0.,
10., 3., 4., 1.], dtype=torch.float64)
ValueError                                Traceback (most recent call last)
<ipython-input-25-ac7a3b802ac2> in <module>
     90     # Specifically, we'll get the loss (because we provided labels) and the
     91     # "logits"--the model outputs prior to activation.
---> 92     result = model(b_input_ids,
     93                    token_type_ids=None,
     94                    attention_mask=b_input_mask,

4 frames

/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
   3158
   3159     if not (target.size() == input.size()):
-> 3160         raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
   3161
   3162     return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)

ValueError: Target size (torch.Size([32])) must be the same as input size (torch.Size([32, 12]))
the code:
from transformers import BertForSequenceClassification, AdamW, BertConfig
# Load BertForSequenceClassification, the pretrained BERT model with a single
# linear classification layer on top.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",          # Use the 12-layer BERT model, with an uncased vocab.
    num_labels = 2,               # The number of output labels--2 for binary classification.
                                  # You can increase this for multi-class tasks.
    output_attentions = False,    # Whether the model returns attentions weights.
    output_hidden_states = False, # Whether the model returns all hidden-states.
)

# Tell pytorch to run this model on the GPU.
model.cuda()

optimizer = AdamW(model.parameters(),
                  lr = 2e-5,  # args.learning_rate - default is 5e-5, our notebook had 2e-5
                  eps = 1e-8  # args.adam_epsilon - default is 1e-8.
                  )

from transformers import get_linear_schedule_with_warmup

total_steps = len(train_dataloader) * epochs

# Create the learning rate scheduler.
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps = 0,  # Default value in run_glue.py
                                            num_training_steps = total_steps)
import random
import numpy as np
# This training code is based on the `run_glue.py` script here:
# https://github.com/huggingface/transformer/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L128
# Set the seed value all over the place to make this reproducible.
seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)
# We'll store a number of quantities such as training and validation loss,
# validation accuracy, and timings.
training_stats = []
# Measure the total training time for the whole run.
total_t0 = time.time()
# For each epoch...
for epoch_i in range(0, epochs):

    # ========================================
    #               Training
    # ========================================
    # Perform one full pass over the training set.

    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
    print('Training...')

    # Measure how long the training epoch takes.
    t0 = time.time()

    # Reset the total loss for this epoch.
    total_train_loss = 0

    # Put the model into training mode. Don't be mislead--the call to
    # `train` just changes the *mode*, it doesn't *perform* the training.
    # `dropout` and `batchnorm` layers behave differently during training
    # vs. test (source: https://stackoverflow.com/questions/51433378/what-does-model-train-do-in-pytorch)
    model.train()

    # For each batch of training data...
    for step, batch in enumerate(train_dataloader):

        # Progress update every 40 batches.
        if step % 40 == 0 and not step == 0:
            # Calculate elapsed time in minutes.
            elapsed = format_time(time.time() - t0)
            # Report progress.
            print(' Batch {:>5,} of {:>5,}. Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))

        # Unpack this training batch from our dataloader.
        #
        # As we unpack the batch, we'll also copy each tensor to the GPU using the
        # `to` method.
        #
        # `batch` contains three pytorch tensors:
        #   [0]: input ids
        #   [1]: attention masks
        #   [2]: labels
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)

        # Always clear any previously calculated gradients before performing a
        # backward pass. PyTorch doesn't do this automatically because
        # accumulating the gradients is "convenient while training RNNs".
        # (source: https://stackoverflow.com/questions/48001598/why-do-we-need-to-call-zero-grad-in-pytorch)
        model.zero_grad()

        # Perform a forward pass (evaluate the model on this training batch).
        # In PyTorch, calling `model` will in turn call the model's `forward`
        # function and pass down the arguments. The `forward` function is
        # documented here:
        # https://huggingface.co/transformers/model_doc/bert.html#bertforsequenceclassification
        # The results are returned in a results object, documented here:
        # https://huggingface.co/transformers/main_classes/output.html#transformers.modeling_outputs.SequenceClassifierOutput
        # Specifically, we'll get the loss (because we provided labels) and the
        # "logits"--the model outputs prior to activation.
        result = model(b_input_ids,
                       token_type_ids=None,
                       attention_mask=b_input_mask,
                       labels=b_labels,
                       return_dict=True)

        loss = result.loss
        logits = result.logits

        # Accumulate the training loss over all of the batches so that we can
        # calculate the average loss at the end. `loss` is a Tensor containing a
        # single value; the `.item()` function just returns the Python value
        # from the tensor.
        total_train_loss += loss.item()

        # Perform a backward pass to calculate the gradients.
        loss.backward()

        # Clip the norm of the gradients to 1.0.
        # This is to help prevent the "exploding gradients" problem.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # Update parameters and take a step using the computed gradient.
        # The optimizer dictates the "update rule"--how the parameters are
        # modified based on their gradients, the learning rate, etc.
        optimizer.step()

        # Update the learning rate.
        scheduler.step()

    # Calculate the average loss over all of the batches.
    avg_train_loss = total_train_loss / len(train_dataloader)

    # Measure how long this epoch took.
    training_time = format_time(time.time() - t0)

    print("")
    print(" Average training loss: {0:.2f}".format(avg_train_loss))
    print(" Training epoch took: {:}".format(training_time))

    # ========================================
    #               Validation
    # ========================================
    # After the completion of each training epoch, measure our performance on
    # our validation set.

    print("")
    print("Running Validation...")

    t0 = time.time()

    # Put the model in evaluation mode--the dropout layers behave differently
    # during evaluation.
    model.eval()

    # Tracking variables
    total_eval_accuracy = 0
    total_eval_loss = 0
    nb_eval_steps = 0

    # Evaluate data for one epoch
    for batch in validation_dataloader:

        # Unpack this training batch from our dataloader.
        #
        # As we unpack the batch, we'll also copy each tensor to the GPU using
        # the `to` method.
        #
        # `batch` contains three pytorch tensors:
        #   [0]: input ids
        #   [1]: attention masks
        #   [2]: labels
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)

        # Tell pytorch not to bother with constructing the compute graph during
        # the forward pass, since this is only needed for backprop (training).
        with torch.no_grad():
            # Forward pass, calculate logit predictions.
            # token_type_ids is the same as the "segment ids", which
            # differentiates sentence 1 and 2 in 2-sentence tasks.
            result = model(b_input_ids,
                           token_type_ids=None,
                           attention_mask=b_input_mask,
                           labels=b_labels,
                           return_dict=True)

        # Get the loss and "logits" output by the model. The "logits" are the
        # output values prior to applying an activation function like the
        # softmax.
        loss = result.loss
        logits = result.logits

        # Accumulate the validation loss.
        total_eval_loss += loss.item()

        # Move logits and labels to CPU
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()

        # Calculate the accuracy for this batch of test sentences, and
        # accumulate it over all batches.
        total_eval_accuracy += flat_accuracy(logits, label_ids)

    # Report the final accuracy for this validation run.
    avg_val_accuracy = total_eval_accuracy / len(validation_dataloader)
    print(" Accuracy: {0:.2f}".format(avg_val_accuracy))

    # Calculate the average loss over all of the batches.
    avg_val_loss = total_eval_loss / len(validation_dataloader)

    # Measure how long the validation run took.
    validation_time = format_time(time.time() - t0)

    print(" Validation Loss: {0:.2f}".format(avg_val_loss))
    print(" Validation took: {:}".format(validation_time))

    # Record all statistics from this epoch.
    training_stats.append(
        {
            'epoch': epoch_i + 1,
            'Training Loss': avg_train_loss,
            'Valid. Loss': avg_val_loss,
            'Valid. Accur.': avg_val_accuracy,
            'Training Time': training_time,
            'Validation Time': validation_time
        }
    )
print("")
print("Training complete!")
print("Total training took {:} (h:mm:ss)".format(format_time(time.time()-total_t0)))
I'm not well-versed in this, but I think this might help: in the code you have posted, you haven't set num_labels=12; it is still 2. If you have 12 classes, you probably need to change that. Let me know if it works. Also, could you share the answer to your previously posted question about calculating average GloVe word embeddings? I would like to learn how to implement that as well.
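For reference, a minimal sketch of the two usual setups, assuming the printed labels (single integers 0-11 per document) reflect the data; neither snippet is from the original post, and problem_type requires a recent transformers version:

from transformers import BertForSequenceClassification

# Option A -- multi-class (exactly one of 12 classes per document):
# keep the default CrossEntropyLoss inside the model and pass integer class
# indices of shape [batch_size] as long tensors.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=12,
)
# ... in the training loop: b_labels = batch[2].to(device).long()

# Option B -- true multilabel (several of the 12 classes per document):
# switch the model to BCEWithLogitsLoss via problem_type and pass float
# multi-hot labels of shape [batch_size, 12].
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=12,
    problem_type="multi_label_classification",
)
# ... in the training loop: b_labels = batch[2].to(device).float()  # shape [batch_size, 12]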

PyTorch: LSTM predicts the same constant value

I want to predict one variable using 7 features with time steps of 4:
# Shape X_train: torch.Size([24433, 4, 7])
# Shape Y_train: torch.Size([24433, 4, 1])
# Shape X_test: torch.Size([6109, 4, 7])
# Shape Y_test: torch.Size([6109, 4, 1])
train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
My (initial) LSTM model:
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x, _ = self.lstm(x)
        x = self.linear(x)
        return x
model = LSTMModel(input_size=7, hidden_size=256, output_size=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
Apply model:
# Loop over the training set
for X, Y in train_loader:
    optimizer.zero_grad()
    Y_pred = model(X)
    loss = loss_fn(Y_pred, Y)
    loss.backward()
    optimizer.step()

model.eval()
# Loop over the test set
for X, Y in test_loader:
    Y_pred = model(X)
    loss = loss_fn(Y_pred, Y)
An example of Y (true data):
tensor([[[59.],
[59.],
[59.],
[59.]],
[[70.],
[70.],
[70.],
[70.]],
[[ 100.],
[ 0.],
[ 0.],
[ 0.]],
# etc.
However, my Y_pred is somewhat like this:
tensor([[[15.8224],
[15.8224],
[15.8224],
[15.8224]],
[[16.1654],
[16.1654],
[16.1654],
[16.1654]],
[[16.2127],
[16.2127],
[16.2127],
[16.2127]],
# etc.
I have tried numerous different things:
- Changing the model architecture (different batch size, different number of layers)
- Adding dropout and decay parameters
- Using epochs and changing the number of epochs when looping over training and test data
- Different optimizers (Adam, SGD) with different learning rates
- Log-transforming my input data
Examples of my data can be found in a previous question.
I am fairly new to PyTorch and LSTMs, so I might be doing something wrong, but whatever I change, I keep getting a (near) constant value from the predictions. What am I doing wrong/what should I be doing?
I solved this by normalizing my input data. I now obtain different predictions for every output:
# Calculate the mean and standard deviation of each feature in the training set
X_mean = X_train.mean(dim=0)
X_std = X_train.std(dim=0)
# Standardize the training set
X_train = (X_train - X_mean) / X_std
# Standardize the test set using the mean and standard deviation of the training set
X_test = (X_test - X_mean) / X_std
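(An optional extension, not part of the posted solution: the target can be standardized the same way using training-set statistics only, and the predictions mapped back to the original scale afterwards.)

# Hypothetical extension: standardize the target with training-set statistics
Y_mean = Y_train.mean()
Y_std = Y_train.std()
Y_train = (Y_train - Y_mean) / Y_std
Y_test = (Y_test - Y_mean) / Y_std

# After predicting, map back to the original scale for comparison
Y_pred_original = Y_pred * Y_std + Y_mean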

LSTM: calculating MSELoss in for loop returns NAN when backward pass

I am new to LSTMs and ran into a problem. I'm trying to predict a variable using 7 features in time steps of 4. I am working with PyTorch.
Data
From my initial data frame (traindf), I created tensors for every feature and the target (Y) by:
featureX_train = torch.tensor(traindf.featureX[:test].values).view(-1, 4, 1)
Y_train = torch.tensor(traindf.Y[:test].values).view(-1, 4, 1)
...
featureX_test = torch.tensor(traindf.featureX[test:].values).view(-1, 4, 1)
Y_test = torch.tensor(traindf.Y[test:].values).view(-1, 4, 1)
I concatenated all the feature tensors into one X_train and one X_test. All tensors are float32:
print(X_train.shape, Y_train.shape)
print(X_test.shape, Y_test.shape)
torch.Size([24436, 4, 7]) torch.Size([24436, 4, 1])
torch.Size([6109, 4, 7]) torch.Size([6109, 4, 1])
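(The concatenation step itself is not shown in the post; a sketch with hypothetical feature names, stacking the seven per-feature (N, 4, 1) tensors along the last dimension to get (N, 4, 7):)

# Hypothetical feature names -- one (N, 4, 1) tensor per feature
X_train = torch.cat([featureA_train, featureB_train, featureC_train, featureD_train,
                     featureE_train, featureF_train, featureG_train], dim=2).float()
X_test = torch.cat([featureA_test, featureB_test, featureC_test, featureD_test,
                    featureE_test, featureF_test, featureG_test], dim=2).float()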
Eventually, I have a train and test data set:
train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
Preview of my data:
print(train_dataset[0])
print(test_dataset[0])
(tensor([[ 7909.0000, 8094.0000, 9119.0000, 8666.0000, 17599.0000, 13657.0000,
10158.0000],
[ 7909.0000, 8073.0000, 9119.0000, 8636.0000, 17609.0000, 13975.0000,
10109.0000],
[ 7939.5000, 8083.5000, 9166.5000, 8659.5000, 18124.5000, 13971.0000,
10142.0000],
[ 7951.0000, 8064.0000, 9201.0000, 8663.0000, 17985.0000, 13967.0000,
10076.0000]]), tensor([[41.],
[41.],
[41.],
[41.]]))
(tensor([[ 8411.0000, 8530.0000, 9439.0000, 9101.0000, 17368.0000, 14174.0000,
11111.0000],
[ 8460.0000, 8651.5000, 9579.5000, 9355.5000, 17402.0000, 14509.0000,
11474.5000],
[ 8436.0000, 8617.0000, 9579.0000, 9343.0000, 17318.0000, 14288.0000,
11404.0000],
[ 8519.0000, 8655.0000, 9580.0000, 9348.0000, 17566.0000, 14640.0000,
11404.0000]]), tensor([[59.],
[59.],
[59.],
[59.]]))
Applying LSTM model
My LSTM model:
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x, _ = self.lstm(x)
        # x = self.linear(x[:, -1, :])
        x = self.linear(x)
        return x
model = LSTMModel(input_size=7, hidden_size=32, output_size=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
model.train()
When I try:
for X, Y in train_loader:
    optimizer.zero_grad()
    Y_pred = model(X)
    loss = loss_fn(Y_pred, Y)
    print(loss)
I get (correctly I assume) Loss: tensor(1318.9419, grad_fn=<MseLossBackward0>)
However, when I run:
for X, Y in train_loader:
    optimizer.zero_grad()
    Y_pred = model(X)
    loss = loss_fn(Y_pred, Y)
    # Now apply backward pass
    loss.backward()
    optimizer.step()
    print(loss)
I get: tensor(nan, grad_fn=<MseLossBackward0>)
Tried normalizing
I have tried normalizing the data:
mean = X.mean()
std = X.std()
X_normalized = (X - mean) / std
Y_pred = model(X_normalized)
But it yields the same result. Why do I get 'nan' after applying loss.backward() in such a loop? How can I fix this? Thanks in advance!
My X_train contained a few NaN values. I solved this by removing the samples that contained NaN values:
mask = torch.isnan(X_train).any(dim=1).any(dim=1)
X_train = X_train[~mask]
# Do the same for Y_train as it needs to be the same size
Y_train = Y_train[~mask]
# Create the TensorDataset for the training set
train_dataset = TensorDataset(X_train, Y_train)
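(An optional sanity check on top of the posted fix, assuming the same tensors: verify no NaNs remain and filter the test set the same way before building its TensorDataset.)

assert not torch.isnan(X_train).any() and not torch.isnan(Y_train).any()

# Filter the test set with the same kind of mask
test_mask = torch.isnan(X_test).any(dim=1).any(dim=1)
X_test = X_test[~test_mask]
Y_test = Y_test[~test_mask]
test_dataset = TensorDataset(X_test, Y_test)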

Unable to assign all tensors to the GPU

I am trying to predict the outcome of a football fixture (using backpropagation) across 3 classes: home team wins, draw or away team wins; they are encoded as 0, 1, and 2, respectively.
Features: home_team, away_team, home_score, away_score, home_adv, match_imp
Target: outcome_final
Training, validation and test tensors:
X_train: torch.Size([25365, 554])
y_train: torch.Size([25365])
X_test: torch.Size([5436, 554])
y_test: torch.Size([5436])
X_val: torch.Size([5436, 554])
y_val: torch.Size([5436])
Network architecture:
Net(
(fc1): Linear(in_features=554, out_features=100, bias=True)
(fc2): Linear(in_features=100, out_features=3, bias=True)
(dropout): Dropout(p=0.2, inplace=False)
)
Weights and biases are generated at first:
fc1.weight: torch.Size([100, 554])
fc1.bias: torch.Size([100])
fc2.weight: torch.Size([3, 100])
fc2.bias: torch.Size([3])
ReLU activation function is used for the hidden layer, and Softmax activation function is used for the output layer.
The following code returns the error below.
# Creating the class for the neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(554, 100)
        self.fc2 = nn.Linear(100, 3)
        self.dropout = nn.Dropout(p = 0.2)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.softmax(self.fc2(x), dim = 1)
        return x
# Initializing model
model = Net().to(device)
X_train.to(device)
y_train.to(device)
X_val.to(device)
y_val.to(device)
# Initializing weights and biases
model.fc1.weight.data.normal_(0, 0.01)
model.fc1.bias.data.normal_(0, 0.01)
model.fc2.weight.data.normal_(0, 0.01)
model.fc2.bias.data.normal_(0, 0.01)
# TRAIN the model
def train_model(model, X_train, y_train, X_val, y_val, epochs = 10, learning_rate = 0.003):
    # Loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr = learning_rate)

    # Losses and accuracies
    train_losses = []
    val_losses = []
    train_accs = []
    val_accs = []

    # Training happens here
    for epoch in range(epochs):
        # Shuffling data
        permutation = torch.randperm(X_train.size()[0]).to(device)
        X_train = X_train[permutation]
        y_train = y_train[permutation]

        # Creating batches
        batch_size = 5
        n_batches = X_train.size()[0] // batch_size
        for i in range(n_batches):
            # Zeroing gradients
            optimizer.zero_grad()

            # Forward pass
            output = model(X_train[i * batch_size : (i + 1) * batch_size])
            loss = criterion(output, y_train[i * batch_size : (i + 1) * batch_size].long())

            # Backward pass
            loss.backward()

            # Updating weights and biases
            optimizer.step()

        # Sending to CPU
        model.to('cpu')

        # Training loss and accuracy
        train_loss = criterion(model(X_train), y_train.long())
        train_losses.append(train_loss)
        train_acc = accuracy_score(y_train, torch.argmax(model(X_train), dim = 1))
        train_accs.append(train_acc)
        print('Epoch: ', epoch + 1, 'Training Loss: ', train_loss, 'Training Accuracy: ', train_acc)

        # Validation loss and accuracy
        val_loss = criterion(model(X_val), y_val.long())
        val_losses.append(val_loss)
        val_acc = accuracy_score(y_val, torch.argmax(model(X_val), dim = 1))
        val_accs.append(val_acc)
        print('Epoch: ', epoch + 1, 'Validation Loss: ', val_loss, 'Validation Accuracy: ', val_acc)

        # Sending back to GPU
        model.to(device)
        X_train.to(device)
        y_train.to(device)
        X_val.to(device)
        y_val.to(device)

    return train_losses, val_losses, train_accs, val_accs
# Let's train the model
model = Net().to(device)
train_losses, val_losses, train_accs, val_accs = train_model(model, X_train, y_train, X_val, y_val)
ERROR:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
I have tried ensuring that all training and validation sets are converted to tensors and sent to the GPU. Yet, I am still getting this error.
Am I missing something here? Thanks in advance.
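(One thing worth checking, offered as a sketch rather than a confirmed fix for this exact code: unlike model.to(device), calling .to(device) on a tensor is not in-place — it returns a new tensor, so the result has to be reassigned, both before training and when sending the data back to the GPU at the end of each epoch.)

# Tensor.to() returns a new tensor; reassign the result
X_train = X_train.to(device)
y_train = y_train.to(device)
X_val = X_val.to(device)
y_val = y_val.to(device)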

Bi-LSTM with Keras : dimensions must be equal but are 7 and 300

I am creating a Bi-LSTM with Keras for the first time, but I am having difficulties. To explain, here are the steps I have taken:
I created an embedding matrix with GloVe for my x:
def create_embeddings(fichier, dictionnaire, dictionnaire_tokens):
    with open(fichier) as file:
        line = file.readline()
        max_words = max(dictionnaire_tokens.values()) + 1  # 1032
        max_size_dimensions = 300
        emb_matrix = np.zeros((max_words, max_size_dimensions))
        for item, count in dictionnaire_tokens.items():
            try:
                vecteur = dictionnaire[item]
            except:
                pass
            if vecteur is not None:
                emb_matrix[count] = vecteur
    return emb_matrix
I one-hot encoded my y's:
def one_hot_encoding(file):
    with open(file) as file:
        line = file.readline()
        liste = []
        while line:
            tag = line.split(" ")[1]
            tag = [tag]
            line = file.readline()
            liste.append(tag)
        one_hot = MultiLabelBinarizer()
        array = one_hot.fit_transform(liste)
    return array
I compiled my model with Keras:
from tensorflow.keras.layers import Bidirectional

model = Sequential()
embedding_layer = Embedding(input_dim=1031 + 1,
                            output_dim=300,
                            weights=[embedding_matrix],
                            trainable=False)
model.add(embedding_layer)
bilstm_layer = Bidirectional(LSTM(units=300, return_sequences=True))
model.add(bilstm_layer)
model.add(Dense(300, activation="relu"))
#crf_layer = CRF(units=len(self.tags), sparse_target=True)
#model.add(crf_layer)
model.compile(optimizer="adam", loss='binary_crossentropy', metrics='acc')
model.summary()
Input of my embedding layer (embedding matrix):
[[ 0. 0. 0. ... 0. 0. 0. ]
[ 0. 0. 0. ... 0. 0. 0. ]
[ 0. 0. 0. ... 0. 0. 0. ]
...
[-0.068577 -0.71314 0.3898 ... -0.077923 -1.0469 0.56874 ]
[ 0.32461 0.50463 0.72544 ... 0.17634 -0.28961 0.29007 ]
[-0.33771 -0.24912 -0.032685 ... -0.033254 -0.45513 -0.13319 ]]
However, when I try to train the model, I get the following error: ValueError: Dimensions must be equal, but are 7 and 300 for '{{node binary_crossentropy/mul}} = Mul[T=DT_FLOAT](binary_crossentropy/Cast, binary_crossentropy/Log)' with input shapes: [?,7], [?,300,300].
My embedding matrix was made with GloVe 300d, so it has 300 dimensions, whereas I have only 7 labels. So I have to make my x and y have compatible dimensions, but how? Thank you!
keras.backend.clear_session()
from tensorflow.keras.layers import Bidirectional
model = Sequential()
_input = keras.layers.Input(shape=(300,1))
model.add(_input)
bilstm_layer = Bidirectional(LSTM(units=300, return_sequences=False))
model.add(bilstm_layer)
model.add(Dense(7, activation="relu")) #here 7 is the number of classes you have and None is the batch_size
#crf_layer = CRF(units=len(self.tags), sparse_target=True)
#model.add(crf_layer)
model.compile(optimizer="adam", loss='binary_crossentropy', metrics='acc')
model.summary()
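(A usage sketch with placeholder data, assuming x has been reshaped to (num_samples, 300, 1) and y is the 7-column array from MultiLabelBinarizer — the shapes, not the values, are the point here.)

import numpy as np

x = np.random.rand(32, 300, 1).astype("float32")             # (samples, timesteps=300, features=1)
y = np.random.randint(0, 2, size=(32, 7)).astype("float32")  # (samples, 7 classes)

model.fit(x, y, batch_size=8, epochs=2)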
