PyTorch: Expected all tensors to be on the same device

I have moved my model and inputs to the same device, but I still get this runtime error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
Here is my code. First, my model implementation:
import torch
import torch.nn.functional as F
class Net(torch.nn.Module):
    def __init__(self, n_hiddens, n_feature= 2, n_output= 1):
        super().__init__()
        self.hiddens = []
        n_hidden_in = n_feature
        for n_hidden in n_hiddens :
            self.hiddens.append( torch.nn.Linear(n_hidden_in, n_hidden) ) # hidden layer
            n_hidden_in = n_hidden
        self.predict = torch.nn.Linear(n_hidden, n_output) # output layer
    def forward(self, x):
        for hidden in self.hiddens :
            x = F.relu(hidden(x)) # activation function for hidden layer
        x = self.predict(x) # linear output
        return x
Then I define my dataloaders. Here, X and y are NumPy arrays:
from torch.utils.data import TensorDataset, DataLoader
# Split training/test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state= 42)
X_train_tensor = torch.from_numpy(X_train)
y_train_tensor = torch.from_numpy(y_train)
X_test_tensor = torch.from_numpy(X_test)
y_test_tensor = torch.from_numpy(y_test)
train_dataset = TensorDataset(X_train_tensor, y_train_tensor) # create your dataset
train_dataloader = DataLoader(train_dataset, batch_size= 1000) # create your dataloader
test_dataset = TensorDataset(X_test_tensor, y_test_tensor) # create your dataset
test_dataloader = DataLoader(test_dataset, batch_size= 1000) # create your dataloader
Here I train my model. The error occurs on the line "outputs = regressor(inputs)":
NUM_EPOCHS = 2000
BATCH_SIZE = 1000
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(f"Device used : {device}")
# 1 hidden layer
total_num_nodes = 256
regressor = Net(n_hiddens= [total_num_nodes]).to(device)
optimizer = torch.optim.SGD(regressor.parameters(), lr=0.2, momentum= 0.1, nesterov= True)
loss_func = torch.nn.MSELoss() # this is for regression mean squared loss
for epoch in range(NUM_EPOCHS):
    running_loss = 0.0
    for i, data in enumerate(train_dataloader, 0):
        inputs, values = data
        inputs = inputs.float().to(device)
        values = values.float().to(device)
        optimizer.zero_grad() # clear gradients for next train
        print(f"Input device is : cuda:{inputs.get_device()}")
        print(f"Target value device is : cuda:{values.get_device()}")
        print(f"Is model on cuda ? : {next(regressor.parameters()).is_cuda}")
        outputs = regressor(inputs) # <-- This is where I have the error
        loss = loss_func(outputs, values)
        loss.backward() # backpropagation, compute gradients
        optimizer.step() # apply gradients
Here are the outputs of my print statements:
Device used : cuda:0
Input device is : cuda:0
Target value device is : cuda:0
Is model on cuda ? :True
This should mean that my model and my tensors are all on the same device, so why do I still get this error?
The error log is:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-6-5234b830bebc> in <module>()
24 print(f"Target value device is : cuda:{values.get_device()}")
25 print(f"Is model on cuda ? : {next(regressor.parameters()).is_cuda}")
---> 26 outputs = regressor(inputs)
27 loss = loss_func(outputs, values)
28 loss.backward() # backpropagation, compute gradients
4 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
<ipython-input-4-56c54b30b771> in forward(self, x)
16 def forward(self, x):
17 for hidden in self.hiddens :
---> 18 x = F.relu(hidden(x)) # activation function for hidden layer
19 x = self.predict(x) # linear output
20 return x
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py in forward(self, input)
101
102 def forward(self, input: Tensor) -> Tensor:
--> 103 return F.linear(input, self.weight, self.bias)
104
105 def extra_repr(self) -> str:
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1846 if has_torch_function_variadic(input, weight, bias):
1847 return handle_torch_function(linear, (input, weight, bias), input, weight, bias=bias)
-> 1848 return torch._C._nn.linear(input, weight, bias)
1849
1850
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
Thank you very much

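TL;DR: use nn.ModuleList instead of a plain Python list to store the hidden layers in Net.
All your hidden layers are stored in a plain Python list, self.hiddens, in Net. When you move your model to the GPU with .to(device), PyTorch has no way of knowing that the elements of this list should also be moved: an nn.Module only tracks submodules that are assigned directly as attributes or held in module containers such as nn.ModuleList, not modules sitting inside an ordinary list.
However, if you make self.hiddens = nn.ModuleList(), PyTorch knows to treat all elements of this special list as nn.Modules and recursively moves them to the same device as Net. The plain list has a second side effect: the hidden layers are also missing from regressor.parameters(), so the optimizer never updates them, and the check next(regressor.parameters()).is_cuda in the question only ever inspected the output layer's weight.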
See these answers 1, 2, 3 for more details.
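A minimal sketch of the corrected Net, keeping the constructor signature from the question. The only real change is that self.hiddens is an nn.ModuleList; the output layer is sized from n_hidden_in so it also works with an empty n_hiddens. BuggyNet is a hypothetical copy of the question's plain-list layout, included only to compare how many parameter tensors each version registers.

```python
import torch
import torch.nn.functional as F


class Net(torch.nn.Module):
    def __init__(self, n_hiddens, n_feature=2, n_output=1):
        super().__init__()
        # nn.ModuleList registers each layer as a submodule, so .to(device),
        # .parameters() and .state_dict() all include them.
        self.hiddens = torch.nn.ModuleList()
        n_hidden_in = n_feature
        for n_hidden in n_hiddens:
            self.hiddens.append(torch.nn.Linear(n_hidden_in, n_hidden))  # hidden layer
            n_hidden_in = n_hidden
        # n_hidden_in holds the last hidden size (or n_feature if n_hiddens is empty)
        self.predict = torch.nn.Linear(n_hidden_in, n_output)  # output layer

    def forward(self, x):
        for hidden in self.hiddens:
            x = F.relu(hidden(x))  # activation function for hidden layer
        return self.predict(x)  # linear output


class BuggyNet(torch.nn.Module):
    """Hypothetical copy of the question's layout: layers kept in a plain list."""

    def __init__(self, n_hiddens, n_feature=2, n_output=1):
        super().__init__()
        self.hiddens = []  # plain list: these layers are NOT registered
        n_hidden_in = n_feature
        for n_hidden in n_hiddens:
            self.hiddens.append(torch.nn.Linear(n_hidden_in, n_hidden))
            n_hidden_in = n_hidden
        self.predict = torch.nn.Linear(n_hidden_in, n_output)


# Registered parameter tensors: weight + bias per registered Linear layer
print(len(list(Net([256]).parameters())))       # 4
print(len(list(BuggyNet([256]).parameters())))  # 2

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
regressor = Net(n_hiddens=[256]).to(device)
print({p.device for p in regressor.parameters()})  # check every parameter, not just the first
out = regressor(torch.randn(8, 2, device=device))
print(out.shape)  # torch.Size([8, 1])
```

Printing the set of devices over all parameters is a more reliable diagnostic than looking at next(model.parameters()) alone.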

Related

PyTorch error when multiplying matrices in a neural network

I was trying to build a neural network in PyTorch, but I ran into the error below. I'm still new to this topic, so I don't understand how to go about solving it.
Code:
class ANN_Model(nn.Module):
    def __init__(self,input_features=8,hidden1=8,hidden2=200,hidden3=200,hidden4=300,hidden5=300,hidden6=400,hidden7=400,hidden8=300,hidden9=300,out_features=2):
        super().__init__()
        self.f_connected1=nn.Linear(input_features,hidden1)
        self.f_connected2=nn.Linear(hidden1,hidden2)
        self.f_connected2=nn.Linear(hidden2,hidden3)
        self.f_connected2=nn.Linear(hidden3,hidden4)
        self.f_connected2=nn.Linear(hidden4,hidden5)
        self.f_connected2=nn.Linear(hidden5,hidden6)
        self.f_connected2=nn.Linear(hidden6,hidden7)
        self.f_connected2=nn.Linear(hidden7,hidden8)
        self.f_connected2=nn.Linear(hidden8,hidden9)
        self.out=nn.Linear(hidden9,out_features)
    def forward(self,x):
        x=F.relu(self.f_connected1(x))
        x=F.relu(self.f_connected2(x))
        x=F.relu(self.f_connected3(x))
        x=F.relu(self.f_connected4(x))
        x=F.relu(self.f_connected5(x))
        x=F.relu(self.f_connected6(x))
        x=F.relu(self.f_connected7(x))
        x=F.relu(self.f_connected8(x))
        x=F.relu(self.f_connected9(x))
        x=self.out(x)
        return x
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr = 0.01)
epochs = 500
final_losses = []
for i in range(epochs):
    i = i + 1
    y_pred = model.forward(X_train)
    loss=loss_function(y_pred, y_train)
    final_losses.append(loss.item())
    if i%10==1:
        print("Epoch number: {} and the loss: {}".format(i, loss.item()))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Input In [13], in <cell line: 3>()
3 for i in range(epochs):
4 i = i + 1
----> 5 y_pred = model.forward(X_train)
6 loss=loss_function(y_pred, y_train)
7 final_losses.append(loss.item())
Input In [8], in ANN_Model.forward(self, x)
14 def forward(self,x):
15 x=F.relu(self.f_connected1(x))
---> 16 x=F.relu(self.f_connected2(x))
17 x=F.relu(self.f_connected3(x))
18 x=F.relu(self.f_connected4(x))
File ~/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
1126 # If we don't have any hooks, we want to skip the rest of the logic in
1127 # this function, and just call forward.
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []
File ~/miniconda3/lib/python3.9/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (691x8 and 300x300)
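I found it: in your model's constructor __init__, every layer after the first is assigned to self.f_connected2, so each assignment overwrites the previous one. The only f_connected2 that survives is nn.Linear(hidden8, hidden9), i.e. nn.Linear(300, 300), which expects input of shape (batch_size, 300), but it receives the (691, 8) output of f_connected1; hence "691x8 and 300x300". Give each layer its own name (f_connected3, ..., f_connected9); otherwise forward would fail next with an AttributeError, because f_connected3 to f_connected9 are never defined.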
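A sketch of the fix that keeps the layer sizes from the question. Instead of typing nine attribute names by hand (which is how the copy-paste slip happened), this version builds the hidden layers in a loop and stores them in an nn.ModuleList; that structure is a swapped-in alternative, not the original poster's code. X_train here is a dummy tensor with the (691, 8) shape from the traceback.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ANN_Model(nn.Module):
    def __init__(self, input_features=8,
                 hidden_sizes=(8, 200, 200, 300, 300, 400, 400, 300, 300),
                 out_features=2):
        super().__init__()
        sizes = [input_features, *hidden_sizes]
        # One distinct Linear per consecutive pair of sizes: 8->8, 8->200, ..., 300->300
        self.hidden = nn.ModuleList(
            [nn.Linear(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        )
        self.out = nn.Linear(sizes[-1], out_features)

    def forward(self, x):
        for layer in self.hidden:
            x = F.relu(layer(x))
        return self.out(x)


model = ANN_Model()
X_train = torch.randn(691, 8)  # dummy batch with the shape from the traceback
y_pred = model(X_train)
print(y_pred.shape)  # torch.Size([691, 2])
```

Because every layer lives in one container, adding or removing a hidden layer only means editing hidden_sizes.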

RuntimeError: all elements of input should be between 0 and 1

I want to use an RNN with BiLSTM layers in PyTorch on protein embeddings. It worked with a Linear layer, but when I use a BiLSTM I get a RuntimeError. Sorry if this is unclear; it's my first post, and I would be grateful if someone could help me.
from collections import Counter, OrderedDict
from typing import Optional
import numpy as np
import pytorch_lightning as pl
import torch
import torch.nn.functional as F # noqa
from deepchain import log
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight
from torch import Tensor, nn
num_layers=2
hidden_size=256
from torch.utils.data import DataLoader, TensorDataset
def classification_dataloader_from_numpy(
    x: np.ndarray, y: np.array, batch_size: int = 32
) -> DataLoader:
    """Build a dataloader from numpy for classification problem
    This dataloader is used only for classification. It detects automatically the class of
    the problem (binary or multiclass classification)
    Args:
        x (np.ndarray): [description]
        y (np.array): [description]
        batch_size (int, optional): [description]. Defaults to None.
    Returns:
        DataLoader: [description]
    """
    n_class: int = len(np.unique(y))
    if n_class > 2:
        log.info("This is a classification problem with %s classes", n_class)
    else:
        log.info("This is a binary classification problem")
    # y is float for binary classification, int for multiclass
    y_tensor = torch.tensor(y).long() if len(np.unique(y)) > 2 else torch.tensor(y).float()
    tensor_set = TensorDataset(torch.tensor(x).float(), y_tensor)
    loader = DataLoader(tensor_set, batch_size=batch_size)
    return loader
class RNN(pl.LightningModule):
    """A `pytorch` based deep learning model"""
    def __init__(self, input_shape: int, n_class: int, num_layers, n_neurons: int = 128, lr: float = 1e-3):
        super(RNN,self).__init__()
        self.lr = lr
        self.n_neurons=n_neurons
        self.num_layers=num_layers
        self.input_shape = input_shape
        self.output_shape = 1 if n_class <= 2 else n_class
        self.activation = nn.Sigmoid() if n_class <= 2 else nn.Softmax(dim=-1)
        self.lstm = nn.LSTM(self.input_shape, self.n_neurons, num_layers, batch_first=True, bidirectional=True)
        self.fc= nn.Linear(self.n_neurons, self.output_shape)
    def forward(self, x):
        h0=torch.zeros(self.num_layers, x_size(0), self.n_neurons).to(device)
        c0=torch.zeros(self.num_layers, x_size(0), self.n_neurons).to(device)
        out, _=self.lstm(x,(h0, c0))
        out=self.fc(out[:, -1, :])
        return self.fc(x)
    def training_step(self, batch, batch_idx):
        """training_step defined the train loop. It is independent of forward"""
        x, y = batch
        y_hat = self.fc(x).squeeze()
        y = y.squeeze()
        if self.output_shape > 1:
            y_hat = torch.log(y_hat)
        loss = self.loss(y_hat, y)
        self.log("train_loss", loss, on_epoch=True, on_step=False)
        return {"loss": loss}
    def validation_step(self, batch, batch_idx):
        """training_step defined the train loop. It is independent of forward"""
        x, y = batch
        y_hat = self.fc(x).squeeze()
        y = y.squeeze()
        if self.output_shape > 1:
            y_hat = torch.log(y_hat)
        loss = self.loss(y_hat, y)
        self.log("val_loss", loss, on_epoch=True, on_step=False)
        return {"val_loss": loss}
    def configure_optimizers(self):
        """(Optional) Configure training optimizers."""
        return torch.optim.Adam(self.parameters(),lr=self.lr)
    def compute_class_weight(self, y: np.array, n_class: int):
        """Compute class weight for binary/multiple classification
        If n_class=2, only compute weights for the positive class.
        If n>2, compute for all classes.
        Args:
            y ([np.array]):vector of int represented the class
            n_class (int) : number of classes to use
        """
        if n_class == 2:
            class_count: typing.Counter = Counter(y)
            cond_binary = (0 in class_count) and (1 in class_count)
            assert cond_binary, "Must have 0 and 1 class for binary classification"
            weight = class_count[0] / class_count[1]
        else:
            weight = compute_class_weight(class_weight="balanced", classes=np.unique(y), y=y)
        return torch.tensor(weight).float()
    def fit(
        self,
        x: np.ndarray,
        y: np.array,
        epochs: int = 10,
        batch_size: int = 32,
        class_weight: Optional[str] = None,
        validation_data: bool = True,
        **kwargs
    ):
        assert isinstance(x, np.ndarray), "X should be a numpy array"
        assert isinstance(y, np.ndarray), "y should be a numpy array"
        assert class_weight in (
            None,
            "balanced",
        ), "the only choice available for class_weight is 'balanced'"
        n_class = len(np.unique(y))
        weight = None
        self.input_shape = x.shape[1]
        self.output_shape = 1 if n_class <= 2 else n_class
        self.activation = nn.Sigmoid() if n_class <= 2 else nn.Softmax(dim=-1)
        if class_weight == "balanced":
            weight = self.compute_class_weight(y, n_class)
        self.loss = nn.NLLLoss(weight) if self.output_shape > 1 else nn.BCELoss(weight)
        if validation_data:
            x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2)
            train_loader = classification_dataloader_from_numpy(
                x_train, y_train, batch_size=batch_size
            )
            val_loader = classification_dataloader_from_numpy(x_val, y_val, batch_size=batch_size)
        else:
            train_loader = classification_dataloader_from_numpy(x, y, batch_size=batch_size)
            val_loader = None
        self.trainer = pl.Trainer(max_epochs=epochs, **kwargs)
        self.trainer.fit(self, train_loader, val_loader)
    def predict(self, x):
        """Run inference on data."""
        if self.output_shape is None:
            log.warning("Model is not fitted. Can't do predict")
            return
        return self.forward(x).detach().numpy()
    def save(self, path: str):
        """Save the state dict model with torch"""
        torch.save(self.fc.state_dict(), path)
        log.info("Save state_dict parameters in model.pt")
    def load_state_dict(self, state_dict: "OrderedDict[str, Tensor]", strict: bool = False):
        """Load state_dict saved parameters
        Args:
            state_dict (OrderedDict[str, Tensor]): state_dict tensor
            strict (bool, optional): [description]. Defaults to False.
        """
        self.fc.load_state_dict(state_dict, strict=strict)
        self.fc.eval()
mlp = RNN(input_shape=1024, n_neurons=1024, num_layers=2, n_class=2)
mlp.fit(embeddings_train, np.array(y_train),validation_data=(embeddings_test, np.array(y_test)), epochs=30)
mlp.save("model.pt")
These are the errors that occurred. I really need help and am happy to provide further information.
Error 1
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-154-e5fde11a675c> in <module>
1 # init MLP model, train it on the data, then save model
2 mlp = RNN(input_shape=1024, n_neurons=1024, num_layers=2, n_class=2)
----> 3 mlp.fit(embeddings_train, np.array(y_train),validation_data=(embeddings_test, np.array(y_test)), epochs=30)
4 mlp.save("model.pt")
<ipython-input-153-a8d51af53bb5> in fit(self, x, y, epochs, batch_size, class_weight, validation_data, **kwargs)
134 val_loader = None
135 self.trainer = pl.Trainer(max_epochs=epochs, **kwargs)
--> 136 self.trainer.fit(self, train_loader, val_loader)
137 def predict(self, x):
138 """Run inference on data."""
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
456 )
457
--> 458 self._run(model)
459
460 assert self.state.stopped
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in _run(self, model)
754
755 # dispatch `start_training` or `start_evaluating` or `start_predicting`
--> 756 self.dispatch()
757
758 # plugin will finalized fitting (e.g. ddp_spawn will load trained model)
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in dispatch(self)
795 self.accelerator.start_predicting(self)
796 else:
--> 797 self.accelerator.start_training(self)
798
799 def run_stage(self):
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py in start_training(self, trainer)
94
95 def start_training(self, trainer: 'pl.Trainer') -> None:
---> 96 self.training_type_plugin.start_training(trainer)
97
98 def start_evaluating(self, trainer: 'pl.Trainer') -> None:
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in start_training(self, trainer)
142 def start_training(self, trainer: 'pl.Trainer') -> None:
143 # double dispatch to initiate the training loop
--> 144 self._results = trainer.run_stage()
145
146 def start_evaluating(self, trainer: 'pl.Trainer') -> None:
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_stage(self)
805 if self.predicting:
806 return self.run_predict()
--> 807 return self.run_train()
808
809 def _pre_training_routine(self):
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_train(self)
840 self.progress_bar_callback.disable()
841
--> 842 self.run_sanity_check(self.lightning_module)
843
844 self.checkpoint_connector.has_trained = False
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_sanity_check(self, ref_model)
1105
1106 # run eval step
-> 1107 self.run_evaluation()
1108
1109 self.on_sanity_check_end()
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_evaluation(self, on_epoch)
960 # lightning module methods
961 with self.profiler.profile("evaluation_step_and_end"):
--> 962 output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
963 output = self.evaluation_loop.evaluation_step_end(output)
964
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py in evaluation_step(self, batch, batch_idx, dataloader_idx)
172 model_ref._current_fx_name = "validation_step"
173 with self.trainer.profiler.profile("validation_step"):
--> 174 output = self.trainer.accelerator.validation_step(args)
175
176 # capture any logged information
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py in validation_step(self, args)
224
225 with self.precision_plugin.val_step_context(), self.training_type_plugin.val_step_context():
--> 226 return self.training_type_plugin.validation_step(*args)
227
228 def test_step(self, args: List[Union[Any, int]]) -> Optional[STEP_OUTPUT]:
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in validation_step(self, *args, **kwargs)
159
160 def validation_step(self, *args, **kwargs):
--> 161 return self.lightning_module.validation_step(*args, **kwargs)
162
163 def test_step(self, *args, **kwargs):
<ipython-input-153-a8d51af53bb5> in validation_step(self, batch, batch_idx)
78 if self.output_shape > 1:
79 y_hat = torch.log(y_hat)
---> 80 loss = self.loss(y_hat, y)
81 self.log("val_loss", loss, on_epoch=True, on_step=False)
82 return {"val_loss": loss}
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
611 def forward(self, input: Tensor, target: Tensor) -> Tensor:
612 assert self.weight is None or isinstance(self.weight, Tensor)
--> 613 return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
614
615
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/torch/nn/functional.py in binary_cross_entropy(input, target, weight, size_average, reduce, reduction)
2760 weight = weight.expand(new_size)
2761
-> 2762 return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
2763
2764
RuntimeError: all elements of input should be between 0 and 1
Error 2
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-139-b7e8b13763ef> in <module>
1 # Model evaluation
----> 2 y_pred = mlp(embeddings_val).squeeze().detach().numpy()
3 model_evaluation_accuracy(np.array(y_val), y_pred)
/opt/conda/envs/bio-transformers/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
<ipython-input-136-e2fc535640ab> in forward(self, x)
55 self.fc= nn.Linear(self.hidden_size, self.output_shape)
56 def forward(self, x):
---> 57 h0=torch.zeros(self.num_layers, x_size(0), self.hidden_size).to(device)
58 c0=torch.zeros(self.num_layers, x_size(0), self.hidden_size).to(device)
59 out, _=self.lstm(x,(h0, c0))
NameError: name 'x_size' is not defined
I am adding this as an answer because it would be too hard to fit in a comment.
The main problem you have is with the BCE loss. IIRC, BCE loss expects p(y=1), so your output should be between 0 and 1. If you want to use logits (which is also more numerically stable), you should use nn.BCEWithLogitsLoss (functional form: F.binary_cross_entropy_with_logits).
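A small check of that point with made-up logits and targets: nn.BCELoss needs probabilities in [0, 1], so it has to be fed sigmoid outputs, whereas nn.BCEWithLogitsLoss takes raw logits and applies the sigmoid internally. On the same data the two losses agree, and feeding raw logits to nn.BCELoss reproduces the error from the question (on CPU).

```python
import torch
import torch.nn as nn

logits = torch.tensor([-2.0, -0.5, 0.0, 0.7, 3.0])  # raw fc outputs: any real number
targets = torch.tensor([0.0, 0.0, 1.0, 1.0, 1.0])

# BCELoss wants probabilities, so apply the sigmoid first ...
bce_on_probs = nn.BCELoss()(torch.sigmoid(logits), targets)
# ... while BCEWithLogitsLoss applies it internally, in a numerically stable way
bce_with_logits = nn.BCEWithLogitsLoss()(logits, targets)
print(bce_on_probs.item(), bce_with_logits.item())

# Passing raw logits straight to BCELoss triggers the "between 0 and 1" check
raised = False
try:
    nn.BCELoss()(logits, targets)
except RuntimeError as err:
    raised = True
    print("BCELoss on logits:", err)
```

If you switch to nn.BCEWithLogitsLoss, remove the final sigmoid from the model's output used for the loss; otherwise the sigmoid is applied twice.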
As you mention in one of the comments, you are using the sigmoid activation, but something about your forward function looks off to me. In particular, the last line of your forward function is
return self.fc(x)
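This does not apply the sigmoid activation. Moreover, you are only using the input x to produce the output, so the LSTM outputs are simply discarded. The same goes for training_step and validation_step, which call self.fc(x) directly instead of self(x): neither the LSTM nor the sigmoid is used there, so the validation sanity check (where Error 1 is raised) feeds unbounded values into BCELoss. I think it would be a good idea to add some print statements or breakpoints to make sure that the intermediate outputs are what you expect them to be.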
I got the error RuntimeError: all elements of input should be between 0 and 1 because my x data had NaN entries.
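If you suspect the same cause, a quick check before training is to count the non-finite (NaN or infinite) entries in the input array and see which rows contain them. The array below is a hypothetical stand-in for embeddings_train.

```python
import numpy as np


def count_non_finite(x: np.ndarray) -> int:
    """Number of NaN or +/-inf entries in x."""
    return int((~np.isfinite(x)).sum())


embeddings_train = np.array([[0.1, np.nan], [0.3, 0.4]])  # hypothetical data
n_bad = count_non_finite(embeddings_train)
bad_rows = np.where(~np.isfinite(embeddings_train).all(axis=1))[0]
if n_bad:
    print(f"{n_bad} non-finite value(s) in rows {bad_rows.tolist()}")
```

Whether to drop or impute those rows depends on the data; the check only tells you they are there.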
I just bumped into this myself. It looks like both you and I missed adding a sigmoid function at the end of the forward function. This update should fix your problem.
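def forward(self, x):
    # x.size(0) is the batch size (x_size(0) raises the NameError shown in Error 2).
    # A bidirectional LSTM needs num_layers * 2 initial states, created on x's device.
    h0=torch.zeros(self.num_layers * 2, x.size(0), self.n_neurons, device=x.device)
    c0=torch.zeros(self.num_layers * 2, x.size(0), self.n_neurons, device=x.device)
    out, _=self.lstm(x,(h0, c0))
    out=self.fc(out[:, -1, :])  # self.fc must take 2 * n_neurons inputs (bidirectional)
    return torch.sigmoid(out)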
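For reference, a self-contained sketch of a BiLSTM binary classifier with consistent shapes. Assumptions not taken from the question: the input is a batch of sequences shaped (batch, seq_len, features) because batch_first=True, and the class name BiLSTMClassifier and all sizes are hypothetical. For a bidirectional LSTM, h0 and c0 have shape (num_layers * 2, batch, hidden_size), and each time step's output has 2 * hidden_size features, so the final Linear layer takes 2 * hidden_size inputs.

```python
import torch
import torch.nn as nn


class BiLSTMClassifier(nn.Module):
    def __init__(self, input_size: int, hidden_size: int = 128, num_layers: int = 2):
        super().__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, bidirectional=True)
        # forward and backward outputs are concatenated -> 2 * hidden_size features
        self.fc = nn.Linear(2 * hidden_size, 1)

    def forward(self, x):  # x: (batch, seq_len, input_size)
        # (num_layers * num_directions, batch, hidden_size), same device/dtype as x
        h0 = x.new_zeros(self.num_layers * 2, x.size(0), self.hidden_size)
        c0 = x.new_zeros(self.num_layers * 2, x.size(0), self.hidden_size)
        out, _ = self.lstm(x, (h0, c0))  # out: (batch, seq_len, 2 * hidden_size)
        logits = self.fc(out[:, -1, :])  # use the last time step
        return torch.sigmoid(logits)     # probabilities in (0, 1), suitable for nn.BCELoss


model = BiLSTMClassifier(input_size=32, hidden_size=16)
x = torch.randn(4, 10, 32)  # dummy batch: 4 sequences of length 10
probs = model(x)
print(probs.shape)  # torch.Size([4, 1])
```

Zero initial states are also nn.LSTM's default, so passing (h0, c0) explicitly is optional. Note that at out[:, -1, :] the backward direction has only seen the last time step; using the final hidden states of both directions is a common alternative.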

Expected more than 1 value per channel when training, got input size torch.Size([1, **])

I get an error when I use BatchNorm1d. Code:
##% first I define the model
class net(nn.Module):
    def __init__(self, max_len, feature_linear, rnn, input_size, hidden_size, output_dim, num__rnn_layers, bidirectional, batch_first=True, p=0.2):
        super(net, self).__init__()
        self.max_len = max_len
        self.feature_linear = feature_linear
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.bidirectional = bidirectional
        self.num_directions = 2 if bidirectional == True else 1
        self.p = p
        self.batch_first = batch_first
        self.linear1 = nn.Linear(max_len, feature_linear)
        init.kaiming_normal_(self.linear1.weight, mode='fan_in')
        self.BN1 = BN(feature_linear)
    def forward(self, xb, seq_len_crt):
        rnn_input = torch.zeros(xb.shape[0], self.feature_linear, self.input_size)
        for i in range(self.input_size):
            out = self.linear1(xb[:, :, i]) # xb[:,:,i].shape:(1,34), out.shape(1,100)
            out = F.relu(out) # input: out.shape(1,100), output: out.shape(1,100)
            out = self.BN1(out) # input: out.shape(1,100), output: out.shape(1,100)
        return y_hat.squeeze(-1)
##% make the model as a function and optimize it
input_size = 5
hidden_size = 32
output_dim = 1
num_rnn_layers = 2
bidirectional = True
rnn = nn.LSTM
batch_size = batch_size
feature_linear = 60
BN = nn.BatchNorm1d
model = net(max_len, feature_linear, rnn, input_size, hidden_size, output_dim, num_rnn_layers, bidirectional, p=0.1)
loss_func = nn.MSELoss(reduction='none')
# optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# optimizer = optim.Adam(model.parameters(), lr=0.01)
optimizer = optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.05)
##% use this model to predict data
def predict(xb, model, seq_len):
    # xb's shape should be (batch_size, seq_len, n_features)
    if xb.ndim == 2: # suitable for both ndarray and Tensor
        # add a {batch_size} dim
        xb = xb[None, ]
    if not isinstance(xb, torch.Tensor):
        xb = torch.Tensor(xb)
    return model(xb, seq_len) # xb.shape(1,34,5)
##% create training/valid/test data
seq_len_train_iter = []
for i in range(0, len(seq_len_train), batch_size):
    if i + batch_size <= len(seq_len_train):
        seq_len_train_iter.append(seq_len_train[i:i+batch_size])
    else:
        seq_len_train_iter.append(seq_len_train[i:])
seq_len_valid_iter = []
for i in range(0, len(seq_len_valid), batch_size):
    if i + batch_size <= len(seq_len_valid):
        seq_len_valid_iter.append(seq_len_valid[i:i+batch_size])
    else:
        seq_len_valid_iter.append(seq_len_valid[i:])
seq_len_test_iter = []
for i in range(0, len(seq_len_test), batch_size):
    if i + batch_size <= len(seq_len_test):
        seq_len_test_iter.append(seq_len_test[i:i+batch_size])
    else:
        seq_len_test_iter.append(seq_len_test[i:])
##% fit model
def fit(epochs, model, loss_func, optimizer, train_dl, valid_dl, valid_ds, seq_len_train_iter, seq_len_valid_iter):
    train_loss_record = []
    valid_loss_record = []
    mean_pct_final = []
    mean_abs_final = []
    is_better = False
    last_epoch_abs_error = 0
    last_epoch_pct_error = 0
    mean_pct_final_train = []
    mean_abs_final_train = []
    for epoch in range(epochs):
        # seq_len_crt: current batch seq len
        for batches, ((xb, yb), seq_len_crt) in enumerate(zip(train_dl, seq_len_train_iter)):
            if isinstance(seq_len_crt, np.int64):
                seq_len_crt = [seq_len_crt]
            y_hat = model(xb, seq_len_crt)
            packed_yb = nn.utils.rnn.pack_padded_sequence(yb, seq_len_crt, batch_first=True, enforce_sorted=False)
            final_yb, input_sizes = nn.utils.rnn.pad_packed_sequence(packed_yb)
            final_yb = final_yb.permute(1, 0)
            # assert torch.all(torch.tensor(seq_len_crt).eq(input_sizes))
            loss = loss_func(y_hat, final_yb)
            batch_size_crt = final_yb.shape[0]
            loss = (loss.sum(-1) / input_sizes).sum() / batch_size_crt
            loss.backward()
            optimizer.step()
            # scheduler.step()
            optimizer.zero_grad()
            # print(i)
            with torch.no_grad():
                train_loss_record.append(loss.item())
                if batches % 50 == 0 and epoch % 1 == 0:
                    # print(f'Epoch {epoch}, batch {i} training loss: {loss.item()}')
                    y_hat = predict(xb[0], model, torch.tensor([seq_len_crt[0]])).detach().numpy().squeeze() # xb[0].shape(34,5)
                    label = yb[0][:len(y_hat)]
                    # plt.ion()
                    plt.plot(y_hat, label='predicted')
                    plt.plot(label, label='label')
                    plt.legend(loc='upper right')
                    plt.title('training mode')
                    plt.text(len(y_hat)+1, max(y_hat.max(), label.max()), f'Epoch {epoch}, batch {batches} training loss: {loss.item()}')
                    plt.show()
    return train_loss_record
but I get: Expected more than 1 value per channel when training, got input size torch.Size([1, 60])
The error message is:
ValueError Traceback (most recent call last)
<ipython-input-119-fb062ad3f20e> in <module>
----> 1 fit(500, model, loss_func, optimizer, train_dl, valid_dl, valid_ds, seq_len_train_iter, seq_len_valid_iter)
<ipython-input-118-2eb946c379bf> in fit(epochs, model, loss_func, optimizer, train_dl, valid_dl, valid_ds, seq_len_train_iter, seq_len_valid_iter)
38 # print(f'Epoch {epoch}, batch {i} training loss: {loss.item()}')
39
---> 40 y_hat = predict(xb[0], model, torch.tensor([seq_len_crt[0]])).detach().numpy().squeeze() # xb[0].shape(34,5)
41 label = yb[0][:len(y_hat)]
42 # plt.ion()
<ipython-input-116-28afce77e325> in predict(xb, model, seq_len)
7 if not isinstance(xb, torch.Tensor):
8 xb = torch.Tensor(xb)
----> 9 return model(xb, seq_len) # xb.shape(None,34,5)
D:\Anaconda3\envs\LSTM\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
<ipython-input-114-3e9c30d20ed6> in forward(self, xb, seq_len_crt)
50 out = self.linear1(xb[:, :, i]) # xb[:,:,i].shape:(None,34), out.shape(None,100)
51 out = F.relu(out) # input: out.shape(None,100), output: out.shape(None,100)
---> 52 out = self.BN1(out) # input: out.shape(None,100), output: out.shape(None,100)
53
54 out = self.linear2(out)
D:\Anaconda3\envs\LSTM\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
D:\Anaconda3\envs\LSTM\lib\site-packages\torch\nn\modules\batchnorm.py in forward(self, input)
129 used for normalization (i.e. in eval mode when buffers are not None).
130 """
--> 131 return F.batch_norm(
132 input,
133 # If buffers are not to be tracked, ensure that they won't be updated
D:\Anaconda3\envs\LSTM\lib\site-packages\torch\nn\functional.py in batch_norm(input, running_mean, running_var, weight, bias, training, momentum, eps)
2052 bias=bias, training=training, momentum=momentum, eps=eps)
2053 if training:
-> 2054 _verify_batch_size(input.size())
2055
2056 return torch.batch_norm(
D:\Anaconda3\envs\LSTM\lib\site-packages\torch\nn\functional.py in _verify_batch_size(size)
2035 size_prods *= size[i + 2]
2036 if size_prods == 1:
-> 2037 raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
2038
2039
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 60])
I checked and found that at out = self.BN1(out), out.shape = (1, 60). It seems that batch_size=1 is not permitted in BatchNorm1d, but I don't know how to change my code.
What does BatchNorm1d do mathematically?
Try to write down the equation for the case of batch_size=1 and you'll understand why PyTorch is angry with you.
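Written out for one channel over the B values it sees in a mini-batch (for an input of shape (N, C), B = N), with learnable scale γ, shift β and a small constant ε:

```latex
\mu = \frac{1}{B}\sum_{i=1}^{B} x_i, \qquad
\sigma^2 = \frac{1}{B}\sum_{i=1}^{B} (x_i - \mu)^2, \qquad
y_i = \gamma \, \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta

\text{With } B = 1:\quad \mu = x_1, \quad \sigma^2 = 0, \quad
y_1 = \gamma \cdot \frac{0}{\sqrt{\epsilon}} + \beta = \beta
```

So with a single value per channel the output is the constant β regardless of the input, and no gradient flows back to x. On top of that, the running variance PyTorch tracks uses the unbiased estimate, which would divide by B − 1 = 0.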
How to solve it?
It is simple: BatchNorm has two "modes of operation". One is for training, where it estimates the current batch's mean and variance (this is why you must have batch_size > 1 for training).
The other "mode" is for evaluation: it uses accumulated mean and variance to normalize new inputs without re-estimating the mean and variance. In this mode there is no problem processing samples one by one.
When evaluating your model, call model.eval() before and model.train() after.
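A minimal demonstration with a standalone nn.BatchNorm1d(60), matching the (1, 60) shape in the question: in training mode a single sample raises the ValueError, while in eval mode the same call uses the running statistics and succeeds. torch.no_grad() is optional here but avoids building a graph for an inference-only call.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(60)
single = torch.randn(1, 60)  # one sample, 60 channels

bn.train()
train_failed = False
try:
    bn(single)
except ValueError as err:
    train_failed = True
    print("train mode:", err)

bn.eval()  # normalize with running_mean / running_var instead of batch statistics
with torch.no_grad():
    out = bn(single)
print("eval mode output shape:", out.shape)  # torch.Size([1, 60])

bn.train()  # switch back before continuing training
```

In the question's fit loop, that means calling model.eval() before the predict(...) call used for plotting and model.train() right after it.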
I ran into this problem when I loaded the model and started testing. Call model.eval() before you feed in your data; this solves the problem.
If you are using the DataLoader class, sometimes the last batch in an epoch will have only a single training example (imagine a training set of 33 examples with a batch size of 32). This can trigger the error if the network is in training mode and a batch norm layer is present.
Set the drop_last argument of the DataLoader to True, like this:
from torch.utils.data import DataLoader
...
trainloader = DataLoader(train_dataset, batch_size=32, shuffle=True, drop_last=True)
to discard the last incomplete batch in each epoch.
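The 33-examples case described above can be checked directly; the dataset here is dummy data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(33, 5))  # 33 dummy examples, 5 features

sizes_keep = [batch[0].shape[0] for batch in DataLoader(dataset, batch_size=32)]
sizes_drop = [batch[0].shape[0] for batch in DataLoader(dataset, batch_size=32, drop_last=True)]
print(sizes_keep)  # [32, 1]
print(sizes_drop)  # [32]
```

The trade-off is that a few examples are skipped each epoch; with shuffle=True a different subset is skipped every time, so all data is still seen over several epochs.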

Removing last 2 layers from a BERT classifier results in " 'tuple' object has no attribute 'dim' " error. Why?

I fine-tuned a Hugging Face transformer using Keras (with ktrain) and then reloaded the model in PyTorch.
I want to access the third-to-last layer (pre_classifier), so I removed the last two layers:
BERT2 = torch.nn.Sequential(*(list(BERT.children())[:-2]))
Running an encoded sentence through this yields the following error message:
AttributeError Traceback (most recent call last)
<ipython-input-38-640702475573> in <module>
----> 1 ans2=BERT2(torch.tensor([e1]))
2 print (ans2)
C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
539 result = self._slow_forward(*input, **kwargs)
540 else:
--> 541 result = self.forward(*input, **kwargs)
542 for hook in self._forward_hooks.values():
543 hook_result = hook(self, input, result)
C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\container.py in forward(self, input)
90 def forward(self, input):
91 for module in self._modules.values():
---> 92 input = module(input)
93 return input
94
C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
539 result = self._slow_forward(*input, **kwargs)
540 else:
--> 541 result = self.forward(*input, **kwargs)
542 for hook in self._forward_hooks.values():
543 hook_result = hook(self, input, result)
C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\linear.py in forward(self, input)
85
86 def forward(self, input):
---> 87 return F.linear(input, self.weight, self.bias)
88
89 def extra_repr(self):
C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py in linear(input, weight, bias)
1366 - Output: :math:`(N, *, out\_features)`
1367 """
-> 1368 if input.dim() == 2 and bias is not None:
1369 # fused op is marginally faster
1370 ret = torch.addmm(bias, input, weight.t())
AttributeError: 'tuple' object has no attribute 'dim'
Meanwhile, deleting the classifier entirely (all three layers):
BERT3 = torch.nn.Sequential(*(list(BERT.children())[:-3]))
yields the expected tensor (within a size-1 tuple) with the expected shape ([sentence_num, token_num, 768]).
Why does the removal of two (but not three) layers break the model?
And how can I access the pre_classifier results?
It is not accessible by setting output_hidden_states=True in the config, as this flag returns the hidden states of the BERT transformer stack, not those of the classifier layers downstream of it.
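The traceback is consistent with nn.Sequential passing the tuple returned by the transformer stack straight into the next Linear layer, whose F.linear call expects a tensor; with three layers removed, nothing follows the tuple-producing module, so no error occurs. A toy sketch of that mechanism, where TupleEncoder and TakeFirst are hypothetical stand-ins rather than DistilBERT classes:

```python
import torch
from torch import nn

class TupleEncoder(nn.Module):
    """Hypothetical stand-in for an encoder whose forward returns a 1-tuple."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return (self.proj(x),)  # tensor wrapped in a tuple, like the BERT3 output

class TakeFirst(nn.Module):
    """Hypothetical adapter that unwraps a tuple and passes on its first element."""
    def forward(self, outputs):
        return outputs[0]

encoder = TupleEncoder()
head = nn.Linear(8, 4)  # plays the role of pre_classifier
x = torch.randn(2, 8)

# nn.Sequential hands each child's raw output to the next child, so the
# Linear head receives a tuple instead of a tensor.
broken = nn.Sequential(encoder, head)
try:
    broken(x)
    broken_failed = False
except (AttributeError, TypeError) as err:  # exact type depends on the PyTorch version
    broken_failed = True
    print(type(err).__name__, err)

# Unwrapping the tuple between the two modules lets the head run.
fixed = nn.Sequential(encoder, TakeFirst(), head)
out = fixed(x)
print(tuple(out.shape))  # (2, 4)
```

This only illustrates the Sequential mechanics. The real model's own forward method may do more between these layers than unwrap a tuple, so chaining children this way is not guaranteed to reproduce the pre_classifier values the full model computes.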
--
PS
The code used to initialize the BERT model:
def collect_data_for_FT():
from sklearn.datasets import fetch_20newsgroups
train_data = fetch_20newsgroups(subset='train', shuffle=True, random_state=42)
test_data = fetch_20newsgroups(subset='test', shuffle=True, random_state=42)
print('size of training set: %s' % (len(train_data['data'])))
print('size of validation set: %s' % (len(test_data['data'])))
print('classes: %s' % (train_data.target_names))
x_train = train_data.data
y_train = train_data.target
x_test = test_data.data
y_test = test_data.target
return(x_train,y_train,x_test,y_test)
bert_name = 'distilbert-base-uncased'
from transformers import DistilBertForSequenceClassification,AutoConfig,AutoTokenizer
import os
dir_path = os.getcwd()
dir_path=os.path.join(dir_path,'models')
config = AutoConfig.from_pretrained(bert_name,num_labels=20) # change model configuration to access hidden values.
try:
BERT = DistilBertForSequenceClassification.from_pretrained(dir_path,config=config)
print ("Finetuned predictor loaded")
except:
import tensorflow.keras as keras
print ("No finetuned predictor found.\nTraining.")
(x_train,y_train,x_test,y_test)=collect_data_for_FT()
####
# prework:
import ktrain
from ktrain import text
t = text.Transformer(bert_name, maxlen=500, classes=train_b.target_names)
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)
pre_trained_model = t.get_classifier()
learner = ktrain.get_learner(pre_trained_model, train_data=trn, val_data=val, batch_size=6)
####
####
# Find best learning rate
learner.lr_find()
learner.lr_plot()
####
learner.fit_onecycle(2e-4, 4) # chosen based on the learning rate/loss plot.
####
# prepare and save:
predictor = ktrain.get_predictor(learner.model, preproc=t)
predictor.save('my_distilbertbase_predictor')
predictor.model.save_pretrained(dir_path)
####
BERT = DistilBertForSequenceClassification.from_pretrained(os.path.join(dir_path), from_tf=True,config=config) # re-load tensorflow to pytorch
BERT.save_pretrained(dir_path) # save as a "full blooded" pytorch model
BERT = DistilBertForSequenceClassification.from_pretrained(dir_path,config=config) # re-load
from tensorflow.keras import backend as K
K.clear_session() # loading from TensorFlow takes up memory and the GPU; this releases it.

How to fix 'Expected object of scalar type Float but got scalar type Double for argument #4 'mat1''?

I am trying to build an lstm model. My model code is below.
My input has 4 features, a sequence length of 5, and a batch size of 32.
class RNN(nn.Module):
def __init__(self, feature_dim, output_size, hidden_dim, n_layers, dropout=0.5):
"""
Initialize the PyTorch RNN Module
:param feature_dim: The number of input dimensions of the neural network
:param output_size: The number of output dimensions of the neural network
:param hidden_dim: The size of the hidden layer outputs
:param dropout: dropout to add in between LSTM/GRU layers
"""
super(RNN, self).__init__()
# set class variables
self.output_size = output_size
self.n_layers = n_layers
self.hidden_dim = hidden_dim
# define model layers
self.lstm = nn.LSTM(feature_dim, hidden_dim, n_layers, batch_first=True)
self.fc = nn.Linear(hidden_dim, output_size)
self.dropout = nn.Dropout(dropout)
def forward(self, nn_input, hidden):
"""
Forward propagation of the neural network
:param nn_input: The input to the neural network
:param hidden: The hidden state
:return: Two Tensors, the output of the neural network and the latest hidden state
"""
# Get Batch Size
batch_size = nn_input.size(0)
# Pass through LSTM layer
lstm_out, hidden = self.lstm(nn_input, hidden)
# Stack up LSTM outputs
lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
# Add dropout and pass through fully connected layer
x = self.dropout(lstm_out)
x = self.fc(x) # feed the dropout output (not lstm_out) into the linear layer
# reshape to be batch_size first
output = x.view(batch_size, -1, self.output_size)
# get last batch of labels
out = output[:, -1]
# return one batch of output word scores and the hidden state
return out, hidden
def init_hidden(self, batch_size):
'''
Initialize the hidden state of an LSTM/GRU
:param batch_size: The batch_size of the hidden state
:return: hidden state of dims (n_layers, batch_size, hidden_dim)
'''
# Implement function
# initialize state with zero weights, and move to GPU if available
weight = next(self.parameters()).data
if is_gpu_available:
hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device),
weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device))
else:
hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
return hidden
When I train, I get this error:
RuntimeError Traceback (most recent call last)
/usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py in <module>
3
4 # training the model
----> 5 trained_rnn = train_rnn(rnn, batch_size, optimizer, num_epochs, show_every_n_batches)
6
7 # saving the trained model
/usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py in train_rnn(rnn, batch_size, optimizer, n_epochs, show_every_n_batches)
18
19 # forward, back prop
---> 20 loss, hidden = forward_back_prop(rnn, optimizer, inputs, labels, hidden)
21 # record loss
22 batch_losses.append(loss)
/usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py in forward_back_prop(rnn, optimizer, inp, target, hidden)
22
23 # get the output from the model
---> 24 output, h = rnn(inp, h)
25
26 # calculate the loss and perform backprop
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)
/usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py in forward(self, nn_input, hidden)
36
37 # Pass through LSTM layer
---> 38 lstm_out, hidden = self.lstm(nn_input, hidden)
39 # Stack up LSTM outputs
40 lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
557 return self.forward_packed(input, hx)
558 else:
--> 559 return self.forward_tensor(input, hx)
560
561
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py in forward_tensor(self, input, hx)
537 unsorted_indices = None
538
--> 539 output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
540
541 return output, self.permute_hidden(hidden, unsorted_indices)
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py in forward_impl(self, input, hx, batch_sizes, max_batch_size, sorted_indices)
520 if batch_sizes is None:
521 result = _VF.lstm(input, hx, self._get_flat_weights(), self.bias, self.num_layers,
--> 522 self.dropout, self.training, self.bidirectional, self.batch_first)
523 else:
524 result = _VF.lstm(input, batch_sizes, hx, self._get_flat_weights(), self.bias,
RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #4 'mat1'
I am not able to figure out the cause of this error. How can I fix it? Please help.
Also, is this the correct way of implementing the LSTM, or is there a better way to achieve the same thing?
torch.nn.LSTM does not need an explicit initial hidden state: if none is passed, h_0 and c_0 default to zeros (see the documentation).
Furthermore, torch.nn.Module already provides the .cuda() and .to(device) methods for moving a module to the GPU, so you can safely delete init_hidden(self, batch_size).
You get this error because your input is of type torch.float64 (double), while module parameters default to torch.float32 (float), which is accurate enough for most purposes and is faster and smaller than torch.float64.
You can cast your input tensors by calling .float(); in your case it could look like this:
def forward(self, nn_input, hidden):
nn_input = nn_input.float()
... # rest of your code
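A standalone sketch reproducing the mismatch and the fix with a plain nn.LSTM. The sizes (4 features, sequence length 5, batch 32) come from the question; the hidden size 16 and 2 layers are assumed. A common way to end up with float64 input is NumPy, whose arrays default to float64, a dtype that torch.from_numpy preserves:

```python
import numpy as np
import torch
from torch import nn

# Assumed sizes: 4 features, sequence length 5, batch 32, hidden size 16, 2 layers.
lstm = nn.LSTM(input_size=4, hidden_size=16, num_layers=2, batch_first=True)

# NumPy creates float64 arrays by default, and torch.from_numpy keeps that dtype.
batch = torch.from_numpy(np.random.rand(32, 5, 4))
print(batch.dtype, next(lstm.parameters()).dtype)  # torch.float64 torch.float32

try:
    lstm(batch)
    double_failed = False
except (RuntimeError, ValueError) as err:  # the exception type varies across versions
    double_failed = True
    print(type(err).__name__, err)

# Casting the input to float32 matches the parameters.
out, (h, c) = lstm(batch.float())
print(tuple(out.shape))  # (32, 5, 16)
```

The opposite fix, converting the model with lstm.double(), also works but doubles memory use, which is why casting the input is usually preferred.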
Finally, there is no need for the hidden argument if it's always zeros; you can simply use:
lstm_out, hidden = self.lstm(nn_input) # no hidden here
as the hidden state defaults to zeros as well.
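A quick check of that claim, with assumed sizes: calling the LSTM without a hidden state should give the same output as passing explicit zero tensors of shape (num_layers, batch, hidden_size):

```python
import torch
from torch import nn

torch.manual_seed(0)
num_layers, batch_size, hidden_size = 2, 32, 16  # assumed sizes
lstm = nn.LSTM(input_size=4, hidden_size=hidden_size,
               num_layers=num_layers, batch_first=True)
x = torch.randn(batch_size, 5, 4)

# No hidden state passed: PyTorch uses zeros for h_0 and c_0.
out_default, _ = lstm(x)

# Explicit zero state with shape (num_layers, batch, hidden_size).
h0 = torch.zeros(num_layers, batch_size, hidden_size)
c0 = torch.zeros(num_layers, batch_size, hidden_size)
out_zeros, _ = lstm(x, (h0, c0))

print(torch.allclose(out_default, out_zeros))  # True
```

This equivalence only holds when every batch starts from zeros. If the state returned for one batch is fed into the next, as the training loop in the traceback appears to do, the explicit hidden argument is still needed.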
