PyTorch: Fully Connected Layer has no Parameters

I am very new to PyTorch, so please excuse my ignorance. I am trying to create my own CNN using PyTorch. The problem is that my fully connected layer (i.e. the nn.LazyLinear module) shows no learnable parameters, and the network is obviously not learning anything.
import torch
from torch import nn
import pytorch_lightning as pl
from pytorch_lightning.core.decorators import auto_move_data
class ThreeConvLayer(pl.LightningModule):
    def __init__(self, num_classes, num_images):
        super(ThreeConvLayer, self).__init__()
        self.convolutionlayer1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
        self.BatchNormalization1 = nn.BatchNorm2d(16)
        self.ReLU1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size = 2, stride = 2)
        self.convolutionlayer2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
        self.BatchNormalization2 = nn.BatchNorm2d(32)
        self.ReLU2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size = 2, stride=2)
        self.convolutionlayer3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.BatchNormalization3 = nn.BatchNorm2d(64)
        self.ReLU3 = nn.ReLU()
        self.fc1 = nn.LazyLinear(num_classes)
        self.softmax = nn.Softmax(dim=1)
        self.loss = nn.CrossEntropyLoss()
    def forward(self, x):
        #print('shape x: ', x.shape)
        out = self.convolutionlayer1(x)
        out = self.BatchNormalization1(out)
        out = self.ReLU1(out)
        out = self.maxpool1(out)
        # print('shape out1: ', out.shape)
        out = self.convolutionlayer2(out)
        out = self.BatchNormalization2(out)
        out = self.ReLU2(out)
        out = self.maxpool2(out)
        #print('shape out2: ', out.shape)
        out = self.convolutionlayer3(out)
        out = self.BatchNormalization3(out)
        out = self.ReLU3(out)
        #print('shape out3: ', out.shape)
        out = out.reshape(out.size(0), -1)
        #print('out after reshape: ', out.shape)
        out = self.fc1(out)
        out = self.softmax(out)
        #print('final out: ', out.shape)
        return out
    def training_step(self, batch, batch_no):
        # implement single training step
        x, y = batch
        y = y.long()
        logits = self(x)
        loss = self.loss(logits, y)
        self.log('val_loss', loss)
        return loss
    def configure_optimizers(self):
        # choose your optimizer
        return torch.optim.RMSprop(self.parameters(), lr=0.05) #lr=0.005
Here is the PyTorch printout on learnable parameters:
   | Name                | Type             | Params
----------------------------------------------------------
0  | convolutionlayer1   | Conv2d           | 160
1  | BatchNormalization1 | BatchNorm2d      | 32
2  | ReLU1               | ReLU             | 0
3  | maxpool1            | MaxPool2d        | 0
4  | convolutionlayer2   | Conv2d           | 4.6 K
5  | BatchNormalization2 | BatchNorm2d      | 64
6  | ReLU2               | ReLU             | 0
7  | maxpool2            | MaxPool2d        | 0
8  | convolutionlayer3   | Conv2d           | 18.5 K
9  | BatchNormalization3 | BatchNorm2d      | 128
10 | ReLU3               | ReLU             | 0
11 | fc1                 | LazyLinear       | 0
12 | softmax             | Softmax          | 0
13 | loss                | CrossEntropyLoss | 0
----------------------------------------------------------
Here you can see that the FullyConnected Layer has no parameters which is clearly wrong. What did I do wrong here?

This is one limitation of lazy modules such as LazyLinear. If you read through the documentation of nn.modules.lazy.LazyModuleMixin, you will see:
Modules that lazily initialize parameters, or “lazy modules”, derive the shapes of their parameters from the first input(s) to their forward method. Until that first forward they contain torch.nn.UninitializedParameters that should not be accessed or used, and afterward they contain regular torch.nn.Parameters.
In other words, you first need to perform a dry run with dummy data (for example a random tensor of the expected input shape) before you use the model. This is required because lazy modules infer the missing arguments (such as in_features for nn.Linear) during the first forward call, i.e. from the shape of the first input they receive. The dry run has to happen before:
- getting the number of parameters registered.
- properly registering them inside an optimizer.
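A minimal sketch of such a dry run, in plain PyTorch rather than Lightning. The small model, the 1x28x28 input shape, the batch size of 2, and the learning rate are assumptions made for illustration; they are not taken from the question.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

# Hypothetical small model; only the LazyLinear at the end matters here.
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.LazyLinear(10),
)

# Before the first forward pass the weight is only a placeholder.
lazy_before = isinstance(model[-1].weight, UninitializedParameter)

# Dry run: one forward pass with dummy data of the expected input shape
# (assumed here to be 1x28x28) lets LazyLinear infer in_features.
with torch.no_grad():
    model(torch.zeros(2, 1, 28, 28))

lazy_after = isinstance(model[-1].weight, UninitializedParameter)
weight_shape = tuple(model[-1].weight.shape)
n_params = sum(p.numel() for p in model.parameters())

print(lazy_before, lazy_after)  # placeholder before, regular parameter after
print(weight_shape)             # (out_features, inferred in_features)
print(n_params)

# Create the optimizer only after the dry run, so that it receives
# fully initialized parameters.
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.005)
```

After the dry run the parameter count includes the fully connected layer, and the optimizer is built from initialized parameters.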

Related

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1792 and 2048x1) in Google Colab in Windows (Python code)

I am trying to train a neural network to predict one value using 3D images of different cases as input. According to the configuration parameters, the size of input images that I pass to the neural network is (8,1,96,96,96) and the output is a scalar value.
When I run this cell...
# Init model
model = BrainAgeCNN().to(config.device)
config.lr = 0.01
config.betas = (0.9, 0.999)
config.num_steps = 1400
# Init optimizers
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=config.lr,
    betas=config.betas
)
# Init tensorboard
writer = TensorboardLogger(config.log_dir, config)
# Train
model, step = train(
    config=config,
    model=model,
    optimizer=optimizer,
    train_loader=dataloaders['train'],
    val_loader=dataloaders['val'],
    writer=writer
)
This is the error that I obtain just at the end of the training, but during the training I do not get any error:
Training: 0%| | 0/50 [00:00<?, ?it/s]
Training: 2%|▏ | 1/50 [00:00<00:16, 2.89it/s]
Training: 4%|▍ | 2/50 [00:00<00:17, 2.79it/s]
Training: 6%|▌ | 3/50 [00:00<00:14, 3.33it/s]
Training: 8%|▊ | 4/50 [00:01<00:12, 3.67it/s]
Training: 10%|█ | 5/50 [00:01<00:11, 3.87it/s]
Training: 12%|█▏ | 6/50 [00:01<00:10, 4.02it/s]
Training: 14%|█▍ | 7/50 [00:01<00:10, 4.12it/s]
Training: 16%|█▌ | 8/50 [00:02<00:10, 4.15it/s]
Training: 18%|█▊ | 9/50 [00:02<00:09, 4.21it/s]
Training: 20%|██ | 10/50 [00:02<00:09, 4.23it/s]
Training: 22%|██▏ | 11/50 [00:02<00:09, 4.29it/s]
Training: 24%|██▍ | 12/50 [00:03<00:08, 4.26it/s]
Training: 26%|██▌ | 13/50 [00:03<00:08, 4.30it/s]
Training: 28%|██▊ | 14/50 [00:03<00:08, 4.33it/s]
Training: 30%|███ | 15/50 [00:03<00:08, 4.34it/s]
Training: 32%|███▏ | 16/50 [00:03<00:07, 4.30it/s]
Training: 34%|███▍ | 17/50 [00:04<00:07, 4.30it/s]
Training: 36%|███▌ | 18/50 [00:04<00:07, 4.31it/s]
Training: 38%|███▊ | 19/50 [00:04<00:07, 4.33it/s]
Training: 40%|████ | 20/50 [00:04<00:06, 4.33it/s]
Training: 42%|████▏ | 21/50 [00:05<00:06, 4.35it/s]
Training: 44%|████▍ | 22/50 [00:05<00:06, 4.34it/s]
Training: 46%|████▌ | 23/50 [00:05<00:06, 4.36it/s]
Training: 48%|████▊ | 24/50 [00:05<00:05, 4.37it/s]
Training: 50%|█████ | 25/50 [00:06<00:05, 4.37it/s]
Training: 52%|█████▏ | 26/50 [00:06<00:05, 4.36it/s]
Training: 54%|█████▍ | 27/50 [00:06<00:05, 4.38it/s]
Training: 56%|█████▌ | 28/50 [00:06<00:05, 4.36it/s]
Training: 58%|█████▊ | 29/50 [00:06<00:04, 4.34it/s]
Training: 60%|██████ | 30/50 [00:07<00:04, 4.35it/s]
Training: 62%|██████▏ | 31/50 [00:07<00:04, 4.34it/s]
Training: 64%|██████▍ | 32/50 [00:07<00:04, 4.32it/s]
Training: 66%|██████▌ | 33/50 [00:07<00:03, 4.29it/s]
Training: 68%|██████▊ | 34/50 [00:08<00:03, 4.23it/s]
Training: 70%|███████ | 35/50 [00:08<00:03, 4.26it/s]
Training: 72%|███████▏ | 36/50 [00:08<00:03, 4.25it/s]
Training: 74%|███████▍ | 37/50 [00:08<00:03, 4.25it/s]
Training: 76%|███████▌ | 38/50 [00:09<00:02, 4.27it/s]
Training: 78%|███████▊ | 39/50 [00:09<00:02, 4.25it/s]
Training: 80%|████████ | 40/50 [00:09<00:02, 4.22it/s]
Training: 82%|████████▏ | 41/50 [00:09<00:02, 4.27it/s]
Training: 84%|████████▍ | 42/50 [00:09<00:01, 4.24it/s]
Training: 86%|████████▌ | 43/50 [00:10<00:01, 4.25it/s]
Training: 88%|████████▊ | 44/50 [00:10<00:01, 4.25it/s]
Training: 90%|█████████ | 45/50 [00:10<00:01, 4.27it/s]
Training: 92%|█████████▏| 46/50 [00:10<00:00, 4.27it/s]
Training: 94%|█████████▍| 47/50 [00:11<00:00, 4.27it/s]
Training: 96%|█████████▌| 48/50 [00:11<00:00, 4.28it/s]
Training: 98%|█████████▊| 49/50 [00:11<00:00, 4.27it/s]
Training: 100%|██████████| 50/50 [00:11<00:00, 4.25it/s]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-59-ba791e9bf3a2> in <module>
14 train_loader=dataloaders['train'],
15 val_loader=dataloaders['val'],
---> 16 writer=writer
17 )
5 frames
<ipython-input-29-98abf7b06208> in train(config, model, optimizer, train_loader, val_loader, writer)
41 model,
42 val_loader,
---> 43 config,
44 )
45
<ipython-input-29-98abf7b06208> in validate(model, val_loader, config, show_plot)
76
77 with torch.no_grad(): # Context-manager that disabled gradient calculation
---> 78 loss, pred = model.train_step(x, y, return_prediction=True)
79 avg_val_loss.add(loss.item())
80 preds.append(pred.cpu())
/content/ai-in-medicine-practical-session1/models.py in train_step(self, imgs, labels, return_prediction)
112 :return pred
113 """
--> 114 pred = torch.squeeze(self.forward(imgs.float())) # (N)
115
116 # ----------------------- ADD YOUR CODE HERE --------------------------
/content/ai-in-medicine-practical-session1/models.py in forward(self, imgs)
93
94 x = x.view(-1, x.shape[0]*x.shape[1]*x.shape[2]*x.shape[3]*x.shape[4])
---> 95 pred = self.relu1_5(self.fc1(x))
96
97 # ------------------------------- END ---------------------------------
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py in forward(self, input)
112
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
115
116 def extra_repr(self) -> str:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1792 and 2048x1)
From what I can see in the model's parameters, the model supposedly trains well, and the batch size is set to 8. However, at the end of training the batch size changes to 7 (I do not know why), and I get the error above.
This is the function for training:
def train(config, model, optimizer, train_loader, val_loader, writer):
    model.train()
    step = 0
    pbar = tqdm(total=config.val_freq,
                desc=f'Training') # Progress bar
    avg_loss = AvgMeter() # Computes and stores the average and current value.
    while True:
        for x, y in train_loader:
            x = x.to(config.device)
            y = y.to(config.device)
            pbar.update(1) # Update progress bar 1 value
            # Training step
            optimizer.zero_grad() # Sets the gradients of all optimized torch.Tensor s to zero
            loss = model.train_step(x, y) # Calculate the loss
            loss.backward() # Computes dloss/dx for every parameter x which has requires_grad=True (x.grad += dloss/dx)
            optimizer.step() # Updates the value of x using the gradient x.grad (x += -lr * x.grad)
            # optimizer.zero_grad() clears x.grad for every parameter x in the optimizer. It’s important to call this before loss.backward(),
            # otherwise you’ll accumulate the gradients from multiple passes.
            avg_loss.add(loss.detach().item())
            # .detach() will return a tensor, which is detached from the computation graph, while .item() will return the Python scalar
            # Increment step
            step += 1
            if step % config.log_freq == 0 and not step % config.val_freq == 0:
                train_loss = avg_loss.compute()
                writer.log({'train/loss': train_loss}, step=step)
            # Validate and log at validation frequency
            if step % config.val_freq == 0:
                # Reset avg_loss
                train_loss = avg_loss.compute()
                avg_loss = AvgMeter()
                # Get validation results
                val_results = validate(
                    model,
                    val_loader,
                    config,
                )
                # Print current performance
                print(f"Finished step {step} of {config.num_steps}. "
                      f"Train loss: {train_loss} - "
                      f"val loss: {val_results['val/loss']:.4f} - "
                      f"val MAE: {val_results['val/MAE']:.4f}")
                # Write to tensorboard
                writer.log(val_results, step=step)
                # Reset progress bar
                pbar = tqdm(total=config.val_freq, desc='Training')
            if step >= config.num_steps:
                print(f'\nFinished training after {step} steps\n')
                return model, step
def validate(model, val_loader, config, show_plot=False):
    model.eval()
    # model.eval() is a kind of switch for some specific layers/parts of the model that behave differently during training
    # and inference (evaluating) time. For example, Dropouts Layers, BatchNorm Layers etc. You need to turn off them during model
    # evaluation, and .eval() will do it for you. In addition, the common practice for evaluating/validation is using torch.no_grad()
    # in pair with model.eval() to turn off gradients computation
    avg_val_loss = AvgMeter()
    preds = []
    targets = []
    for x, y in val_loader:
        x = x.to(config.device)
        y = y.to(config.device)
        with torch.no_grad(): # Context-manager that disabled gradient calculation
            loss, pred = model.train_step(x, y, return_prediction=True)
        avg_val_loss.add(loss.item())
        preds.append(pred.cpu())
        targets.append(y.cpu())
    # torch.cat() Concatenates the given sequence of seq tensors in the given dimension
    # All tensors must either have the same shape (except in the concatenating dimension) or be empty
    preds = torch.cat(preds)
    targets = torch.cat(targets)
    mae = mean_absolute_error(preds, targets)
    f = plot_results(preds, targets, show_plot)
    model.train()
    return {
        'val/loss': avg_val_loss.compute(),
        'val/MAE': mae,
        'val/MAE_plot': f
    }
def plot_results(preds: Tensor, targets: Tensor, show_plot: bool = False):
    # Compute the mean absolute error
    mae_test = mean_absolute_error(preds, targets)
    # Sort preds and targets to ascending targets
    sort_inds = targets.argsort() # It returns an array of indices along the given axis of the same shape as the input array, in sorted order
    targets = targets[sort_inds].numpy() # Converts a tensor object into an numpy.ndarray object
    preds = preds.view(targets.shape)
    preds = preds[sort_inds].numpy() # Converts a tensor object into an numpy.ndarray object
    f = plt.figure()
    plt.plot(targets, targets, 'r.')
    plt.plot(targets, preds, '.')
    plt.plot(targets, targets + mae_test, 'gray')
    plt.plot(targets, targets - mae_test, 'gray')
    plt.suptitle('Mean Average Error')
    plt.xlabel('True Age')
    plt.ylabel('Age predicted')
    if show_plot:
        plt.show()
    return f
This is the neural network that I am using for training. It is a neural network with 3D convolutions, batch normalizations, ReLU() and fully connected layers at the end.
from typing import Optional
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
class BrainAgeCNN(nn.Module):
    """
    The BrainAgeCNN predicts the age given a brain MR-image.
    """
    def __init__(self) -> None:
        super().__init__()
        self.loss = torch.nn.MSELoss()
        # Feel free to also add arguments to __init__ if you want.
        # ----------------------- ADD YOUR CODE HERE --------------------------
        self.conv1_1 = nn.Conv3d(in_channels = 1, out_channels = 4, kernel_size = 3, stride = 1, padding = 0)
        self.relu1_1 = nn.ReLU()
        self.conv2_1 = nn.Conv3d(in_channels = 4, out_channels = 4, kernel_size = 3, stride = 1, padding = 0)
        self.bnn1_1 = nn.BatchNorm3d(num_features = 4)
        self.relu2_1 = nn.ReLU()
        self.maxp1_1 = nn.MaxPool3d(kernel_size = 2, stride=2, padding=0)
        self.conv1_2 = nn.Conv3d(in_channels = 4, out_channels = 8, kernel_size = 3, stride = 1, padding = 0)
        self.relu1_2 = nn.ReLU()
        self.conv2_2 = nn.Conv3d(in_channels = 8, out_channels = 8, kernel_size = 3, stride = 1, padding = 0)
        self.bnn1_2 = nn.BatchNorm3d(num_features = 8)
        self.relu2_2 = nn.ReLU()
        self.maxp1_2 = nn.MaxPool3d(kernel_size = 2, stride=2, padding=0)
        self.conv1_3 = nn.Conv3d(in_channels = 8, out_channels = 16, kernel_size = 3, stride = 1, padding = 0)
        self.relu1_3 = nn.ReLU()
        self.conv2_3 = nn.Conv3d(in_channels = 16, out_channels = 16, kernel_size = 3, stride = 1, padding = 0)
        self.bnn1_3 = nn.BatchNorm3d(num_features = 16)
        self.relu2_3 = nn.ReLU()
        self.maxp1_3 = nn.MaxPool3d(kernel_size = 2, stride=2, padding=0)
        self.conv1_4 = nn.Conv3d(in_channels = 16, out_channels = 32, kernel_size = 3, stride = 1, padding = 0)
        self.relu1_4 = nn.ReLU()
        self.conv2_4 = nn.Conv3d(in_channels = 32, out_channels = 32, kernel_size = 3, stride = 1, padding = 0)
        self.bnn1_4 = nn.BatchNorm3d(num_features = 32)
        self.relu2_4 = nn.ReLU()
        self.maxp1_4 = nn.MaxPool3d(kernel_size = 2, stride=2, padding=0)
        self.fc1 = nn.Linear(2048, 1)
        self.relu1_5 = nn.ReLU()
        # ------------------------------- END ---------------------------------
    def forward(self, imgs: Tensor) -> Tensor:
        """
        Forward pass of your model.
        :param imgs: Batch of input images. Shape (N, 1, H, W, D)
        :return pred: Batch of predicted ages. Shape (N)
        """
        # ----------------------- ADD YOUR CODE HERE --------------------------
        x = self.relu1_1(self.conv1_1(imgs))
        x = self.maxp1_1(self.relu2_1(self.bnn1_1(self.conv2_1(x))))
        x = self.relu1_2(self.conv1_2(x))
        x = self.maxp1_2(self.relu2_2(self.bnn1_2(self.conv2_2(x))))
        x = self.relu1_3(self.conv1_3(x))
        x = self.maxp1_3(self.relu2_3(self.bnn1_3(self.conv2_3(x))))
        x = self.relu1_4(self.conv1_4(x))
        x = self.maxp1_4(self.relu2_4(self.bnn1_4(self.conv2_4(x))))
        x = x.view(-1, x.shape[0]*x.shape[1]*x.shape[2]*x.shape[3]*x.shape[4])
        pred = self.relu1_5(self.fc1(x))
        # ------------------------------- END ---------------------------------
        return pred
    def train_step(
        self,
        imgs: Tensor,
        labels: Tensor,
        return_prediction: Optional[bool] = False
    ):
        """Perform a training step. Predict the age for a batch of images and
        return the loss.
        :param imgs: Batch of input images (N, 1, H, W, D)
        :param labels: Batch of target labels (N)
        :return loss: The current loss, a single scalar.
        :return pred
        """
        pred = torch.squeeze(self.forward(imgs.float())) # (N)
        # ----------------------- ADD YOUR CODE HERE --------------------------
        loss = self.loss(labels.float(), pred)
        # ------------------------------- END ---------------------------------
        if return_prediction:
            return loss, pred
        else:
            return loss
Any help you can give me will be welcomed.
I tried changing the batch size, but I get the same error, just with different shapes in the matrix multiplication.
I expected the network to train in Google Colab without any error, producing a scalar value for each image in an input batch of size (8,1,96,96,96).

From the traceback, the error is raised in validate, which is called inside train. That means you should look at your data loaders (you have two, dataloaders['train'] and dataloaders['val']). If you use the same batch size for both, the size of your validation set is probably not a multiple of the batch size, so the last batch is incomplete. You could pass drop_last=True to your DataLoader to skip that last batch.
https://pytorch.org/docs/stable/data.html
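The shapes in the error message are consistent with this: 1792 = 7 × 256 and 2048 = 8 × 256, so a last batch of 7 samples no longer matches nn.Linear(2048, 1). The sketch below uses dummy tensors to show how drop_last changes the batch sizes. It also shows, as an alternative that is not part of the answer above, a flatten that keeps the batch dimension so that any batch size works. The per-sample shape (32, 2, 2, 2) is an assumption derived from the layer definitions in the question for 96×96×96 inputs, and the dataset of 15 samples is made up.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Made-up dataset: 15 samples shaped like the assumed output of the last
# pooling layer, (32, 2, 2, 2) = 256 features per sample.
dataset = TensorDataset(torch.zeros(15, 32, 2, 2, 2), torch.zeros(15))

loader = DataLoader(dataset, batch_size=8)
sizes_default = [x.shape[0] for x, _ in loader]
sizes_drop = [x.shape[0]
              for x, _ in DataLoader(dataset, batch_size=8, drop_last=True)]
print(sizes_default)  # the last batch is incomplete
print(sizes_drop)     # the incomplete batch is skipped

# Alternative: flatten per sample, so the linear layer sees 256 features
# regardless of how many samples are in the batch.
fc = nn.Linear(32 * 2 * 2 * 2, 1)
out_shapes = []
for x, _ in loader:
    flat = x.view(x.shape[0], -1)  # (N, 256) instead of (1, N * 256)
    out_shapes.append(tuple(fc(flat).shape))
print(out_shapes)
```

Note that drop_last discards samples, which matters more for validation than for training, so the per-sample flatten may be the safer change if it fits the intended model.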

PyTorch Lightning: see input/output size in model summary when using nn.ModuleList

When I use nn.ModuleList() to define layers in my PyTorch Lightning model, their "In sizes" and "Out sizes" in ModelSummary are "?".
Is there a way to get the input/output sizes of these layers in the model summary, possibly by using something other than nn.ModuleList() to define layers from a list of arguments?
Here is a dummy model:
(12 is the batch size)
import torch
import torch.nn as nn
from pytorch_lightning import LightningModule
from pytorch_lightning.utilities.model_summary import ModelSummary
class module_list_dummy(LightningModule):
    def __init__(self,
                 layers_size_list,
                 lr=1e-3,  # assumed default; the original code read self.lr without ever setting it
                 ):
        super().__init__()
        self.lr = lr
        self.example_input_array = torch.zeros((12,100), dtype=torch.float32)
        self.fc11 = nn.Linear(100,50)
        self.moduleList = nn.ModuleList()
        input_size = 50
        for layer_size in layers_size_list:
            self.moduleList.append(nn.Linear(input_size, layer_size))
            input_size = layer_size
        self.loss_fn = nn.MSELoss()
    def forward(self, x):
        out = self.fc11(x)
        for layer in self.moduleList:
            out = layer(out)
        return out
    def training_step(self, batch, batch_idx):
        x, y = batch
        out = self.fc11(x)  # was self.fc11(X), an undefined name
        for layer in self.moduleList:
            out = layer(out)
        loss = torch.sqrt(self.loss_fn(out, y))
        self.log('train_loss', loss)
        return loss
    def validation_step(self, batch, batch_idx):
        x, y = batch
        out = self.fc11(x)  # was self.fc11(X), an undefined name
        for layer in self.moduleList:
            out = layer(out)
        loss = torch.sqrt(self.loss_fn(out, y))  # was y_hat, an undefined name
        self.log('val_loss', loss)
        return loss
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.lr)
        return optimizer
The following code prints the summary:
net = module_list_dummy(layers_size_list=[30,20,1])
summary = ModelSummary(net)
print(summary)
But the output is:
  | Name       | Type       | Params | In sizes  | Out sizes
------------------------------------------------------------------
0 | fc11       | Linear     | 5.0 K  | [12, 100] | [12, 50]
1 | moduleList | ModuleList | 2.2 K  | ?         | ?
2 | loss_fn    | MSELoss    | 0      | ?         | ?
------------------------------------------------------------------
7.2 K Trainable params
0 Non-trainable params
7.2 K Total params
0.029 Total estimated model params size (MB)
I would expect to have:
1 | moduleList | ModuleList | 2.2 K | [12,50] | [12,1]
or even better something like
1 | Linear | Linear | ... | [12,50] | [12,30]
2 | Linear | Linear | ... | [12,30] | [12,20]
3 | Linear | Linear | ... | [12,20] | [12,1]
I did this to check that those layers are being used in forward:
x = torch.zeros((12,100), dtype=torch.float32)
net(x).shape
and they are (the model output has size [12,1]).

In your case you are using the layers in the list in a sequential manner:
for layer in self.moduleList:
    out = layer(out)
However, nn.ModuleList does not force you to run your layers sequentially. You could be doing this:
out = self.moduleList[3](out)
out = self.moduleList[1](out)
out = self.moduleList[0](out)
Hence, ModuleList cannot itself be interpreted as a "layer" with an input- and output shape. It is completely arbitrary how the user might integrate the layers inside the list in their graph.
In your specific case where you run the layers sequentially, you could define a nn.Sequential instead. This would look like this:
self.layers = nn.Sequential()
for layer_size in layers_size_list:
    self.layers.append(nn.Linear(input_size, layer_size))
    input_size = layer_size
And then in your forward:
out = self.layers(input)
Now, the model summary in Lightning will also show the shape sizes, because nn.Sequential's output shape is now directly the output shape of the last layer inside of it.
I hope this answer helps your understanding.
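To make the point concrete, here is a small sketch in plain PyTorch (no Lightning) that builds the same stack as an nn.Sequential and records the input and output shape of each layer with forward hooks. The helper name build_mlp and the hook-based shape recording are illustrative assumptions; this is not how Lightning's ModelSummary is implemented.

```python
import torch
from torch import nn

def build_mlp(input_size, layer_sizes):
    """Hypothetical helper: chain Linear layers into one nn.Sequential."""
    layers = []
    for size in layer_sizes:
        layers.append(nn.Linear(input_size, size))
        input_size = size
    return nn.Sequential(*layers)

mlp = build_mlp(50, [30, 20, 1])

# Record (input shape, output shape) for every layer during one forward pass.
shapes = []
def record(module, inputs, output):
    shapes.append((tuple(inputs[0].shape), tuple(output.shape)))

hooks = [layer.register_forward_hook(record) for layer in mlp]
out = mlp(torch.zeros(12, 50))
for hook in hooks:
    hook.remove()

for in_shape, out_shape in shapes:
    print(in_shape, "->", out_shape)
```

Because nn.Sequential has a single, fixed execution order, the container's own input and output shapes are well defined: they are the input shape of the first layer and the output shape of the last one.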

KL Divergence goes NaN on Bayesian Convolutional Neural Network

I'm trying to implement a Bayesian Convolutional Neural Network using PyTorch on Python 3.7. I mainly follow Shridhar's implementation. When running my CNN with normalized MNIST data, the KL divergence becomes NaN after a couple of iterations. I already implemented linear layers the same way, and they worked perfectly fine.
I normalized the data as follows:
train_loader = torch.utils.data.DataLoader(datasets.MNIST('./mnist', train=True, download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])), batch_size=BATCH_SIZE, shuffle=True, **LOADER_KWARGS)
eval_loader = torch.utils.data.DataLoader(datasets.MNIST('./mnist', train=False, download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])), batch_size=EVAL_BATCH_SIZE, shuffle=False, **LOADER_KWARGS)
My implementation of the Conv-Layer looks as follows:
class BayesianConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, prior_sigma, kernel_size, stride=1, padding=0, dilation=1, groups=1):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.normal = torch.distributions.Normal(0,1)
        # conv-parameters
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        # Weight parameters
        self.weight_mu = nn.Parameter(torch.Tensor(out_channels, in_channels, *self.kernel_size).uniform_(0, 0.1))
        self.weight_rho = nn.Parameter(torch.Tensor(out_channels, in_channels, *self.kernel_size).uniform_(-3,0.1))
        self.weight_sigma = 0
        self.weight = 0
        # Bias parameters
        self.bias_mu = nn.Parameter(torch.Tensor(out_channels).uniform_(0, 0.1))
        self.bias_rho = nn.Parameter(torch.Tensor(out_channels).uniform_(-3,0.1))
        self.bias_sigma = 0
        self.bias = 0
        # prior
        self.prior_sigma = prior_sigma
    def forward(self, input, sample=False, calculate_log_probs=False):
        # compute sigma out of rho: sigma = log(1+e^rho)
        self.weight_sigma = torch.log1p(torch.exp(self.weight_rho))
        self.bias_sigma = torch.log1p(torch.exp(self.bias_rho))
        # sampling process -> use local reparameterization trick
        activations_mu = F.conv2d(input.to(DEVICE), self.weight_mu, self.bias_mu, self.stride, self.padding, self.dilation, self.groups)
        activations_sigma = torch.sqrt(1e-16 + F.conv2d((input**2).to(DEVICE), self.weight_sigma**2, self.bias_sigma**2, self.stride, self.padding, self.dilation, self.groups))
        activation_epsilon = Variable(self.weight_mu.data.new(activations_sigma.size()).normal_(mean=0, std=1))
        outputs = activations_mu + activations_sigma * activation_epsilon
        if self.training or calculate_log_probs:
            self.kl_div = 0.5 * ((2 * torch.log(self.prior_sigma / self.weight_sigma) - 1 + (self.weight_sigma / self.prior_sigma).pow(2) + ((0 - self.weight_mu) / self.prior_sigma).pow(2)).sum() \
                + (2 * torch.log(0.1 / self.bias_sigma) - 1 + (self.bias_sigma / 0.1).pow(2) + ((0 - self.bias_mu) / 0.1).pow(2)).sum())
        return outputs
The implementation of the corresponding Conv-Net looks as follows:
class BayesianConvNetwork(nn.Module):
    # Set up network by defining layers
    def __init__(self):
        super().__init__()
        self.conv1 = layers.BayesianConv2d(1, 24, prior_sigma=0.1, kernel_size = (5,5), padding=2)
        self.pool1 = nn.MaxPool2d(kernel_size=3,stride=2, padding=1)
        self.conv2 = layers.BayesianConv2d(24, 48, prior_sigma=0.1, kernel_size = (5,5), padding=2)
        self.pool2 = nn.MaxPool2d(kernel_size=3,stride=2, padding=1)
        self.conv3 = layers.BayesianConv2d(48, 64, prior_sigma=0.1, kernel_size = (5,5), padding=2)
        self.pool3 = nn.MaxPool2d(kernel_size=3,stride=2, padding=1)
        self.fcl1 = layers.BayesianLinearWithLocalReparamTrick(4*4*64, 256, prior_sigma=0.1)
        self.fcl2 = layers.BayesianLinearWithLocalReparamTrick(256, 10, prior_sigma=0.1)
    # define forward function by assigning corresponding activation functions to layers
    def forward(self, x, sample=False):
        x = F.relu(self.conv1(x, sample))
        x = self.pool1(x)
        x = F.relu(self.conv2(x, sample))
        x = self.pool2(x)
        x = F.relu(self.conv3(x, sample))
        x = self.pool3(x)
        x = x.view(-1, 4*4*64)
        x = F.relu(self.fcl1(x, sample))
        x = F.log_softmax(self.fcl2(x, sample), dim=1)
        return x
    # summing up KL-divergences to obtain overall KL-divergence-value
    def total_kl_div(self):
        return (self.conv1.kl_div + self.conv2.kl_div + self.conv3.kl_div + self.fcl1.kl_div + self.fcl2.kl_div)
    # sampling prediction: perform prediction for each of the "different networks" that result from the weight distributions
    def sample_elbo(self, input, target, batch_idx, nmbr_batches, samples=SAMPLES):
        outputs = torch.zeros(samples, target.shape[0], CLASSES).to(DEVICE)
        kl_divs = torch.zeros(samples).to(DEVICE)
        for i in range(samples): # sample through networks
            outputs[i] = self(input, sample=True) # perform prediction
            kl_divs[i] = self.total_kl_div() # calculate total kl_div of the network
        kl_div = kl_divs.mean() # compute mean kl_div from all samples
        negative_log_likelihood = F.nll_loss(outputs.mean(0), target, size_average=False)
        loss = kl_weighting * kl_div + negative_log_likelihood
        return loss
Has anyone faced the same issue or knows how to solve it?
Many thanks in advance!

I figured out that it appears to be an issue with the SGD optimizer. Using Adam as the optimizer solved the problem, though I don't know the reason for that. If anyone knows why it works with Adam but not with SGD, feel free to comment.
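When chasing a NaN like this, it can help to first rule out the KL term itself. The sketch below is one way to do that, not an explanation of the SGD behaviour: it rewrites the closed-form KL expression from the question's layer as a standalone function (the name gaussian_kl is hypothetical) and compares it with torch.distributions.kl_divergence on made-up parameters. It uses F.softplus for sigma, which computes the same log(1 + e^rho) as the question's torch.log1p(torch.exp(rho)); whether the choice between the two matters for the NaN in the question is not established here.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Normal, kl_divergence

def gaussian_kl(mu, rho, prior_sigma):
    """KL( N(mu, sigma^2) || N(0, prior_sigma^2) ), summed over all elements.

    Same expression as in the question's layer, with sigma = softplus(rho).
    """
    sigma = F.softplus(rho)
    return 0.5 * (2 * torch.log(prior_sigma / sigma) - 1
                  + (sigma / prior_sigma).pow(2)
                  + (mu / prior_sigma).pow(2)).sum()

torch.manual_seed(0)
# Made-up parameters, drawn from the same ranges the question uses at init.
mu = torch.empty(4, 3).uniform_(0, 0.1)
rho = torch.empty(4, 3).uniform_(-3, 0.1)

closed_form = gaussian_kl(mu, rho, prior_sigma=0.1)
reference = kl_divergence(
    Normal(mu, F.softplus(rho)),
    Normal(torch.zeros_like(mu), torch.full_like(mu, 0.1)),
).sum()

print(closed_form.item(), reference.item())
```

If the two values agree and stay finite at initialization, the formula is probably fine and the NaN is more likely to come from the parameter values reached during training.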

Why am I getting the error ValueError: Expected input batch_size (4) to match target batch_size (64)?

Why am I getting the error ValueError: Expected input batch_size (4) to match target batch_size (64)?
Is it something to do with an incorrect number of channels(?) in the first linear layer? In this example I have 128 *4 *4 as the channel.
I have tried looking online and on this site for the answer but I have not been able to find it. So, I asked here.
Here is the network:
class Net(nn.Module):
    """A representation of a convolutional neural network comprised of VGG blocks."""
    def __init__(self, n_channels):
        super(Net, self).__init__()
        # VGG block 1
        self.conv1 = nn.Conv2d(n_channels, 64, (3,3))
        self.act1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d((2,2), stride=(2,2))
        # VGG block 2
        self.conv2 = nn.Conv2d(64, 64, (3,3))
        self.act2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d((2,2), stride=(2,2))
        # VGG block 3
        self.conv3 = nn.Conv2d(64, 128, (3,3))
        self.act3 = nn.ReLU()
        self.pool3 = nn.MaxPool2d((2,2), stride=(2,2))
        # Fully connected layer
        self.f1 = nn.Linear(128 * 4 * 4, 1000)
        self.act4 = nn.ReLU()
        # Output layer
        self.f2 = nn.Linear(1000, 10)
        self.act5 = nn.Softmax(dim=1)
    def forward(self, X):
        """This function forward propagates the input."""
        # VGG block 1
        X = self.conv1(X)
        X = self.act1(X)
        X = self.pool1(X)
        # VGG block 2
        X = self.conv2(X)
        X = self.act2(X)
        X = self.pool2(X)
        # VGG block 3
        X = self.conv3(X)
        X = self.act3(X)
        X = self.pool3(X)
        # Flatten
        X = X.view(-1, 128 * 4 * 4)
        # Fully connected layer
        X = self.f1(X)
        X = self.act4(X)
        # Output layer
        X = self.f2(X)
        X = self.act5(X)
        return X
Here is the training loop:
def training_loop(
        n_epochs,
        optimizer,
        model,
        loss_fn,
        train_loader):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for i, (imgs, labels) in enumerate(train_loader):
            outputs = model(imgs)
            loss = loss_fn(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_train += loss.item()
        if epoch == 1 or epoch % 10 == 0:
            print('{} Epoch {}, Training loss {}'.format(
                datetime.datetime.now(),
                epoch,
                loss_train / len(train_loader)))

As nerveless_child said, your dimensions are off!
For other readers who are reviewing or studying neural networks more generally: you can calculate the output size of a single convolutional (or pooling) layer along one spatial dimension with
floor((W − K + 2P) / S) + 1
where
W is the input size - in your case you have not given us this
K is the kernel ("filter") size - in your case 3 for the convolutions and 2 for the max-pooling layers
P is the padding - in your case 0 (the default)
S is the stride - in your case 1 for the convolutions (the default) and 2 for the max-pooling layers

That's because you're getting the dimensions wrong. From the error and your comment, I take it that your input is of the shape (64, 1, 28, 28).
Now, the shape of X at X = self.pool3(X) is (64, 128, 1, 1), which you then reshaped on the next line to (4, 128 * 4 * 4).
Long story short, the output of your model is (4, 10) i.e batch_size (4), which you're comparing on this line loss = loss_fn(outputs, labels) with a tensor of batch_size (64) as the error said.
I don't know what you're trying to do but I'm guessing that you'd want to change this line self.f1 = nn.Linear(128 * 4 * 4, 1000) to this self.f1 = nn.Linear(128 * 1 * 1, 1000)
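The shapes quoted above can be checked by applying the output-size formula floor((W − K + 2P) / S) + 1 to each layer of the network in the question. The sketch below is plain Python arithmetic; the 28×28 input size and the batch size of 64 are the assumptions made in the answer above, and the helper name out_size is hypothetical.

```python
def out_size(w, k, p=0, s=1):
    """Output size along one spatial dimension: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1

# (name, kernel size, stride) for the layers in the question; padding is 0.
layers = [
    ("conv1", 3, 1), ("pool1", 2, 2),
    ("conv2", 3, 1), ("pool2", 2, 2),
    ("conv3", 3, 1), ("pool3", 2, 2),
]

size = 28  # assumed input height/width
sizes = []
for name, kernel, stride in layers:
    size = out_size(size, kernel, p=0, s=stride)
    sizes.append(size)
    print(name, size)

features_per_sample = 128 * size * size   # channels * height * width
batch_size = 64                            # assumed, taken from the error message
# X.view(-1, 128 * 4 * 4) regroups all elements into rows of 2048 values.
rows_after_view = batch_size * features_per_sample // (128 * 4 * 4)
print(features_per_sample, rows_after_view)
```

The number of rows after the view is what the loss function then reports as the input batch size, which is why it no longer matches the 64 targets.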

Pytorch couldn't build multi scaled kernel nested model

I'm trying to create a modified MNIST model that takes 1x28x28 MNIST image tensors as input, branches into several sub-models with different kernel sizes, and combines their outputs at the end, so as to give a multi-scale-kernel response in the spatial domain of the images. My problem is that I'm unable to construct the model.
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data
from torchvision import datasets, transforms
import torch.nn.functional as F
import timeit
import unittest
torch.manual_seed(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(0)
# check availability of GPU and set the device accordingly
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# define a transforms for preparing the dataset
transform = transforms.Compose([
    transforms.ToTensor(), # convert the image to a pytorch tensor
    transforms.Normalize((0.1307,), (0.3081,)) # normalise the images with mean and std of the dataset
])
# Load the MNIST training, test datasets using `torchvision.datasets.MNIST` using the transform defined above
train_dataset = datasets.MNIST('./data',train=True,transform=transform,download=True)
test_dataset = datasets.MNIST('./data',train=False,transform=transform,download=True)
# create dataloaders for training and test datasets
# use a batch size of 32 and set shuffle=True for the training set
train_dataloader = Data.DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
test_dataloader = Data.DataLoader(dataset=test_dataset, batch_size=32, shuffle=True)
# My Net
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # define a conv layer with output channels as 16, kernel size of 3 and stride of 1
        self.conv11 = nn.Conv2d(1, 16, 3, 1) # Input = 1x28x28 Output = 16x26x26
        self.conv12 = nn.Conv2d(1, 16, 5, 1) # Input = 1x28x28 Output = 16x24x24
        self.conv13 = nn.Conv2d(1, 16, 7, 1) # Input = 1x28x28 Output = 16x22x22
        # define a conv layer with output channels as 32, kernel size of 3 and stride of 1
        self.conv21 = nn.Conv2d(16, 32, 3, 1) # Input = 16x26x26 Output = 32x24x24
        self.conv22 = nn.Conv2d(16, 32, 5, 1) # Input = 16x24x24 Output = 32x20x20
        self.conv23 = nn.Conv2d(16, 32, 7, 1) # Input = 16x22x22 Output = 32x16x16
        # define a conv layer with output channels as 64, kernel size of 3 and stride of 1
        self.conv31 = nn.Conv2d(32, 64, 3, 1) # Input = 32x24x24 Output = 64x22x22
        self.conv32 = nn.Conv2d(32, 64, 5, 1) # Input = 32x20x20 Output = 64x16x16
        self.conv33 = nn.Conv2d(32, 64, 7, 1) # Input = 32x16x16 Output = 64x10x10
        # define a max pooling layer with kernel size 2
        self.maxpool = nn.MaxPool2d(2), # Output = 64x11x11
        # define dropout layer with a probability of 0.25
        self.dropout1 = nn.Dropout(0.25)
        # define dropout layer with a probability of 0.5
        self.dropout2 = nn.Dropout(0.5)
        # define a linear(dense) layer with 128 output features
        self.fc11 = nn.Linear(64*11*11, 128)
        self.fc12 = nn.Linear(64*8*8, 128) # after maxpooling 2x2
        self.fc13 = nn.Linear(64*5*5, 128)
        # define a linear(dense) layer with output features corresponding to the number of classes in the dataset
        self.fc21 = nn.Linear(128, 10)
        self.fc22 = nn.Linear(128, 10)
        self.fc23 = nn.Linear(128, 10)
        self.fc33 = nn.Linear(30,10)
    def forward(self, x1):
        # Use the layers defined above in a sequential way (folow the same as the layer definitions above) and
        # write the forward pass, after each of conv1, conv2, conv3 and fc1 use a relu activation.
        x = F.relu(self.conv11(x1))
        x = F.relu(self.conv21(x))
        x = F.relu(self.maxpool(self.conv31(x)))
        #x = torch.flatten(x, 1)
        x = x.view(-1,64*11*11)
        x = self.dropout1(x)
        x = F.relu(self.fc11(x))
        x = self.dropout2(x)
        x = self.fc21(x)
        y = F.relu(self.conv12(x1))
        y = F.relu(self.conv22(y))
        y = F.relu(self.maxpool(self.conv32(y)))
        #x = torch.flatten(x, 1)
        y = y.view(-1,64*8*8)
        y = self.dropout1(y)
        y = F.relu(self.fc12(y))
        y = self.dropout2(y)
        y = self.fc22(y)
        z = F.relu(self.conv13(x1))
        z = F.relu(self.conv23(z))
        z = F.relu(self.maxpool(self.conv33(z)))
        #x = torch.flatten(x, 1)
        z = z.view(-1,64*5*5)
        z = self.dropout1(z)
        z = F.relu(self.fc13(z))
        z = self.dropout2(z)
        z = self.fc23(z)
        out = self.fc33(torch.cat((x, y, z), 0))
        output = F.log_softmax(out, dim=1)
        return output
import unittest
class TestImplementations(unittest.TestCase):
    # Dataloading tests
    def test_dataset(self):
        self.dataset_classes = ['0 - zero',
                                '1 - one',
                                '2 - two',
                                '3 - three',
                                '4 - four',
                                '5 - five',
                                '6 - six',
                                '7 - seven',
                                '8 - eight',
                                '9 - nine']
        self.assertTrue(train_dataset.classes == self.dataset_classes)
        self.assertTrue(train_dataset.train == True)
    def test_dataloader(self):
        self.assertTrue(train_dataloader.batch_size == 32)
        self.assertTrue(test_dataloader.batch_size == 32)
    def test_total_parameters(self):
        model = Net().to(device)
        #self.assertTrue(sum(p.numel() for p in model.parameters()) == 1015946)
suite = unittest.TestLoader().loadTestsFromModule(TestImplementations())
unittest.TextTestRunner().run(suite)
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # send the image, target to the device
        data, target = data.to(device), target.to(device)
        # flush out the gradients stored in optimizer
        optimizer.zero_grad()
        # pass the image to the model and assign the output to variable named output
        output = model(data)
        # calculate the loss (use nll_loss in pytorch)
        loss = F.nll_loss(output, target)
        # do a backward pass
        loss.backward()
        # update the weights
        optimizer.step()
        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            # send the image, target to the device
            data, target = data.to(device), target.to(device)
            # pass the image to the model and assign the output to variable named output
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
model = Net().to(device)
## Define Adam Optimiser with a learning rate of 0.01
optimizer = torch.optim.Adam(model.parameters(),lr=0.01)
start = timeit.default_timer()
for epoch in range(1, 11):
    train(model, device, train_dataloader, optimizer, epoch)
    test(model, device, test_dataloader)
stop = timeit.default_timer()
print('Total time taken: {} seconds'.format(int(stop - start)) )
The above is my full code. I can't figure out what is going wrong. Running it raises this error:
<ipython-input-72-194680537dcc> in forward(self, x1)
     46         x = F.relu(self.conv11(x1))
     47         x = F.relu(self.conv21(x))
---> 48         x = F.relu(self.maxpool(self.conv31(x)))
     49         #x = torch.flatten(x, 1)
     50         x = x.view(-1,64*11*11)
TypeError: 'tuple' object is not callable
P.S.: I'm new to PyTorch.

You have mistakenly placed a comma at the end of the line where you define self.maxpool: self.maxpool = nn.MaxPool2d(2), # Output = 64x11x11
This trailing comma makes self.maxpool a one-element tuple instead of a torch.nn.modules.pooling.MaxPool2d, which is why calling it raises TypeError: 'tuple' object is not callable. Drop the comma at the end and the error is fixed.
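The effect of the trailing comma can be reproduced without PyTorch. In this minimal sketch the Pool class is a hypothetical stand-in for nn.MaxPool2d; any callable object behaves the same way.

```python
class Pool:
    """Hypothetical stand-in for nn.MaxPool2d; it just returns its input."""

    def __init__(self, kernel_size):
        self.kernel_size = kernel_size

    def __call__(self, x):
        return x


with_comma = Pool(2),    # trailing comma: this is the tuple (Pool(2),)
without_comma = Pool(2)  # no comma: this is the Pool object itself

print(type(with_comma).__name__)     # tuple
print(type(without_comma).__name__)  # Pool

try:
    with_comma("input")  # same failure as self.maxpool(...) in the question
except TypeError as err:
    print(err)  # 'tuple' object is not callable

print(without_comma("input"))  # input
```

A comment after the comma does not change anything: Python still parses the assignment as a one-element tuple.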

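I see you haven't given the stride argument in your definition of self.maxpool = nn.MaxPool2d(2). Note that stride defaults to the kernel size, so nn.MaxPool2d(2) already behaves like nn.MaxPool2d(2, stride=2); you can still pass stride=2 explicitly if you want the code to say so.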
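Once the tuple error is gone, it is worth checking the shapes by hand. The sketch below is plain arithmetic, not PyTorch: conv_out, branch_features and cat_shape are made-up helper names. It assumes no padding and no dilation, as in the question's layers, and a pooling stride of 2, which is what nn.MaxPool2d(2) uses since stride defaults to the kernel size.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square conv or pool window (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def branch_features(kernel, side=28, channels=64):
    """Flattened feature count of one branch: three convs, then a 2x2 max pool."""
    for _ in range(3):
        side = conv_out(side, kernel)
    side = conv_out(side, 2, stride=2)  # pooling with kernel 2, stride 2
    return channels * side * side


def cat_shape(shapes, dim):
    """Shape that concatenating tensors of these shapes along dim would give."""
    out = list(shapes[0])
    out[dim] = sum(shape[dim] for shape in shapes)
    return tuple(out)


for kernel in (3, 5, 7):
    print(kernel, branch_features(kernel))
# 3 7744  (64*11*11, matches fc11)
# 5 4096  (64*8*8, matches fc12)
# 7 1600  (64*5*5, matches fc13)

branch_outputs = [(32, 10)] * 3      # x, y, z for a batch of 32
print(cat_shape(branch_outputs, 0))  # (96, 10)
print(cat_shape(branch_outputs, 1))  # (32, 30)
```

The flattened sizes agree with the three fc1x layers. Only the dim=1 concatenation has the 30 input features that self.fc33 = nn.Linear(30, 10) expects, so torch.cat((x, y, z), 1) is probably what the forward pass intends.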
