Why are the parameters of this PyTorch AutoEncoder hardcoded this way? - pytorch

Hi I am trying to understand how the following PyTorch AutoEncoder code works. The code below uses the MNIST dataset which is 28X28. My question is how the nn.Linear(128,3) parameters where chosen?
I have a dataset which is 512X512 and I would like to modify the code for this AutoEncoder to support.
class LitAutoEncoder(pl.LightningModule):
def __init__(self):
super().__init__()
self.encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3))
self.decoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28))
def forward(self, x):
# in lightning, forward defines the prediction/inference actions
embedding = self.encoder(x)
return embedding
def training_step(self, batch, batch_idx):
# training_step defined the train loop. It is independent of forward
x, y = batch
x = x.view(x.size(0), -1)
z = self.encoder(x)
x_hat = self.decoder(z)
loss = F.mse_loss(x_hat, x)
return loss
def configure_optimizers(self):
optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
return optimizer

I am assuming input image data are in this shape: x.shape == [bs, 1, h, w], where bs is batch size. Then, x is first viewed as [bs, h*w], i.e. [bs, 28*28]. This means all pixels in an image are flattened into a 1D vector.
Then in the encoder:
nn.Linear(28*28, 128) takes flattened input of size [bs, 28*28] and outputs intermediate result of size [bs, 128]
nn.Linear(128, 3): [bs, 128] -> [bs, 3]
Then in the decoder:
nn.Linear(3, 128): [bs, 3] -> [bs, 128]
nn.Linear(128, 28*28): [bs, 128] -> [bs, 28*28]
The final output is then matched against the input.
If you want to use the exact architecture for your 512x512 images, simply change every occurrence of 28*28 in the code to 512*512. However, this is a quite infeasible choice, for these reasons:
For MNIST images, nn.Linear(28*28, 128) contains 28x28x128+128=100480 parameters, while for your images nn.Linear(512*512, 128) contains 512x512x128+128=33554560 parameters. The size is too large, and it may lead to overfitting
The intermediate data [bs, 3] uses only 3 floats to encode a 512x512 image. I don't think you can recover anything with such compression
I'd suggest looking up convolutional architectures for you purpose

Related

How to specify the batch dimension in a conv2D layer with pyTorch

I have a dataset of 600x600 grayscale images, grouped in batches of 50 images by a dataloader.
My network has a convolution layer with 16 filters, followed by Maxpooling with 6x6 kernels, and then a Dense layer. The output of the conv2D should be out_channels*width*height/maxpool_kernel_W/maxpool_kernel_H = 16*600*600/6/6 = 160000, multiplied by the batch size, 50.
However when I try to do a forward pass I get the following error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (80000x100 and 160000x1000). I verified that the data is formatted correctly as [batch,n_channels,width,height] (so [50,1,600,600] in my case).
Logically the output should be a 50x160000 matrix, but apparently it is formatted as a 80000x100 matrix. It seems like torch is multiplying the matrices along the wrong dimensions. If anyone understands why, please help me understand too.
# get data (using a fake dataset generator)
dataset = FakeData(size=500, image_size= (1, 600, 600), transform=ToTensor())
training_data, test_data = random_split(dataset,[400,100])
train_dataloader = DataLoader(training_data, batch_size=50, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=50, shuffle=True)
net = nn.Sequential(
nn.Conv2d(
in_channels=1,
out_channels=16,
kernel_size=5,
padding=2,
),
nn.ReLU(),
nn.MaxPool2d(kernel_size=6),
nn.Linear(160000, 1000),
nn.ReLU(),
)
optimizer = optim.Adam(net.parameters(), lr=1e-3,)
epochs = 10
for i in range(epochs):
for (x, _) in train_dataloader:
optimizer.zero_grad()
# make sure the data is in the right shape
print(x.shape) # returns torch.Size([50, 1, 600, 600])
# error happens here, at the first forward pass
output = net(x)
criterion = nn.MSELoss()
loss = criterion(output, x)
loss.backward()
optimizer.step()
If you inspect your model's inference layer by layer you would have noticed that the nn.MaxPool2d returns a 4D tensor shaped (50, 16, 100, 100). There are different ways to reduce spatial dimensionality (flattening, average-pooling, max-pooling). For instance, if you want to flatten the spatial dimensions, this will result in a tensor of shape (50, 16*100*100), ie. (50, 160_000) as you expected to have. This being said you are required to use a nn.Flatten layer.
net = nn.Sequential(nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
nn.ReLU(),
nn.MaxPool2d(kernel_size=6),
nn.Flatten(),
nn.Linear(160000, 1000),
nn.ReLU())

Expected input batch_size (18) to match target batch_size (6)

Is RNN for image classification available only for gray image?
The following program works for gray image classification.
If RGB images are used, I have this error:
Expected input batch_size (18) to match target batch_size (6)
at this line loss = criterion(outputs, labels).
My data loading for train, valid and test are as follows.
input_size = 300
inputH = 300
inputW = 300
#Data transform (normalization & data augmentation)
stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_resize_tfms = tt.Compose([tt.Resize((inputH, inputW), interpolation=2),
tt.ToTensor(),
tt.Normalize(*stats)])
train_tfms = tt.Compose([tt.Resize((inputH, inputW), interpolation=2),
tt.RandomHorizontalFlip(),
tt.ToTensor(),
tt.Normalize(*stats)])
valid_tfms = tt.Compose([tt.Resize((inputH, inputW), interpolation=2),
tt.ToTensor(),
tt.Normalize(*stats)])
test_tfms = tt.Compose([tt.Resize((inputH, inputW), interpolation=2),
tt.ToTensor(),
tt.Normalize(*stats)])
#Create dataset
train_ds = ImageFolder('./data/train', train_tfms)
valid_ds = ImageFolder('./data/valid', valid_tfms)
test_ds = ImageFolder('./data/test', test_tfms)
from torch.utils.data.dataloader import DataLoader
batch_size = 6
#Training data loader
train_dl = DataLoader(train_ds, batch_size, shuffle = True, num_workers = 8, pin_memory=True)
#Validation data loader
valid_dl = DataLoader(valid_ds, batch_size, shuffle = True, num_workers = 8, pin_memory=True)
#Test data loader
test_dl = DataLoader(test_ds, 1, shuffle = False, num_workers = 1, pin_memory=True)
My model is as follows.
num_steps = 300
hidden_size = 256 #size of hidden layers
num_classes = 5
num_epochs = 20
learning_rate = 0.001
# Fully connected neural network with one hidden layer
num_layers = 2 # 2 RNN layers are stacked
class RNN(nn.Module):
def __init__(self, input_size, hidden_size, num_layers, num_classes):
super(RNN, self).__init__()
self.num_layers = num_layers
self.hidden_size = hidden_size
self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True, dropout=0.2)#batch must have first dimension
#our inpyt needs to have shape
#x -> (batch_size, seq, input_size)
self.fc = nn.Linear(hidden_size, num_classes)#this fc is after RNN. So needs the last hidden size of RNN
def forward(self, x):
#according to ducumentation of RNN in pytorch
#rnn needs input, h_0 for inputs at RNN (h_0 is initial hidden state)
#the following one is initial hidden layer
h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)#first one is number of layers and second one is batch size
#output has two outputs. The first tensor contains the output features of the hidden last layer for all time steps
#the second one is hidden state f
out, _ = self.rnn(x, h0)
#output has batch_size, num_steps, hidden size
#we need to decode hidden state only the last time step
#out (N, 30, 128)
#Since we need only the last time step
#Out (N, 128)
out = out[:, -1, :] #-1 for last time step, take all for N and 128
out = self.fc(out)
return out
stacked_rnn_model = RNN(input_size, hidden_size, num_layers, num_classes).to(device)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()#cross entropy has softmax at output
#optimizer = torch.optim.Adam(stacked_rnn_model.parameters(), lr=learning_rate) #optimizer used gradient optimization using Adam
optimizer = torch.optim.SGD(stacked_rnn_model.parameters(), lr=learning_rate)
# Train the model
n_total_steps = len(train_dl)
for epoch in range(num_epochs):
t_losses=[]
for i, (images, labels) in enumerate(train_dl):
# origin shape: [6, 3, 300, 300]
# resized: [6, 300, 300]
images = images.reshape(-1, num_steps, input_size).to(device)
print('images shape')
print(images.shape)
labels = labels.to(device)
# Forward pass
outputs = stacked_rnn_model(images)
print('outputs shape')
print(outputs.shape)
loss = criterion(outputs, labels)
t_losses.append(loss)
# Backward and optimize
optimizer.zero_grad()
loss.backward()
optimizer.step()
Printing images and outputs shapes are
images shape
torch.Size([18, 300, 300])
outputs shape
torch.Size([18, 5])
Where is the mistake?
Tl;dr: You are flattening the first two axes, namely batch and channels.
I am not sure you are taking the right approach but I will write about that layer.
In any case, let's look at the issue you are facing. You have a data loader that produces (6, 3, 300, 300), i.e. batches of 6 three-channel 300x300 images. By the look of it you are looking to reshape each batch element (3, 300, 300) into (step_size=300, -1).
However instead of that you are affecting the first axis - which you shouldn't - with images.reshape(-1, num_steps, input_size). This will have the desired effect when working with a single-channel images since dim=1 wouldn't be the "channel axis". In your case your have 3 channels, therefore, the resulting shape is: (6*3*300*300//300//300, 300, 300) which is (18, 300, 300) since num_steps=300 and input_size=300. As a result you are left with 18 batch elements instead of 6.
Instead what you want is to reshape with (batch_size, num_steps, -1). Leaving the last axis (a.k.a. seq_length) of variable size. This will result in a shape (6, 300, 900).
Here is a corrected and reduced snippet:
batch_size = 6
channels = 3
inputH, inputW = 300, 300
train_ds = TensorDataset(torch.rand(100, 3, inputH, inputW), torch.rand(100, 5))
train_dl = DataLoader(train_ds, batch_size)
class RNN(nn.Module):
def __init__(self, input_size, hidden_size, num_layers, num_classes):
super(RNN, self).__init__()
# (batch_size, seq, input_size)
self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
# (batch_size, hidden_size)
self.fc = nn.Linear(hidden_size, num_classes)
# (batch_size, num_classes)
def forward(self, x):
out, _ = self.rnn(x)
out = out[:, -1, :]
out = self.fc(out)
return out
num_steps = 300
input_size = inputH*inputW*channels//num_steps
hidden_size = 256
num_classes = 5
num_layers = 2
rnn = RNN(input_size, hidden_size, num_layers, num_classes)
for x, y in train_dl:
print(x.shape, y.shape)
images = images.reshape(batch_size, num_steps, -1)
print(images.shape)
outputs = rnn(images)
print(outputs.shape)
break
As I said in the beginning I am a bit wary about this approach because you are essentially feeding your RNN a RGB 300x300 image in the form of a sequence of 300 flattened vectors... I can't say if that makes sense and terms of training and if the model will be able to learn from that. I could be wrong!

Activation gradient penalty

Here's a simple neural network, where I’m trying to penalize the norm of activation gradients:
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(3, 32, kernel_size=5)
self.conv2 = nn.Conv2d(32, 64, kernel_size=5)
self.pool = nn.MaxPool2d(2, 2)
self.relu = nn.ReLU()
self.linear = nn.Linear(64 * 5 * 5, 10)
def forward(self, input):
conv1 = self.conv1(input)
pool1 = self.pool(conv1)
self.relu1 = self.relu(pool1)
self.relu1.retain_grad()
conv2 = self.conv2(relu1)
pool2 = self.pool(conv2)
relu2 = self.relu(pool2)
self.relu2 = relu2.view(relu2.size(0), -1)
self.relu2.retain_grad()
return self.linear(relu2)
model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
for i in range(1000):
output = model(input)
loss = nn.CrossEntropyLoss()(output, label)
optimizer.zero_grad()
loss.backward(retain_graph=True)
grads = torch.autograd.grad(loss, [model.relu1, model.relu2], create_graph=True)
grad_norm = 0
for grad in grads:
grad_norm += grad.pow(2).sum()
grad_norm.backward()
optimizer.step()
However, it does not produce the desired regularization effect. If I do the same thing for weights (instead of activations), it works well. Am I doing this right (in terms of pytorch machinery)? Specifically, what happens in grad_norm.backward() call? I just want to make sure the weight gradients are updated, and not activation gradients. Currently, when I print out gradients for weights and activations immediately before and after that line, both change - so I’m not sure what’s going on.
I think your code ends up computing some of the gradients twice in each step. I also suspect it actually never zeroes out the activation gradients, so they accumulate across steps.
In general:
x.backward() computes gradient of x wrt. computation graph leaves (e.g. weight tensors and other variables), as well as wrt. nodes explicitly marked with retain_grad(). It accumulates the computed gradient in tensors' .grad attributes.
autograd.grad(x, [y, z]) returns gradient of x wrt. y and z regardless of whether they would normally retain grad or not. By default, it will also accumulate gradient in all leaves' .grad attributes. You can prevent this by passing only_inputs=True.
I prefer to use backward() only for the optimization step, and autograd.grad() whenever my goal is to obtain "reified" gradients as intermediate values for another computation. This way, I can be sure that no unwanted gradients remain lying around in tensors' .grad attributes after I'm done with them.
import torch
from torch import nn
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(3, 32, kernel_size=5)
self.conv2 = nn.Conv2d(32, 64, kernel_size=5)
self.pool = nn.MaxPool2d(2, 2)
self.relu = nn.ReLU()
self.linear = nn.Linear(64 * 5 * 5, 10)
def forward(self, input):
conv1 = self.conv1(input)
pool1 = self.pool(conv1)
self.relu1 = self.relu(pool1)
conv2 = self.conv2(self.relu1)
pool2 = self.pool(conv2)
self.relu2 = self.relu(pool2)
relu2 = self.relu2.view(self.relu2.size(0), -1)
return self.linear(relu2)
model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
grad_penalty_weight = 10.
for i in range(1000000):
# Random input and labels; we're not really learning anything
input = torch.rand(1, 3, 32, 32)
label = torch.randint(0, 10, (1,))
output = model(input)
loss = nn.CrossEntropyLoss()(output, label)
# This is where the activation gradients are computed
# only_inputs is optional here, since we're going to call optimizer.zero_grad() later
# But it makes clear that we're *only* interested in the activation gradients at this point
grads = torch.autograd.grad(loss, [model.relu1, model.relu2], create_graph=True, only_inputs=True)
grad_norm = 0
for grad in grads:
grad_norm += grad.pow(2).sum()
optimizer.zero_grad()
loss = loss + grad_norm * grad_penalty_weight
loss.backward()
optimizer.step()
This code appears to work, in that the activation gradients do get smaller.
I cannot comment on the viability of this technique as a regularization method.

Prepare Decoder of a Sequence to Sequence Network in PyTorch

I was working with Sequence to Sequence models in Pytorch. Sequence to Sequence Models comprises of an Encoder and a Decoder.
The Encoder convert a (batch_size X input_features X num_of_one_hot_encoded_classes) -> (batch_size X input_features X hidden_size)
The Decoder will take this input sequence and convert it into (batch_size X output_features X num_of_one_hot_encoded_classes)
An example would be like-
So on the above example, I would need to convert the 22 input features to 10 output features. In Keras it could be done with a RepeatVector(10).
An Example -
model.add(LSTM(256, input_shape=(22, 98)))
model.add(RepeatVector(10))
model.add(Dropout(0.3))
model.add(LSTM(256, return_sequences=True))
Although, I'm not sure if it's the proper way to convert the input sequences into the output ones.
So, my question is -
What's the standard way to convert the input sequences to
output ones. eg. converting from (batch_size, 22, 98) -> (batch_size,
10, 98)? Or how should I prepare the Decoder?
Encoder Code snippet (Written in Pytorch) -
class EncoderRNN(nn.Module):
def __init__(self, input_size, hidden_size):
super(EncoderRNN, self).__init__()
self.hidden_size = hidden_size
self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
num_layers=1, batch_first=True)
def forward(self, input):
output, hidden = self.lstm(input)
return output, hidden
Well, you have to options, first one is to repeat the encoder's last state for 10 times and give it as input to the decoder, like this:
import torch
input = torch.randn(64, 22, 98)
encoder = torch.nn.LSTM(98, 256, batch_first=True)
encoded, _ = encoder(input)
decoder_input = encoded[:, -1:].repeat(1, 10, 1)
decoder = torch.nn.LSTM(256, 98, batch_first=True)
decoded, _ = decoder(decoder_input)
print(decoded.shape) #torch.Size([64, 10, 98])
Another option is to use an attention mechanism, like this:
#assuming we have obtained the encoded sequence and declared the decoder as before
attention_calculator = torch.nn.Conv1d(256+98, 1, kernel_size=1)
hidden = (torch.zeros(1, 64, 98), torch.zeros(1, 64, 98))
outputs = []
for i in range(10):
attention_input = torch.cat([hidden[0][0][:, None, :].expand(-1, 22, -1), encoded], dim=2).permute(0, 2, 1)
attention_value = torch.nn.functional.softmax(attention_calculator(attention_input).squeeze(), dim=1)
decoder_input = (attention_value[:, :, None] * encoded).sum(dim=1, keepdim=True)
output, hidden = decoder(decoder_input, hidden)
outputs.append(output)
outputs = torch.cat(outputs, dim=1)

How to input a matrix to CNN in pytorch

I'm very new to pytorch and I want to figure out how to input a matrix rather than image into CNN.
I have try it in the following way, but some errors occur.
I define my dataset as following:
class FrameDataSet(tud.Dataset):
def __init__(self, data):
targets = data['class'].values.tolist()
features = data.drop('class', axis=1).astype(np.int64).values
self.datalist = features.reshape((-1, feature_num, frame_size))
self.labellist = targets
def __getitem__(self, index):
return torch.Tensor(self.datalist[index].astype(float)), self.labellist[index]
def __len__(self):
return self.datalist.shape[0]
And my CNN is:
self.conv = nn.Sequential(
nn.Conv2d(1, 12, 3),
nn.ReLU(True),
nn.MaxPool2d(3, 3))
self.fc1 = nn.Linear(80, 100)
self.fc2 = nn.Linear(100, 30)
self.fc3 = nn.Linear(30, 5)
But when the data was inputted to CNN, the error brings:
File "/home/sparks/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 48, in conv2d
raise ValueError("Expected 4D tensor as input, got {}D tensor instead.".format(input.dim()))
Expected 4D tensor as input, got 3D tensor instead.
Your input probably missing one dimension. It should be:
(batch_size, channels, width, height)
If you only have one element in the batch, the tensor have to be in your case
e.g (1, 1, 28, 28)
because your first conv2d-layer expected a 1-channel input.

Resources