Questions about programming a CNN with PyTorch

I'm pretty new to programming CNNs, so I'm a little lost. I'm working on a part of the code where I'm asked to implement a fully connected network to classify the digits. It should contain one hidden layer with 20 units, and I should use the ReLU activation function on the hidden layer.
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = ...
        self.fc2 = nn.Sequential(
            nn.Linear(500,10),
            nn.Softmax(dim = 1)
        )
    def forward(self, x):
        x = x.view(x.size(0),-1)
        x = self.fc1(x)
        x = self.fc2(x)
        return x
The dots are the part I need to fill in. I am thinking of this line:
self.fc1 = nn.Linear(20, 500)
But I don't know if it's correct. Could someone help me, please? I also don't understand what the Softmax function does, so if someone could explain it, I would appreciate it.
Thank you so much!!
P.S. This is the code to load the data:
batch_size = 64
trainset = datasets.MNIST('./data', train=True, download=True, transform=transforms.ToTensor())
train_loader = DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=1)
testset = datasets.MNIST('./data', train=False, download=True, transform=transforms.ToTensor())
test_loader = DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=1)

From the code given for the model, it can be seen that the hidden layer has 500 units. So I am assuming you meant 20 units for input. With this assumption, the code must be:
self.fc1 = nn.Sequential(
    nn.Linear(20, 500),
    nn.ReLU()
)
Coming to the next part of your question: given that you are working with the MNIST dataset and have the softmax function, I am assuming you are trying to predict the digit shown in each image.
Your neural network performs various multiplication and addition operations in each layer and finally, you end up with 10 numbers in the output layer. Now, you have to make sense of these 10 numbers to decide which of the 10 digits is given in the image.
One way to do this would be to select the unit which has the maximum value. For example if the 10th unit has the maximum value among all units, then we conclude that the digit is '9'. If the 2nd unit has the maximum value, then we conclude that the digit is '1'.
This is fine, but a better way is to convert the value of each unit into the probability that the corresponding digit is contained in the image, and then choose the digit with the highest probability. This has certain mathematical advantages that help us define a better loss function.
Softmax is what converts the values to probabilities. After applying softmax, all the values lie in the range (0, 1) and sum to 1.
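To make the conversion from raw outputs to probabilities concrete, here is a minimal sketch of softmax in plain Python. The ten scores are made-up numbers standing in for the output layer, and the helper name softmax is only illustrative; in the network itself nn.Softmax does this work.

```python
import math

def softmax(values):
    """Map a list of raw scores to probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Ten made-up output values, one per digit 0-9 (hypothetical numbers).
scores = [1.2, -0.3, 0.5, 2.0, 0.1, -1.0, 0.0, 4.5, 0.7, 1.1]
probs = softmax(scores)

# The predicted digit is the index of the largest probability.
predicted_digit = probs.index(max(probs))
print(predicted_digit, probs[predicted_digit])
```

Note that the largest raw score and the largest probability always sit at the same index, so softmax does not change which digit is picked; it only makes the outputs interpretable as probabilities.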
If you are interested in deep learning and the math behind it, I suggest checking out Andrew Ng's course on deep learning.

You did not mention the shape of your data, so I'll assume the shape returned by datasets.MNIST.
Data shape: torch.Size([64, 1, 28, 28])
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Sequential(
            nn.Linear(1*28*28, 20),
            nn.ReLU())
        self.fc2 = nn.Sequential(
            nn.Linear(20, 10),  # input size must match the 20 units of fc1
            nn.Softmax(dim = 1))
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.fc2(x)
        return x
The first argument of nn.Linear is the size of the input feature, while the second is the number of units.
For self.fc1, the size of the input feature is the product of the dimensions of your data excluding the batch size, which is 1 * 28 * 28. And as per your post, the second argument should be 20 (20 units).
The shape of the output from self.fc1 (which is also the input to self.fc2) will then be (batch size, 20).
For self.fc2, the size of the input feature will be 20 while the number of units (which is also the number of digits) will be 10.
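The shape reasoning above can be checked with a small sketch. It assumes a random tensor in place of a real MNIST batch, and forward returns the hidden activations as well purely so their shape can be inspected; neither is part of the original exercise.

```python
import torch
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # 1 * 28 * 28 = 784 input features -> 20 hidden units
        self.fc1 = nn.Sequential(nn.Linear(1 * 28 * 28, 20), nn.ReLU())
        # 20 hidden units -> 10 digit classes
        self.fc2 = nn.Sequential(nn.Linear(20, 10), nn.Softmax(dim=1))

    def forward(self, x):
        x = x.view(x.size(0), -1)      # (batch, 1, 28, 28) -> (batch, 784)
        hidden = self.fc1(x)           # (batch, 20)
        return self.fc2(hidden), hidden

# Random stand-in for one batch from the DataLoader.
batch = torch.rand(64, 1, 28, 28)
probs, hidden = Network()(batch)
print(hidden.shape, probs.shape)
```

Each row of probs sums to 1 because of the softmax over dim=1.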

Related

How to do weighted pooling in MXNet?

I want to do a 2D convolution that uses the same 1x2x4 weight on every channel.
(Note: the input height and width are bigger than the kernel, so I can't just use a dot product.)
How can I do this in MXNet?
I tried to apply the same instance of a single 2D conv layer to every channel and concatenate the results, but it is incredibly slow.
def Concat(*args, axis=1, **kwargs):
    net = nn.HybridConcatenate(axis=axis,**kwargs)
    net.add(*args)
    return net
def Seq(*args):
    net = nn.HybridSequential()
    net.add(*args)
    return net
class Trim_D1(nn.HybridBlock):
    def __init__(self, from_, to, **kwargs):
        super(Trim_D1, self).__init__(**kwargs)
        self.from_ = from_
        self.to = to
    def forward(self, x):
        return x[:,self.from_:self.to]
PooPool = nn.Conv2D(kernel_size=(2,4), strides=(2, 4), channels=1, activation=None, use_bias=False, weight_initializer=mx.init.Constant(1/8))
conc = ()
for i in range(40):
    conc += Seq(
        Trim_D1(i,i+1),
        PooPool
    ),
WeightedPool= Concat(*conc)
Ideally, I would also like my kernel weights to sum to 1 so that the operation resembles weighted average pooling.
Edit: I think I know how to do this. I'm going to edit the Conv2D and _Conv source code so that, instead of creating weights of dimension CxHxW, it creates a weight of dimension 1xHxW and uses broadcasting during the convolution. For the weights to sum to 1, a softmax operation additionally has to be applied.
OK, apparently the weights have dimensions out_channels x (in_channels / num_group) x H x W, and broadcasting is not allowed during the convolution. We can fix the second dimension to 1 by setting num_group equal to the number of channels; for the first dimension, we can simply repeat the same weight once per group.
In _Conv.__init__, I discard the first two dimensions during initialization, so our kernel is only H x W now:
self.weight = Parameter('weight', shape=wshapes[1][2:],
                        init=weight_initializer,
                        allow_deferred_init=True)
In _Conv.hybrid_forward, I flatten the weight to 1D in order to apply softmax and then restore it to the original 2D shape. Then I add the first two dimensions back and repeat along the first dimension, as mentioned above:
orig_shape = weight.shape
act = getattr(F, self._op_name)(x, mx.nd.softmax(weight.reshape(-1)).reshape(orig_shape)[None,None,:].repeat(self._kwargs['num_group'],axis=0), name='fwd', **self._kwargs)
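The same idea (one softmax-normalized H x W kernel shared by every channel through a grouped convolution) can be sketched outside the MXNet source tree. The version below uses PyTorch rather than MXNet, which is a substitution made only for illustration; the function name shared_weighted_pool and the tensor sizes are hypothetical.

```python
import torch
import torch.nn.functional as F

def shared_weighted_pool(x, kernel_logits, stride):
    """Weighted average pooling with one kernel shared by all channels."""
    channels = x.shape[1]
    kh, kw = kernel_logits.shape
    # Softmax over all kernel entries makes the weights positive and sum to 1.
    kernel = torch.softmax(kernel_logits.reshape(-1), dim=0).reshape(1, 1, kh, kw)
    # One copy of the same kernel per channel: (channels, 1, kh, kw).
    weight = kernel.repeat(channels, 1, 1, 1)
    # groups=channels applies each copy to its own channel only.
    return F.conv2d(x, weight, stride=stride, groups=channels)

x = torch.ones(2, 40, 4, 8)                     # (batch, channels, H, W)
logits = torch.zeros(2, 4, requires_grad=True)  # the only learnable parameter
pooled = shared_weighted_pool(x, logits, stride=(2, 4))
print(pooled.shape)
```

Because the repeated weight is built from a single parameter tensor, gradients from all channels accumulate into the same 2 x 4 kernel.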

Question about input dimension for conv2D-LSTM implement

I am a PyTorch beginner and would like help applying a Conv2d-LSTM model.
I have a 2D image (1 channel x Time x Frequency) that contains time and frequency information.
I'd like to extract features automatically using Conv2d and then feed them to an LSTM, because the 2D image contains time information.
According to the PyTorch documentation, the output shape of Conv2d is (batch size, channels out, height out, width out), and the input shape of the LSTM (with batch_first=True) is (batch size, sequence length, input size). From that, I concluded that the output features of Conv2d need to be reshaped before they are fed to the LSTM network.
I expected the CNN-LSTM model to perform well because it could learn both the characteristics and the time information of the image, but it did not reach the expected performance.
My question is: when I feed data into the LSTM, is there a way for the LSTM to learn from the data row by row without flattening? Should I always flatten the 2D output?
My network code and input/output shapes are as follows. (I kept the width unchanged in the conv layer to preserve the time information.)
Thanks a lot.
class CNN_LSTM(nn.Module):
    def __init__(self, paramArr1, paramArr2):
        super(CNN_LSTM, self).__init__()
        self.input_dim = paramArr2[0]
        self.hidden_dim = paramArr2[1]
        self.n_layers = paramArr2[2]
        self.batch_size = paramArr2[3]
        self.conv = nn.Sequential(
            nn.Conv2d(1, out_channels=paramArr1[0],
                      kernel_size=(paramArr1[1],1),
                      stride=(paramArr1[2],1)),
            nn.BatchNorm2d(paramArr1[0]),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = (paramArr1[3],1),stride=(paramArr1[4],1))
        )
        self.lstm = nn.LSTM(input_size = paramArr2[0],
                            hidden_size=paramArr2[1],
                            num_layers=paramArr2[2],
                            batch_first=True)
        self.linear = nn.Linear(in_features=paramArr2[1], out_features=1)
    def reset_hidden_state(self):
        self.hidden = (
            torch.zeros(self.n_layers, self.batch_size, self.hidden_dim).to(device),
            torch.zeros(self.n_layers, self.batch_size, self.hidden_dim).to(device)
        )
    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), x.size(1),-1)
        x = x.permute(0,2,1)
        out, (hn, cn) = self.lstm(x, self.hidden)
        out = out.squeeze()[-1, :]
        out = self.linear(out)
        return out
model input/output shape
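One possible way to hand a Conv2d output to an LSTM while keeping one time step per column is sketched below. It assumes that the width axis is the time axis, as described in the question, and all sizes are made up; it is an illustration of the reshaping, not a tuned model.

```python
import torch

batch, channels, height, width = 4, 8, 5, 30
# Stand-in for the output of the conv block: (batch, channels, height, width).
conv_out = torch.randn(batch, channels, height, width)

# Move the time axis (width) next to the batch axis, then merge channels and
# height into one feature vector per time step: (batch, width, channels * height).
seq = conv_out.permute(0, 3, 1, 2).reshape(batch, width, channels * height)

lstm = torch.nn.LSTM(input_size=channels * height, hidden_size=16, batch_first=True)
out, (h_n, c_n) = lstm(seq)

# With batch_first=True the last time step of every sample is out[:, -1, :].
last_step = out[:, -1, :]
print(seq.shape, out.shape, last_step.shape)
```

Here only the channel and height axes are flattened, so the sequence length seen by the LSTM stays equal to the number of time steps.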

Understanding the architecture of an LSTM for sequence classification

I have this model in PyTorch that I have been using for sequence classification.
class RoBERT_Model(nn.Module):
    def __init__(self, hidden_size = 100):
        self.hidden_size = hidden_size
        super(RoBERT_Model, self).__init__()
        self.lstm = nn.LSTM(768, hidden_size, num_layers=1, bidirectional=False)
        self.out = nn.Linear(hidden_size, 2)
    def forward(self, grouped_pooled_outs):
        # chunks_emb = pooled_out.split_with_sizes(lengt) # splits the input tensor into a list of tensors where the length of each sublist is determined by length
        seq_lengths = torch.LongTensor([x for x in map(len, grouped_pooled_outs)]) # gets the length of each sublist in chunks_emb and returns it as an array
        batch_emb_pad = nn.utils.rnn.pad_sequence(grouped_pooled_outs, padding_value=-91, batch_first=True) # pads each sublist in chunks_emb to the largest sublist with value -91
        batch_emb = batch_emb_pad.transpose(0, 1) # (B,L,D) -> (L,B,D)
        lstm_input = nn.utils.rnn.pack_padded_sequence(batch_emb, seq_lengths, batch_first=False, enforce_sorted=False) # seq_lengths.cpu().numpy()
        packed_output, (h_t, h_c) = self.lstm(lstm_input, ) # (h_t, h_c))
        # output, _ = nn.utils.rnn.pad_packed_sequence(packed_output, padding_value=-91)
        h_t = h_t.view(-1, self.hidden_size) # (-1, 100)
        return self.out(h_t) # logits
The issue I am having is that I am not entirely sure what data is being passed to the final classification layer. I believe that only the final LSTM cell in the last layer is being used for classification. That is, there are hidden_size features that are passed to the feedforward layer.
I have depicted what I believe is going on in this figure here:
Is this understanding correct? Am I missing anything?
Thanks.
Your code is a basic LSTM for classification, working with a single rnn layer.
In your picture you have multiple LSTM layers, while, in reality, there is only one, H_n^0 in the picture.
Your padded batch has shape (B, L, D) and is transposed to (L, B, D) before being packed and passed to the LSTM, as correctly pointed out in the comment.
packed_output and h_c are not used at all, hence you can change this line to: _, (h_t, _) = self.lstm(lstm_input) in order not to clutter the picture further.
h_t is the hidden state after the last step for each batch element; in general it has shape (D * L, B, hidden_size), where here D is the number of directions and L the number of layers. As this network is not bidirectional, D=1, and as you have a single layer, L=1 as well, hence h_t has shape (1, B, hidden_size).
This output is reshaped to be compatible with nn.Linear (this line: h_t = h_t.view(-1, self.hidden_size)), which gives a tensor of shape (B, hidden_size).
This tensor is fed to a single nn.Linear layer.
In general, the output of the last time step from RNN is used for each element in the batch, in your picture H_n^0 and simply fed to the classifier.
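The shapes discussed above can be checked with a short sketch. The three "documents" are random tensors with made-up lengths, and the default zero padding is used instead of -91; apart from that, the calls mirror the ones in the question.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

hidden_size = 100
lstm = nn.LSTM(768, hidden_size, num_layers=1, bidirectional=False)

# Three hypothetical documents with 4, 2 and 3 chunk embeddings each.
docs = [torch.randn(4, 768), torch.randn(2, 768), torch.randn(3, 768)]
lengths = torch.tensor([len(d) for d in docs])

padded = pad_sequence(docs, batch_first=True)   # (B, L, 768)
packed = pack_padded_sequence(padded.transpose(0, 1), lengths,
                              batch_first=False, enforce_sorted=False)

_, (h_t, _) = lstm(packed)                      # h_t: (1, B, hidden_size)
features = h_t.view(-1, hidden_size)            # (B, hidden_size)
print(h_t.shape, features.shape)
```

Because the input is packed, each row of features is the hidden state at the last real chunk of that document, not at a padded position.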
By the way, having self.out = nn.Linear(hidden_size, 2) in classification is probably counter-productive; most likely you are performing binary classification, and self.out = nn.Linear(hidden_size, 1) with torch.nn.BCEWithLogitsLoss might be used instead. A single logit contains the information whether the label should be 0 or 1: everything below 0 is more likely to be 0 according to the network, and everything above 0 is considered a 1 label.
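The link between the sign of a single logit and the predicted label follows from the sigmoid function, which maps 0 to exactly 0.5. A minimal sketch in plain Python, with illustrative helper names and made-up logits:

```python
import math

def sigmoid(logit):
    """Probability of label 1 implied by a raw logit."""
    return 1.0 / (1.0 + math.exp(-logit))

def predict_label(logit):
    """Thresholding the logit at 0 is the same as thresholding the probability at 0.5."""
    return 1 if logit > 0 else 0

for logit in (-2.0, -0.1, 0.3, 4.0):
    probability = sigmoid(logit)
    # Both decision rules agree for every logit.
    assert (probability > 0.5) == (predict_label(logit) == 1)
    print(logit, round(probability, 3), predict_label(logit))
```

This is why no explicit sigmoid layer is needed at prediction time: comparing the logit with 0 gives the same decision.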

How to build a multidimensional autoencoder with PyTorch

I followed this great answer for a sequence autoencoder,
LSTM autoencoder always returns the average of the input sequence.
but I ran into some problems when I tried to change the code:
Question one:
Your explanation is very professional, but the problem is a little different from mine. I have attached some code that I changed from your example. My input features are 2-dimensional, and my output is the same as the input.
For example:
input_x = torch.Tensor([[0.0,0.0], [0.1,0.1], [0.2,0.2], [0.3,0.3], [0.4,0.4]])
output_y = torch.Tensor([[0.0,0.0], [0.1,0.1], [0.2,0.2], [0.3,0.3], [0.4,0.4]])
input_x and output_y are the same: 5 timesteps, 2-dimensional features.
import torch
import torch.nn as nn
import torch.optim as optim
class LSTM(nn.Module):
    def __init__(self, input_dim, latent_dim, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.num_layers = num_layers
        self.encoder = nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)
        # I changed here, to 40 dimension, I think there is some problem
        # self.decoder = nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)
        self.decoder = nn.LSTM(40, self.input_dim, self.num_layers)
    def forward(self, input):
        # Encode
        _, (last_hidden, _) = self.encoder(input)
        # It is way more general that way
        encoded = last_hidden.repeat(input.shape)
        # Decode
        y, _ = self.decoder(encoded)
        return torch.squeeze(y)
model = LSTM(input_dim=2, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters())
y = torch.Tensor([[0.0,0.0], [0.1,0.1], [0.2,0.2], [0.3,0.3], [0.4,0.4]])
x = y.view(len(y), -1, 2) # I changed here
while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, y)
    loss.backward()
    optimizer.step()
    print(y_pred)
The above code learns very well. Can you help review the code and give some guidance?
When I pass 2 examples as input to the model, the model does not work.
For example, changing the code from:
y = torch.Tensor([[0.0,0.0], [0.1,0.1], [0.2,0.2], [0.3,0.3], [0.4,0.4]])
to:
y = torch.Tensor([[[0.0,0.0],[0.5,0.5]], [[0.1,0.1], [0.6,0.6]], [[0.2,0.2],[0.7,0.7]], [[0.3,0.3],[0.8,0.8]], [[0.4,0.4],[0.9,0.9]]])
When I compute the loss, it raises an error. Can anyone help take a look?
Question two:
My training samples have different lengths.
For example:
x1 = [[0.0,0.0], [0.1,0.1], [0.2,0.2], [0.3,0.3], [0.4,0.4]] #with 5 timesteps
x2 = [[0.5,0.5], [0.6,0.6], [0.7,0.7]] #with only 3 timesteps
How can I feed these two training samples into the model at the same time for batch training?
Recurrent N-dimensional autoencoder
First of all, LSTMs work on 1D samples, while yours are 2D; LSTMs are usually used for words, each encoded with a single vector.
No worries though, one can flatten such a 2D sample to 1D. An example for your case would be:
import torch
var = torch.randn(10, 32, 100, 100)
var.reshape((10, 32, -1)) # shape: [10, 32, 100 * 100]
Please notice this is not really general: what if you were to have 3D input? The snippet below generalizes this notion to any dimensionality of your samples, provided the leading dimensions are batch_size and seq_len:
import torch
input_size = 3  # number of trailing dimensions to flatten
var = torch.randn(10, 32, 100, 100, 35)
var.reshape(var.shape[:-input_size] + (-1,)) # shape: [10, 32, 100 * 100 * 35]
Finally, you can employ it inside a neural network as follows. Look especially at the forward method and the constructor arguments:
import torch
class LSTM(torch.nn.Module):
    # input_dim has to be size after flattening
    # For 20x20 single input it would be 400
    def __init__(
        self,
        input_dimensionality: int,
        input_dim: int,
        latent_dim: int,
        num_layers: int,
    ):
        super(LSTM, self).__init__()
        self.input_dimensionality: int = input_dimensionality
        self.input_dim: int = input_dim # It is 1d, remember
        self.latent_dim: int = latent_dim
        self.num_layers: int = num_layers
        self.encoder = torch.nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)
        # You can have any latent dim you want, just output has to be exact same size as input
        # In this case, only encoder and decoder, it has to be input_dim though
        self.decoder = torch.nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)
    def forward(self, input):
        # Save original size first:
        original_shape = input.shape
        # Flatten 2d (or 3d or however many you specified in constructor)
        input = input.reshape(input.shape[: -self.input_dimensionality] + (-1,))
        # Rest goes as in my previous answer
        _, (last_hidden, _) = self.encoder(input)
        # Repeat the top layer's last hidden state once per timestep,
        # giving (seq_len, batch, latent_dim), which is what the decoder expects
        encoded = last_hidden[-1:].repeat(input.shape[0], 1, 1)
        y, _ = self.decoder(encoded)
        # You have to reshape output to what the original was
        reshaped_y = y.reshape(original_shape)
        return torch.squeeze(reshaped_y)
Remember you have to reshape your output in this case. It should work for any dimensions.
Batching
When it comes to batching and sequences of different lengths, it is a little more complicated.
You have to pad each sequence in the batch before pushing it through the network. Usually the padding values are zeros, though this is configurable.
You may check this link for an example. You will have to use functions like torch.nn.utils.rnn.pack_padded_sequence and others to make it work; you may check this answer.
Oh, and since PyTorch 1.1 you don't have to sort your sequences by length in order to pack them. When it comes to this topic, though, working through some tutorials should make things clearer.
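As a rough sketch of the padding and packing steps, using the two samples from the question: the hidden size of 20 is taken from the question's latent_dim, and only the encoder side is shown, so this is an illustration of the utilities rather than a complete batched autoencoder.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

x1 = torch.tensor([[0.0, 0.0], [0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]])  # 5 timesteps
x2 = torch.tensor([[0.5, 0.5], [0.6, 0.6], [0.7, 0.7]])                          # 3 timesteps
lengths = torch.tensor([len(x1), len(x2)])

# Zero-pad to a common length: (max_len, batch, features) = (5, 2, 2).
padded = pad_sequence([x1, x2])
# Packing tells the LSTM where each sequence really ends;
# enforce_sorted=False removes the need to sort by length.
packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)

encoder = torch.nn.LSTM(input_size=2, hidden_size=20)
packed_out, (h_n, c_n) = encoder(packed)

# Unpack to a padded tensor again: (max_len, batch, hidden_size).
out, out_lengths = pad_packed_sequence(packed_out)
print(padded.shape, out.shape, out_lengths.tolist())
```

h_n holds the hidden state at the last real timestep of each sequence, so the shorter sample is not affected by its padding.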
Lastly: please separate your questions. Get the autoencoding working with a single example first, then move on to batching, and if you have issues there, please post a new question on StackOverflow. Thanks.

How can I use LSTM in pytorch for classification?

My code is as below:
class Mymodel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, batch_size):
        super(Mymodel, self).__init__()  # must name this class, not Discriminator
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.proj = nn.Linear(hidden_size, output_size)
        self.hidden = self.init_hidden()
    def init_hidden(self):
        return (Variable(torch.zeros(self.num_layers, self.batch_size, self.hidden_size)),
                Variable(torch.zeros(self.num_layers, self.batch_size, self.hidden_size)))
    def forward(self, x):
        lstm_out, self.hidden = self.lstm(x, self.hidden)
        output = self.proj(lstm_out)
        result = F.sigmoid(output)
        return result
I want to use an LSTM to classify a sentence as good (1) or bad (0). With this code, I get a result of shape time_step * batch_size * 1 rather than a single 0 or 1. How should I edit the code to get the classification result?
Theory:
Recall that an LSTM outputs a vector for every input in the series. You are using sentences, which are a series of words (probably converted to indices and then embedded as vectors). This code from the LSTM PyTorch tutorial makes clear exactly what I mean (***emphasis mine):
lstm = nn.LSTM(3, 3) # Input dim is 3, output dim is 3
inputs = [autograd.Variable(torch.randn((1, 3)))
          for _ in range(5)] # make a sequence of length 5
# initialize the hidden state.
hidden = (autograd.Variable(torch.randn(1, 1, 3)),
          autograd.Variable(torch.randn((1, 1, 3))))
for i in inputs:
    # Step through the sequence one element at a time.
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)
# alternatively, we can do the entire sequence all at once.
# the first value returned by LSTM is all of the hidden states throughout
# the sequence. the second is just the most recent hidden state
# *** (compare the last slice of "out" with "hidden" below, they are the same)
# The reason for this is that:
# "out" will give you access to all hidden states in the sequence
# "hidden" will allow you to continue the sequence and backpropagate,
# by passing it as an argument to the lstm at a later time
# Add the extra 2nd dimension
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (autograd.Variable(torch.randn(1, 1, 3)), autograd.Variable(
    torch.randn((1, 1, 3)))) # clean out hidden state
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)
One more time: compare the last slice of "out" with "hidden" below, they are the same. Why? Well...
If you're familiar with LSTMs, I'd recommend the PyTorch LSTM docs at this point. Under the output section, notice that h_t is output at every t.
Now, if you aren't used to LSTM-style equations, take a look at Chris Olah's LSTM blog post. Scroll down to the diagram of the unrolled network:
As you feed your sentence in word by word (x_i followed by x_i+1), you get an output from each timestep. You want to interpret the entire sentence in order to classify it, so you must wait until the LSTM has seen all the words. That is, you need to take h_t where t is the number of words in your sentence.
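The statement that the last slice of the output equals the final hidden state can be verified directly. This is a small sketch for a single-layer, unidirectional LSTM fed a random 5-step "sentence"; the sizes follow the tutorial snippet above and the data is made up.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=3, hidden_size=3)   # one layer, one direction
sentence = torch.randn(5, 1, 3)               # (seq_len, batch, features)

out, (h_n, c_n) = lstm(sentence)              # out: (5, 1, 3), h_n: (1, 1, 3)

# The output at the last timestep is the vector to classify on.
last_output = out[-1]                         # (batch, hidden_size)
same = torch.allclose(last_output, h_n[0])
print(same)
```

For stacked or bidirectional LSTMs the relationship is less direct, since h_n then holds one state per layer and direction.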
Code:
Here's a coding reference. I'm not going to copy-paste the entire thing, just the relevant parts. The magic happens at self.hidden2label(lstm_out[-1])
class LSTMClassifier(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, label_size, batch_size):
        ...
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2label = nn.Linear(hidden_dim, label_size)
        self.hidden = self.init_hidden()
    def init_hidden(self):
        return (autograd.Variable(torch.zeros(1, self.batch_size, self.hidden_dim)),
                autograd.Variable(torch.zeros(1, self.batch_size, self.hidden_dim)))
    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        x = embeds.view(len(sentence), self.batch_size , -1)
        lstm_out, self.hidden = self.lstm(x, self.hidden)
        # lstm_out[-1] is the output at the last timestep: (batch_size, hidden_dim)
        y = self.hidden2label(lstm_out[-1])
        log_probs = F.log_softmax(y, dim=1)  # normalize over the label dimension
        return log_probs
The main problem you need to figure out is in which dimension you should put your batch size when you prepare your data. As far as I know, if you don't set it in your nn.LSTM() init function, it will assume that the second dimension is your batch size, which is quite different from other DNN frameworks. Maybe you can try:
self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
to ask your model to treat the first dimension as the batch dimension.
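The effect of batch_first on the expected input and output layout can be seen in a short sketch. All sizes here are arbitrary example values.

```python
import torch
import torch.nn as nn

seq_len, batch_size, input_size, hidden_size = 7, 4, 10, 16

# Default layout: (seq_len, batch, features).
default_lstm = nn.LSTM(input_size, hidden_size)
out_default, _ = default_lstm(torch.randn(seq_len, batch_size, input_size))

# batch_first=True layout: (batch, seq_len, features).
batch_first_lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
out_bf, (h_n, _) = batch_first_lstm(torch.randn(batch_size, seq_len, input_size))

# batch_first changes the input and output tensors, but not the hidden state,
# which stays (num_layers * num_directions, batch, hidden_size).
print(out_default.shape, out_bf.shape, h_n.shape)
```

With batch_first=True, the last timestep is selected with out_bf[:, -1, :] rather than out_default[-1].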
As a last layer, you need a linear layer with as many outputs as you have classes, e.g. 10 if you are doing digit classification as in MNIST. In your case, since you are doing a yes/no (1/0) classification, you have two labels/classes, so your linear layer has two outputs. I suggest adding a linear layer as
nn.Linear(feature_size_from_previous_layer, 2)
and then train the model using a cross-entropy loss.
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
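A minimal sketch of one training step with this setup follows. The feature size of 16, the random features standing in for last-timestep LSTM outputs, and the labels are all assumptions made for illustration; only the linear layer is trained here.

```python
import torch
import torch.nn as nn
import torch.optim as optim

feature_size = 16                       # hypothetical size of the LSTM feature vector
classifier = nn.Linear(feature_size, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(classifier.parameters(), lr=0.001, momentum=0.9)

features = torch.randn(8, feature_size)          # stand-in for last-timestep LSTM outputs
labels = torch.tensor([0, 1, 1, 0, 1, 0, 0, 1])  # class indices, not one-hot vectors

logits = classifier(features)           # raw scores; the loss applies log-softmax itself
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# The classification result is the index of the larger logit: 0 or 1 per sentence.
predictions = logits.argmax(dim=1)
print(loss.item(), predictions.tolist())
```

Since nn.CrossEntropyLoss expects raw logits, no sigmoid or softmax layer should be placed after the linear layer when this loss is used.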

Resources