PyTorch LSTM: using different sequence lengths for input and target - pytorch

I want to use a denser time series to predict a less dense time series.
I first had input (X) with shape [33405, 4, 25] and target (Y) with shape [33405, 4, 7], in which 33405 is the amount of samples, 4 is the sequence length and 25 & 7 are the output sizes. They thus had a similar sequence length.
I used the following model:
class LSTMModel(nn.Module):
def __init__(self, input_size, hidden_size, output_size, num_layers=1, dropout=0, activation='tanh'):
super().__init__()
self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers, batch_first=True)
self.dropout = nn.Dropout(dropout)
self.linear = nn.Linear(hidden_size, output_size)
init.xavier_uniform_(self.linear.weight)
def forward(self, x):
x, _ = self.lstm(x)
x = self.dropout(x)
x = self.linear(x)
return x
I got correctly an output of shape [batch_size, 4, 7].
However, I want to do something similar, but now use a more dense time series (sequence length 92) to predict the same sequence of 4. This means that I have an input X that has a sequence length of 92 and a target Y that has a sequence length of 4. My input (X) now has shape [33540, 92, 7] and target (Y) shape [33540, 4, 7].
I use the same model, but now my output has shape [4, 92, 7]. However, I want it, again, to be [batch_size, 4, 7].
I’m a newbie with LSTM and RNNs. Is it possible to work with an X and Y that have different sequence lengths? If so, how should I alter my model to get the desired output?

Related

How do I set the dimensions of Conv1d correctly?

This is a toy example as I'm learning PyTorch and using it on one-dimensional time series, in this case a sine wave.
I'm trying to use Conv1d, but I get the following error:
RuntimeError: Given groups=1, weight of size [5, 1, 2], expected input[1, 994, 5] to have 1 channels, but got 994 channels instead
My 'lookback' is 5 time steps, and the shape of my data batch is [994, 5].
What am I doing wrong?
import torch;from torch.utils.data import Dataset, DataLoader
import torch.nn.functional as F;import pytorch_lightning as pl
from torch import nn, tensor
class TsDs(torch.utils.data.Dataset):
def __init__(self, s, l=5): super().__init__();self.l,self.s=l,s
def __len__(self): return self.s.shape[0] - 1 - self.l
def __getitem__(self, i): return self.s[i:i+self.l], torch.log(self.s[i+self.l+1]/self.s[i+self.l])
def plt(self): plt.plot(self.s)
class TsDm(pl.LightningDataModule):
def __init__(self, length=5000, batch_size=1000): super().__init__();self.batch_size=batch_size;self.s = torch.sin(torch.arange(length)*0.2) + 5
def train_dataloader(self): return DataLoader(TsDs(self.s[:3999]), batch_size=self.batch_size, shuffle=False)
def val_dataloader(self): return DataLoader(TsDs(self.s[4000:]), batch_size=self.batch_size)
dm = TsDm()
class MyModel(pl.LightningModule):
def __init__(self, learning_rate=0.01):
super().__init__();self.learning_rate = learning_rate
super().__init__();self.learning_rate = learning_rate
self.network = nn.Sequential(nn.Conv1d(1,5,2),nn.ReLU(),nn.Linear(5,3),nn.ReLU(),nn.Linear(3,1), nn.Tanh())
# self.network = nn.Sequential(nn.Linear(5,5),nn.ReLU(),nn.Linear(5,3),nn.ReLU(),nn.Linear(3,1), nn.Tanh())
def forward(self, x): return self.network(x)
def step(self, batch, batch_idx, stage):
x, y = batch
loss = -torch.mean(self(x)*y)
print(loss)
return loss
def training_step(self, batch, batch_idx): return self.step(batch, batch_idx, "train")
def validation_step(self, batch, batch_idx): return self.step(batch, batch_idx, "val")
def configure_optimizers(self): return torch.optim.SGD(self.parameters(), lr=self.learning_rate)
mm = MyModel(0.01);trainer = pl.Trainer(max_epochs=10)
trainer.fit(mm, datamodule=dm)
There are two issues in your code:
Looking at the documentation of nn.Conv1d, your input shape should be (B, C, L). In your default case, you have L=5, the sequence length, but you need to create that extra dimension representing the feature size of a sequence element, here C=1. You can do so by changing TsDs's __getitem__ function to:
def __getitem__(self, i):
x = self.s[i:i+self.l] # minibatch x shaped (1, self.l)
y = torch.log(self.s[i+self.l+1]/self.s[i+self.l]) # minibatch y shaped (1,)
return x, y
Your convolutional layer has a stride of 1 and a size of 2, this means its output will be shaped (B, 5, L-1=4). The following layer is a fully connected layer instantiated as nn.Linear(5, 3), which means it expects (*, H_in=5) and will output (*, H_out). You can either
You can flatten the conv1d output with nn.Flatten and feed it to a bigger fully connected layer (for instance nn.Linear(20, 3).
You can use a convolutional layer with a wider kernel, if you use a kernel of 5 (your sequence length you will end up with a tensor of (B, 5, 1) which you feed to a nn.Linear(5, 3). Although this approach doesn't really scale when L is changed.
You could apply a nn.AvgPool1d to get an average representation of the sequence after the convolutional layers have been applied.
Those are just a few directions...

RuntimeError: The size of tensor a (3) must match the size of tensor b (20) at non-singleton dimension 2

I have an input tensor X with shape torch.Size([3, 1, 20, 20]) (batch_size x channels x height x width) and a target tensor y with same shape torch.Size([3, 1, 20, 20]) now when I use mean square error loss, it gives me a warning:
Using a target size (torch.Size([3, 1, 20, 20])) that is different to the input size (torch.Size([3, 1]))
but input shape is torch.Size([3, 1, 20, 20]) ?
and then I get the error:
RuntimeError: The size of tensor a (3) must match the size of tensor b (20) at non-singleton dimension 2
code:
def __init__(self):
super().__init__()
self.l1 = torch.nn.Linear(20 * 20 , 1)
def forward(self, x):
return torch.relu(self.l1(x.view(x.size(0), -1)))
def training_step(self, batch, batch_nb):
X, y = batch
loss = F.mse_loss(self(X), y, reduction='mean')
return loss
def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=0.02)
can someone help please
Your input is of size [3,1,20,20] but the output of the model (self) is of size [3,1] because you pass the input through a fully connected layer and then reshape (view) it.
I'd offer code on how to fix this issue but it's unclear what your goal actually is here.

CrossEntropyLoss on sequences

I need to compute the torch.nn.CrossEntropyLoss on sequences.
The output tensor y_est has shape: [batch_size, sequence_length, embedding_dim]. The values are embedded as one-hot vectors with embedding_dim dimensions (y_est is not binary however).
The target tensor y has shape: [batch_size, sequence_length] and contains the integer index of the correct class in the range [0, embedding_dim).
If I compute the loss on the two input data, with the shape described above, I get an error 1.
What I would like to do is described by the cycle at [2]. For each sequence in the batch, I would like the sum of the losses computed on each element in the sequence.
After reading the documentation of torch.nn.CrossEntropyLoss I came up with the solution [3], which seems to compute exactly what I want: the losses computed at point [2] and [3] are equale.
However, since .permute(.) returns a view of the original tensor, I am afraid it might mess up the backward propagation on the loss. Somewhere (I do not remember where, sorry) I have read that views should not be used in computing the loss.
Is my solution correct?
import torch
batch_size = 5
seq_len = 10
emb_dim = 100
y_est = torch.randn( (batch_size, seq_len, emb_dim))
y = torch.randint(0, emb_dim, (batch_size, seq_len) )
print("y_est, batch x seq x emb:", y_est.shape)
print("y, batch x seq", y.shape)
loss_fn = torch.nn.CrossEntropyLoss(reduction="none")
# [1]
# loss = loss_fn(y_est, y)
# error:
# RuntimeError: Expected target size [5, 100], got [5, 10]
[2]
loss = 0
for i in range(y_est.shape[1]):
loss += loss_fn ( y_est[:, i, :], y[:, i]).sum()
print(loss)
[3]
y_est_2 = torch.permute( y_est, (0, 2, 1))
print("y_est_2", y_est_2.shape)
loss2 = loss_fn(y_est_2, y).sum()
print(loss2)
whose output is:
y_est, batch x seq x emb: torch.Size([5, 10, 100])
y, batch x seq torch.Size([5, 10])
tensor(253.9994)
y_est_2 torch.Size([5, 100, 10])
tensor(253.9994)
Is the solution correct (also for what concerns the backward pass)? Is there a better way?
If y_est are probabilities you really want to compute the error/loss of a categorical output in each timestep/element of a sequence then y and y_est have to have the same shape. To do so, the categories/classes of y can be expanded to the same dim as y_est with one-hot encoding
import torch
batch_size = 5
seq_len = 10
emb_dim = 100
y_est = torch.randn( (batch_size, seq_len, emb_dim))
y = torch.randint(0, emb_dim, (batch_size, seq_len) )
y = torch.nn.functional.one_hot(y, num_classes=emb_dim).type(torch.float)
loss_fn = torch.nn.CrossEntropyLoss()
loss = loss_fn(y_est, y)
print(loss)

How does Pytorch build the computation graph

Here is example pytorch code from the website:
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
# 1 input image channel, 6 output channels, 3x3 square convolution
# kernel
self.conv1 = nn.Conv2d(1, 6, 3)
self.conv2 = nn.Conv2d(6, 16, 3)
# an affine operation: y = Wx + b
self.fc1 = nn.Linear(16 * 6 * 6, 120) # 6*6 from image dimension
self.fc2 = nn.Linear(120, 84)
self.fc3 = nn.Linear(84, 10)
def forward(self, x):
# Max pooling over a (2, 2) window
x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
# If the size is a square you can only specify a single number
x = F.max_pool2d(F.relu(self.conv2(x)), 2)
x = x.view(-1, self.num_flat_features(x))
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
In the forward function, we simply apply a series of transformations to x, but never explicitly define which objects are part of that transformation. Yet when computing the gradient and updating the weights, Pytorch 'magically' knows which weights to update and how the gradient should be calculated.
How does this process work? Is there code analysis going on, or something else that I am missing?
Yes, there is implicit analysis on forward pass. Examine the result tensor, there is thingie like grad_fn= <CatBackward>, that's a link, allowing you to unroll the whole computation graph. And it is built during real forward computation process, no matter how you defined your network module, object oriented with 'nn' or 'functional' way.
You can exploit this graph for net analysis, as torchviz do here: https://github.com/szagoruyko/pytorchviz/blob/master/torchviz/dot.py

Prepare Decoder of a Sequence to Sequence Network in PyTorch

I was working with Sequence to Sequence models in Pytorch. Sequence to Sequence Models comprises of an Encoder and a Decoder.
The Encoder convert a (batch_size X input_features X num_of_one_hot_encoded_classes) -> (batch_size X input_features X hidden_size)
The Decoder will take this input sequence and convert it into (batch_size X output_features X num_of_one_hot_encoded_classes)
An example would be like-
So on the above example, I would need to convert the 22 input features to 10 output features. In Keras it could be done with a RepeatVector(10).
An Example -
model.add(LSTM(256, input_shape=(22, 98)))
model.add(RepeatVector(10))
model.add(Dropout(0.3))
model.add(LSTM(256, return_sequences=True))
Although, I'm not sure if it's the proper way to convert the input sequences into the output ones.
So, my question is -
What's the standard way to convert the input sequences to
output ones. eg. converting from (batch_size, 22, 98) -> (batch_size,
10, 98)? Or how should I prepare the Decoder?
Encoder Code snippet (Written in Pytorch) -
class EncoderRNN(nn.Module):
def __init__(self, input_size, hidden_size):
super(EncoderRNN, self).__init__()
self.hidden_size = hidden_size
self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
num_layers=1, batch_first=True)
def forward(self, input):
output, hidden = self.lstm(input)
return output, hidden
Well, you have to options, first one is to repeat the encoder's last state for 10 times and give it as input to the decoder, like this:
import torch
input = torch.randn(64, 22, 98)
encoder = torch.nn.LSTM(98, 256, batch_first=True)
encoded, _ = encoder(input)
decoder_input = encoded[:, -1:].repeat(1, 10, 1)
decoder = torch.nn.LSTM(256, 98, batch_first=True)
decoded, _ = decoder(decoder_input)
print(decoded.shape) #torch.Size([64, 10, 98])
Another option is to use an attention mechanism, like this:
#assuming we have obtained the encoded sequence and declared the decoder as before
attention_calculator = torch.nn.Conv1d(256+98, 1, kernel_size=1)
hidden = (torch.zeros(1, 64, 98), torch.zeros(1, 64, 98))
outputs = []
for i in range(10):
attention_input = torch.cat([hidden[0][0][:, None, :].expand(-1, 22, -1), encoded], dim=2).permute(0, 2, 1)
attention_value = torch.nn.functional.softmax(attention_calculator(attention_input).squeeze(), dim=1)
decoder_input = (attention_value[:, :, None] * encoded).sum(dim=1, keepdim=True)
output, hidden = decoder(decoder_input, hidden)
outputs.append(output)
outputs = torch.cat(outputs, dim=1)

Resources