I'm using every past 7 days' data to predict today's value (price).
For each day, there are 6 features (Let's call them feature1 - feature5, and price).
Suppose I have 1000 rows of data. What shape should my data have to be used with an LSTM in PyTorch?
Is it (1000, 7, 6)?
If you check the documentation, LSTM requires the input of shape seq_len x batch_size x input_size. If you declare LSTM with batch_first = True, then LSTM would expect an input of shape batch_size x seq_len x input_size.
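As a quick check of the two layouts, here is a minimal sketch that builds two LSTMs differing only in batch_first. The hidden size of 50 and the batch size of 16 are arbitrary assumptions for illustration, not values taken from the question.

```python
import torch
import torch.nn as nn

seq_len, batch_size, input_size, hidden_size = 7, 16, 6, 50

# Default layout: (seq_len, batch_size, input_size)
lstm_seq_first = nn.LSTM(input_size, hidden_size)
out_seq_first, _ = lstm_seq_first(torch.randn(seq_len, batch_size, input_size))

# batch_first=True layout: (batch_size, seq_len, input_size)
lstm_batch_first = nn.LSTM(input_size, hidden_size, batch_first=True)
out_batch_first, (h_n, c_n) = lstm_batch_first(
    torch.randn(batch_size, seq_len, input_size)
)

print(out_seq_first.shape)    # torch.Size([7, 16, 50])
print(out_batch_first.shape)  # torch.Size([16, 7, 50])
# The hidden state is (num_layers, batch_size, hidden_size) in both cases
print(h_n.shape)              # torch.Size([1, 16, 50])
```

Note that batch_first only changes the layout of the input and output tensors; the returned hidden and cell states keep the same shape either way.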
Now, in your case, since you have 1000 data records, I assume that is your training data size. You can split 1000 records into small batches and feed them to the LSTM.
For the seq_len and input_size, you can have the size 7 x 6 where 7 = number of days and 6 = number of features.
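To make the 7 x 6 layout concrete, the sketch below slices a table of 1000 rows into overlapping 7-day windows. The helper name make_windows is hypothetical, and it assumes that price is the last column and that each window is used to predict the price of the day right after it, so 1000 rows yield 993 samples rather than 1000.

```python
import torch

def make_windows(data, seq_len=7, target_col=-1):
    """Hypothetical helper: turn (num_rows, num_features) into LSTM samples."""
    # unfold gives (num_rows - seq_len + 1, num_features, seq_len)
    windows = data.unfold(0, seq_len, 1)
    # Reorder to (num_windows, seq_len, num_features), i.e. batch_first layout
    windows = windows.permute(0, 2, 1).contiguous()
    # Drop the last window: there is no following day to use as its target
    x = windows[:-1]
    # Target for window i is the price on day i + seq_len
    y = data[seq_len:, target_col]
    return x, y

data = torch.randn(1000, 6)   # dummy stand-in for 5 features + price
x, y = make_windows(data)
print(x.shape)  # torch.Size([993, 7, 6])
print(y.shape)  # torch.Size([993])
```

The resulting x can be split into mini-batches and fed to an LSTM declared with batch_first=True.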
However, my concern is on your problem definition. In your problem, you have 5 features and price is the target variable, whose value you want the model to predict. So, you can feed the 5 feature values to LSTM and use the output vectors to predict the price value.
A reasonable network would be:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # input_size = 5 (number of features), hidden_size = 50
        self.lstm = nn.LSTM(5, 50, 1, batch_first=True)
        # output_size = 1 (target price)
        self.dense = nn.Linear(50, 1)

    def forward(self, x):
        # self.lstm(x)[0] holds the LSTM output for every time step
        x = self.dense(self.lstm(x)[0])
        return x

model = Model()
batch_input = torch.randn(16, 7, 5)  # => batch_size = 16
y = model(batch_input)  # => torch.Size([16, 7, 1])
Now, you can optimize the model using MSELoss since the task is more like a regression problem.
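A minimal training sketch with MSELoss might look like the following. It assumes one target price per 7-day window, so it keeps only the output of the last time step; that choice, the class name PriceModel, the Adam optimizer, and the random dummy data are all assumptions for illustration and are not part of the answer above.

```python
import torch
import torch.nn as nn

class PriceModel(nn.Module):
    def __init__(self, n_features=5, hidden_size=50):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, 1, batch_first=True)
        self.dense = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)           # (batch, seq_len, hidden_size)
        return self.dense(out[:, -1])   # last time step only => (batch, 1)

torch.manual_seed(0)
model = PriceModel()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

batch_x = torch.randn(16, 7, 5)   # dummy features
batch_y = torch.randn(16, 1)      # dummy target prices

losses = []
for step in range(5):
    optimizer.zero_grad()
    loss = criterion(model(batch_x), batch_y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

In a real run, batch_x and batch_y would come from a DataLoader over the windowed data instead of torch.randn.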
I have a network that simultaneously predicts a max and a min using the same logits (an impossible task, but hear me out). Basically, I want to turn a knob to say "now predict the max of a given set of values" or "now predict the min". If the knob is in between, it'll predict a min or a max with 50% probability. My code is based on the You Only Train Once paper: https://openreview.net/pdf?id=HyxY6JHKwr. The paper claims that you can train one network and then tune how you combine the losses to produce the network you want. So in my case, I want to tune it in such a way that my network predicts either the max or the min of a given set of numbers. But I am failing at this task. My network model is as follows:
from torch.nn import Embedding, Linear, Module, ReLU, Sequential, Sigmoid

class MyModel(Module):
    def __init__(self, vocab_size, embedding_dim, input_dim):
        super(MyModel, self).__init__()
        self.input_dim = input_dim
        self.embedding_dim = embedding_dim
        self.emb = Embedding(num_embeddings = vocab_size, embedding_dim = embedding_dim)
        self.l1 = Linear(input_dim * embedding_dim, 64)
        self.l2 = Linear(64, 32)
        self.l3 = Linear(32,10)
        # Maps the two loss weights to a scale and a shift
        self.loss_parameter_mlp = Sequential(
            Linear(2, 2),
            Sigmoid(),
        )

    def forward(self, x, lambd):
        lambd = self.loss_parameter_mlp(lambd)
        x = self.emb(x).reshape(-1, self.input_dim * self.embedding_dim)
        x = ReLU()(self.l1(x))
        x = x * lambd[:,0].reshape(-1, 1) + lambd[:,1].reshape(-1,1)
        x = ReLU()(self.l2(x))
        x = x * lambd[:,0].reshape(-1, 1) + lambd[:,1].reshape(-1,1)
        logits = ReLU()(self.l3(x))
        return logits
My inputs are 10 integers from 1 to 99, and my model outputs are the logits - the argmax of which should contain either the min, or the max, based on my hyperparameters lambd. I specifically chose this problem, since I want the network to predict two polar opposites (max and min) at the same time, which it cannot. It's (in my mind) a simpler version of the problem the paper is trying to solve. My training code is as shown below
# Training
epochs = 200
alpha = np.linspace(0, 1, epochs)
np.random.shuffle(alpha)
for epoch in range(epochs):
    lambd = torch.tensor([[alpha[epoch], (1 - alpha[epoch])]], dtype=torch.float32)
    for batch, x in enumerate(train_loader):
        y_max = torch.argmax(x, axis=1)
        y_min = torch.argmin(x, axis=1)
        lambd_b = lambd.expand(len(y_max), -1)
        y_pred = model(x, lambd_b)
        loss_max = CE_loss(y_pred, y_max)
        loss_min = CE_loss(y_pred, y_min)
        optimizer.zero_grad()
        loss = alpha[epoch] * loss_max + (1 - alpha[epoch]) * loss_min
        loss.backward()
        optimizer.step()
However, the network learns to ignore the parameters lambd (in other words, the knobs for tuning between max and min just don't work). The network does learn to predict the max and the min (they share the same accuracy), which is expected. What should I do to ensure that the knobs work?
I'm pretty new to programming CNNs, so I'm a little bit lost. I'm trying to complete this part of the code, where I am asked to implement a fully-connected network to classify the digits. It should contain 1 hidden layer with 20 units, and I should use the ReLU activation function on the hidden layer.
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = ...
        self.fc2 = nn.Sequential(
            nn.Linear(500,10),
            nn.Softmax(dim = 1)
        )

    def forward(self, x):
        x = x.view(x.size(0),-1)
        x = self.fc1(x)
        x = self.fc2(x)
        return x
The dots are the part to fill in. I am thinking of this line:
self.fc1 = nn.Linear(20, 500)
But I don't know if it's correct. Could someone help me, please? Also, I don't understand at all what the Softmax function does, so if someone could explain it, please do.
Thank you so much!!
P.S. This is the code to load the data:
batch_size = 64
trainset = datasets.MNIST('./data', train=True, download=True, transform=transforms.ToTensor())
train_loader = DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=1)
testset = datasets.MNIST('./data', train=False, download=True, transform=transforms.ToTensor())
test_loader = DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=1)
From the code given for the model, it can be seen that the hidden layer has 500 units. So I am assuming you meant 20 units for input. With this assumption, the code must be:
self.fc1 = nn.Sequential(
    nn.Linear(20, 500),
    nn.ReLU()
)
Coming to the next part of your question, given that you are working with MNIST dataset and you have the softmax function, I am assuming you are trying to predict the number present in the images.
Your neural network performs various multiplication and addition operations in each layer and finally, you end up with 10 numbers in the output layer. Now, you have to make sense of these 10 numbers to decide which of the 10 digits is given in the image.
One way to do this would be to select the unit which has the maximum value. For example if the 10th unit has the maximum value among all units, then we conclude that the digit is '9'. If the 2nd unit has the maximum value, then we conclude that the digit is '1'.
This is fine, but a better way is to convert the value of each unit to the probability that the corresponding digit is contained in the image, and then choose the digit with the highest probability. This has certain mathematical advantages which help us define a better loss function.
Softmax is what helps us to convert the values to probabilities. On applying softmax, all the values lie in the range (0, 1) and they sum up to 1.
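As a small illustration of this, the sketch below applies softmax to ten made-up output values (the numbers are arbitrary, not taken from a trained network) and shows that the largest value still wins after the conversion.

```python
import torch

# Ten arbitrary output values, one per digit 0-9 (illustrative only)
logits = torch.tensor([[1.0, 2.0, 0.5, -1.0, 3.0, 0.0, 0.2, -0.3, 1.5, 0.7]])

probs = torch.softmax(logits, dim=1)

print(probs.sum(dim=1))      # sums to 1 (up to floating-point rounding)
print(probs.argmax(dim=1))   # tensor([4]), the unit with the largest value
```

Softmax preserves the ordering of the values, so picking the most probable digit gives the same answer as picking the largest raw output.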
If you are interested in deep learning and the math behind it, I would suggest checking out Andrew Ng's course on deep learning.
You did not mention the shape of your data so I'll be assuming the expected shape returned by datasets.MNIST.
Data shape: torch.Size([64, 1, 28, 28])
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        # Hidden layer: 784 input features => 20 units
        self.fc1 = nn.Sequential(
            nn.Linear(1*28*28, 20),
            nn.ReLU())
        # Output layer: 20 hidden units => 10 digits
        self.fc2 = nn.Sequential(
            nn.Linear(20, 10),
            nn.Softmax(dim = 1))

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.fc2(x)
        return x
The first argument of nn.Linear is the size of input feature while the second is the number of units.
For self.fc1, the size of the input feature is the multiplication of your data shape except the batch size, which is 1 * 28 * 28. And as per your post the second argument should be 20 (20 units).
The shape of the output from self.fc1 (which is also the input to self.fc2) will then be (batch size, 20).
For self.fc2, the size of the input feature will be 20 while the number of units (which is also the number of digits) will be 10.
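The shapes described above can be checked with a dummy batch. This is only a sketch: the random tensor stands in for one batch of MNIST images, and the two layers are built standalone rather than inside the Network class.

```python
import torch
import torch.nn as nn

fc1 = nn.Sequential(nn.Linear(1 * 28 * 28, 20), nn.ReLU())
fc2 = nn.Sequential(nn.Linear(20, 10), nn.Softmax(dim=1))

x = torch.randn(64, 1, 28, 28)   # dummy batch with the MNIST shape
flat = x.view(x.size(0), -1)     # (64, 784)
hidden = fc1(flat)               # (64, 20)
out = fc2(hidden)                # (64, 10)

print(flat.shape, hidden.shape, out.shape)
```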
I'm having trouble understanding how batches play a role in the PyTorch framework.
In this model:
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # 28x28x1 => 26x26x32
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
        self.d1 = nn.Linear(26 * 26 * 32, 128)
        self.d2 = nn.Linear(128, 10)

    def forward(self, x):
        # 32x1x28x28 => 32x32x26x26
        x = self.conv1(x)
        x = F.relu(x)
        # flatten => 32 x (32*26*26)
        x = x.flatten(start_dim = 1)
        #x = x.view(32, -1)
        # 32 x (32*26*26) => 32x128
        x = self.d1(x)
        x = F.relu(x)
        # logits => 32x10
        logits = self.d2(x)
        out = F.softmax(logits, dim=1)
        return out
In the forward definition, we pass in some x, i.e. the aggregated images for a batch from a DataLoader. Here, the 32x1x28x28 dimension indicates that there are 32 images in a batch. Do we just ignore this fact, and PyTorch handles applying Conv2d to each sample? The forward propagation seems to be written relative to a single image.
Indeed, the network is agnostic to batches: The model is designed to classify a single image.
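One way to see this is to compare the prediction for an image when it is part of a batch with the prediction for the same image fed on its own. The sketch below uses a condensed copy of the model from the question (the name SmallNet and the random input are assumptions for illustration).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
        self.d1 = nn.Linear(26 * 26 * 32, 128)
        self.d2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x)).flatten(start_dim=1)
        x = F.relu(self.d1(x))
        return F.softmax(self.d2(x), dim=1)

torch.manual_seed(0)
model = SmallNet().eval()
batch = torch.randn(32, 1, 28, 28)   # 32 dummy images

with torch.no_grad():
    batch_out = model(batch)          # all 32 images at once
    single_out = model(batch[3:4])    # image 3 alone, as a batch of size 1

# Each sample is processed independently of the others in the batch
same = torch.allclose(batch_out[3], single_out[0], atol=1e-5)
print(same)
```

This holds for layers such as Conv2d and Linear; layers that use batch statistics in training mode (for example batch normalization) are an exception.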
So what do we need batches for?
Each model has weights (aka parameters), and one needs to optimize the weights using the training images so that the model classifies images as correctly as possible.
This optimization process is usually carried out using Stochastic Gradient Descent (SGD): we use the current values of the weights to classify a batch of images. Using the predictions the current model made and the predictions we know it should have made (the "labels"), we can compute the gradient of the loss with respect to the weights and improve the model.
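A minimal sketch of that loop, one gradient step per batch, is shown below. The random images and labels, the single linear layer, and the learning rate of 0.1 are placeholders chosen for illustration only.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
images = torch.randn(256, 1, 28, 28)       # dummy images
labels = torch.randint(0, 10, (256,))      # dummy labels
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

weights_before = model[1].weight.detach().clone()

for batch_images, batch_labels in loader:
    optimizer.zero_grad()
    # Classify the batch with the current weights
    loss = criterion(model(batch_images), batch_labels)
    # Gradient of the batch loss with respect to the weights
    loss.backward()
    # Move the weights a small step against the gradient
    optimizer.step()

num_steps = len(loader)   # 256 / 32 = 8 weight updates per epoch
```

Each batch yields one noisy estimate of the gradient, which is why the weights are updated several times per pass over the data.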
I am working on an autoencoder-type model with an attention method. Around 10000 batches of data are fed into the model, and each batch contains 30 images (30 is the "step_size" in ConvLSTM), each with a shape of (5, 5, 3 [R,G,B]).
Therefore, the array has shape (10000, 30, 5, 5, 3), i.e. (batch_size, step_size, image_height, image_width, scale).
I intentionally made the output array shape (1, 5, 5, 3), because each image has to be handled independently in order to apply the attention method to it.
I link all operations with tf.keras.Model so that its input has shape (10000, 30, 5, 5, 3) and its output has shape (1, 5, 5, 3), and then call:
history = model.fit(train_data, train_data, batch_size = 1, epochs = 3)
I have tried modifying the arguments of the Model module, but it does not seem to work because the output shape is not the same as the input shape.
Is there any way to feed the data one sample at a time?
In the end, I am running code like this:
model = keras.Model(intput, output)
model.compile(optimizer='adam',loss= tf.keras.losses.MSE)
history = model.fit(train_data, train_data, batch_size = 1, epochs = 3)
It can be done with GradientTape, feeding the samples one by one.
def train(loss, model, opt, x_inp):
    # Record the forward pass so the tape can differentiate the loss
    with tf.GradientTape() as tape:
        loss_value = loss(model, x_inp)
    gradients = tape.gradient(loss_value, model.trainable_variables)
    gradient_variables = zip(gradients, model.trainable_variables)
    opt.apply_gradients(gradient_variables)

opt = tf.optimizers.Adam(learning_rate=learning_rate)

import datetime
current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
train_summary_writer = tf.summary.create_file_writer(train_log_dir)

epochs = 3
with train_summary_writer.as_default():
    with tf.summary.record_if(True):
        for epoch in range(epochs):
            for train_id in range(0, len(batch_data)):
                # Feed a single sample (a batch of size 1) per step
                x_inp = np.reshape(np.asarray(batch_data[train_id]), [-1, step_max, sensor_n, sensor_n, scale_n])
                train(loss, model, opt, x_inp)
                loss_values = loss(model, x_inp)
                reconstructed = np.reshape(model(x_inp), [1, sensor_n, sensor_n, scale_n])
                print("loss : {}".format(loss_values.numpy()))
I followed this great answer on sequence autoencoders,
"LSTM autoencoder always returns the average of the input sequence",
but I ran into some problems when I tried to change the code.
Question one:
Your explanation is very professional, but your problem is a little different from mine. I have attached the code I changed from your example. My input features are 2-dimensional, and my output is the same as the input.
for example:
input_x = torch.Tensor([[0.0,0.0], [0.1,0.1], [0.2,0.2], [0.3,0.3], [0.4,0.4]])
output_y = torch.Tensor([[0.0,0.0], [0.1,0.1], [0.2,0.2], [0.3,0.3], [0.4,0.4]])
input_x and output_y are the same: 5 timesteps, each with a 2-dimensional feature.
import torch
import torch.nn as nn
import torch.optim as optim

class LSTM(nn.Module):
    def __init__(self, input_dim, latent_dim, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.num_layers = num_layers
        self.encoder = nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)
        # I changed here, to 40 dimensions, I think there is some problem
        # self.decoder = nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)
        self.decoder = nn.LSTM(40, self.input_dim, self.num_layers)

    def forward(self, input):
        # Encode
        _, (last_hidden, _) = self.encoder(input)
        # It is way more general that way
        encoded = last_hidden.repeat(input.shape)
        # Decode
        y, _ = self.decoder(encoded)
        return torch.squeeze(y)

model = LSTM(input_dim=2, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters())

y = torch.Tensor([[0.0,0.0], [0.1,0.1], [0.2,0.2], [0.3,0.3], [0.4,0.4]])
x = y.view(len(y), -1, 2) # I changed here

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, y)
    loss.backward()
    optimizer.step()
    print(y_pred)
The above code learns very well. Could you review the code and give me some guidance?
When I feed 2 examples to the model as input, the model does not work.
For example, change this line:
y = torch.Tensor([[0.0,0.0], [0.1,0.1], [0.2,0.2], [0.3,0.3], [0.4,0.4]])
to:
y = torch.Tensor([[[0.0,0.0],[0.5,0.5]], [[0.1,0.1], [0.6,0.6]], [[0.2,0.2],[0.7,0.7]], [[0.3,0.3],[0.8,0.8]], [[0.4,0.4],[0.9,0.9]]])
When I compute the loss function, it complains with some errors. Can anyone help me have a look?
Question two:
My training samples have different lengths.
for example:
x1 = [[0.0,0.0], [0.1,0.1], [0.2,0.2], [0.3,0.3], [0.4,0.4]] #with 5 timesteps
x2 = [[0.5,0.5], [0.6,0.6], [0.7,0.7]] #with only 3 timesteps
How can I feed these two training samples into the model at the same time for batch training?
Recurrent N-dimensional autoencoder
First of all, LSTMs work on 1D samples (each time step is a single vector, as they are usually used for words encoded as vectors), while yours are 2D.
No worries though, one can flatten a 2D sample to 1D. An example for your case would be:
import torch
var = torch.randn(10, 32, 100, 100)
var.reshape((10, 32, -1)) # shape: [10, 32, 100 * 100]
Please notice that this is really not general: what if you were to have 3D input? The snippet below generalizes this notion to any dimensionality of your samples, provided the preceding dimensions are batch_size and seq_len:
import torch
input_size = 3  # number of trailing dimensions to flatten
var = torch.randn(10, 32, 100, 100, 35)
var.reshape(var.shape[:-input_size] + (-1,)) # shape: [10, 32, 100 * 100 * 35]
Finally, you can employ it inside a neural network as follows. Look especially at the forward method and the constructor arguments:
import torch

class LSTM(torch.nn.Module):
    # input_dim has to be size after flattening
    # For 20x20 single input it would be 400
    def __init__(
        self,
        input_dimensionality: int,
        input_dim: int,
        latent_dim: int,
        num_layers: int,
    ):
        super(LSTM, self).__init__()
        self.input_dimensionality: int = input_dimensionality
        self.input_dim: int = input_dim  # It is 1d, remember
        self.latent_dim: int = latent_dim
        self.num_layers: int = num_layers
        self.encoder = torch.nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)
        # You can have any latent dim you want, just output has to be exact same size as input
        # In this case, only encoder and decoder, it has to be input_dim though
        self.decoder = torch.nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)

    def forward(self, input):
        # Save original size first:
        original_shape = input.shape
        # Flatten 2d (or 3d or however many you specified in constructor)
        input = input.reshape(input.shape[: -self.input_dimensionality] + (-1,))
        # Rest goes as in my previous answer
        _, (last_hidden, _) = self.encoder(input)
        # Repeat the last layer's final hidden state once per time step:
        # (batch, latent_dim) => (seq_len, batch, latent_dim)
        encoded = last_hidden[-1].repeat(input.shape[0], 1, 1)
        y, _ = self.decoder(encoded)
        # You have to reshape output to what the original was
        reshaped_y = y.reshape(original_shape)
        return torch.squeeze(reshaped_y)
Remember you have to reshape your output in this case. It should work for any dimensions.
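The flatten-then-restore round trip can be checked on its own, independent of the network. In this sketch, flatten_trailing is a hypothetical helper name and the tensor sizes are arbitrary; the layout (seq_len, batch_size, ...) is assumed.

```python
import torch

def flatten_trailing(x, n_dims):
    """Hypothetical helper: merge the last n_dims dimensions into one."""
    return x.reshape(x.shape[:-n_dims] + (-1,))

# 2D samples: (seq_len, batch_size, 20, 20)
sample_2d = torch.randn(5, 1, 20, 20)
flat_2d = flatten_trailing(sample_2d, 2)        # (5, 1, 400)
restored_2d = flat_2d.reshape(sample_2d.shape)  # back to (5, 1, 20, 20)

# 3D samples: (seq_len, batch_size, 4, 3, 2)
sample_3d = torch.randn(5, 1, 4, 3, 2)
flat_3d = flatten_trailing(sample_3d, 3)        # (5, 1, 24)

print(flat_2d.shape, flat_3d.shape)
print(torch.equal(restored_2d, sample_2d))      # True
```

Because reshape only reinterprets the layout, reshaping back to the saved original shape recovers the sample exactly.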
Batching
When it comes to batching and sequences of different lengths, it is a little more complicated.
You have to pad each sequence in the batch before pushing it through the network. Usually the padding values are zeros, though this is configurable (for example, via the padding_value argument of torch.nn.utils.rnn.pad_sequence).
You may check this link for an example. You will have to use functions like torch.nn.utils.rnn.pack_padded_sequence and others to make it work; you may check this answer.
Oh, and since PyTorch 1.1 you don't have to sort your sequences by length in order to pack them. When it comes to this topic, though, grab some tutorials; they should make things clearer.
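As a rough sketch of padding and packing, the snippet below batches the two samples from the question, x1 with 5 timesteps and x2 with 3. The hidden size of 20 and the deliberately unsorted order are assumptions for illustration; it only runs an encoder LSTM and is not a full autoencoder.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import (
    pack_padded_sequence,
    pad_packed_sequence,
    pad_sequence,
)

x1 = torch.tensor([[0.0, 0.0], [0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]])
x2 = torch.tensor([[0.5, 0.5], [0.6, 0.6], [0.7, 0.7]])

sequences = [x2, x1]   # deliberately not sorted by length
lengths = torch.tensor([len(s) for s in sequences])

# Pad with zeros to the longest sequence: (max_len=5, batch=2, features=2)
padded = pad_sequence(sequences)

# enforce_sorted=False (available since PyTorch 1.1) removes the need to sort
packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)

lstm = nn.LSTM(input_size=2, hidden_size=20)
packed_out, (h_n, c_n) = lstm(packed)

# Back to a padded tensor: (max_len=5, batch=2, hidden=20)
out, out_lengths = pad_packed_sequence(packed_out)

print(out.shape)      # torch.Size([5, 2, 20])
print(out_lengths)    # tensor([3, 5])
```

Positions beyond a sequence's true length are zero in out, so they should be masked out when computing a reconstruction loss.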
Lastly: please separate your questions. Once the autoencoding works with a single example, move on to batching, and if you have issues there, please post a new question on StackOverflow. Thanks.