Why are saved models so large in PyTorch?

I have implemented a neural network with an LSTM model (see below). After training the model with a hidden size of 512, I saved it by calling torch.save(model).
I expected the model size to measure in the low tens of kilobytes, accounting for the three layers of LSTM hidden parameters, and I also enumerated the parameters via model.named_parameters(). I was surprised to find that the actual unzipped size of the parameters is more like 30 MB.
Why is the saved model so large?
class IRNN(nn.Module):
    def __init__(self, obs_size, hidden_size):
        super(IRNN, self).__init__()
        self.num_layers = 3
        input_size = obs_size
        self.hidden_size = hidden_size
        self.hidden = self.init_hidden()
        self.hiddenc = self.init_hidden()
        self.rnn = nn.LSTM(
            input_size=input_size, hidden_size=hidden_size, num_layers=self.num_layers
        )
        self.i2o = nn.Linear(hidden_size, 256)
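For reference, the parameter enumeration mentioned in the question can be turned into a size estimate directly. This is a minimal sketch, assuming model is an instance of the IRNN class above; named_parameters(), numel() and element_size() are standard PyTorch calls:

total_bytes = 0
for name, p in model.named_parameters():
    total_bytes += p.numel() * p.element_size()  # float32 -> 4 bytes per element
    print(name, tuple(p.shape))
print("~{:.1f} MB of raw parameters".format(total_bytes / 1e6))

With hidden_size=512 and num_layers=3, each of the two upper LSTM layers alone holds 4 * 512 * (512 + 512) ≈ 2.1M weights, so the model has several million float32 parameters; at 4 bytes each that lands in the tens of megabytes, not kilobytes.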

Related

Pytorch lightning validation set has different image sizes than training set

When I try to train a CNN, I get different shapes from the same dataloader and I don't know why. This is the output of the shapes I feed into the model:
You can see that my validation shape is [batch size, 1, image height and width]. For some reason, the image size gets changed in the last step and the batch size is 1. The same happens when I use the sanity check from PyTorch Lightning beforehand, which I've disabled for now. This is what the PyTorch Lightning data module that builds the dataloaders looks like:
class MRIDataModule(pl.LightningDataModule):
    def __init__(self, batch_size, data_paths):
        super().__init__()
        self.batch_size = batch_size
        self.data_paths = data_paths
        self.train_set = None
        self.val_set = None

    def setup(self, stage=None):
        loader = get_data_loader()
        self.train_set = loader(self.data_paths['train_dir'], transform=None, dimension=DIMENSION, nslice=NSLICE)
        self.val_set = loader(self.data_paths['val_dir'], transform=None, dimension=DIMENSION, nslice=NSLICE)

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size, num_workers=NUM_WORKERS, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size, num_workers=NUM_WORKERS, shuffle=False)
Here is the full code; the print statements come directly from the forward function of my model:
https://colab.research.google.com/drive/1yfbCZlwNMqaW1egaTF8HHRD4Ko8iMTxr?usp=sharing
I inspected your code and found the following:
def validation_epoch_end(self, val_step_outputs):
    dummy_input = torch.zeros((1, 1, 150, 150), device=device)
    model_filename = CONFIG['MODEL'] + "-DIM" + str(CONFIG["DIMENSION"]) + "-model_final.onnx"
    torch.onnx.export(self.net.eval(), dummy_input, model_filename)
This piece of code will be called every time your validation epoch is over, which means you will pass your dummy_input of size (1, 1, 150, 150) to the model. That is why you are seeing a different image shape for the last validation step than in the batches coming from your dataloader.
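If the ONNX export is only needed once, one way to avoid this extra pass (a sketch, not part of the original answer; it reuses the same CONFIG and self.net from the notebook and assumes PyTorch Lightning's on_fit_end hook) is to export after training finishes instead of after every validation epoch:

def on_fit_end(self):
    # runs once when fit() is done, so the dummy input no longer shows up in the per-epoch shape prints
    dummy_input = torch.zeros((1, 1, 150, 150), device=self.device)
    model_filename = CONFIG['MODEL'] + "-DIM" + str(CONFIG["DIMENSION"]) + "-model_final.onnx"
    torch.onnx.export(self.net.eval(), dummy_input, model_filename)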

Question about input dimensions for a Conv2d-LSTM implementation

I am a PyTorch beginner and would like to get help applying the conv2d-LSTM model.
I have a 2D image (1 channel x Time x Frequency) that contains time and frequency information.
I'd like to extract features automatically using Conv2d and then an LSTM model, because the 2D image contains time information.
According to the PyTorch documentation, the output shape of Conv2d is (batch size, channels out, height out, width out) and the input shape of an LSTM is (batch size, sequence length, input size). From that, I thought I would need to reshape the output features of Conv2d before feeding them into the LSTM network.
I expected the CNN-LSTM model to perform well because it could learn the characteristics and time information of the image, but it did not achieve the expected performance.
My question is: when I feed data into the LSTM model, is there any way for the LSTM to learn the data row by row without flattening? Should I always flatten the 2D output?
My network code and input/output shapes are as follows. (I maintained the width size in the conv layers to preserve time information.)
Thanks a lot
class CNN_LSTM(nn.Module):
    def __init__(self, paramArr1, paramArr2):
        super(CNN_LSTM, self).__init__()
        self.input_dim = paramArr2[0]
        self.hidden_dim = paramArr2[1]
        self.n_layers = paramArr2[2]
        self.batch_size = paramArr2[3]
        self.conv = nn.Sequential(
            nn.Conv2d(1, out_channels=paramArr1[0],
                      kernel_size=(paramArr1[1], 1),
                      stride=(paramArr1[2], 1)),
            nn.BatchNorm2d(paramArr1[0]),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(paramArr1[3], 1), stride=(paramArr1[4], 1))
        )
        self.lstm = nn.LSTM(input_size=paramArr2[0],
                            hidden_size=paramArr2[1],
                            num_layers=paramArr2[2],
                            batch_first=True)
        self.linear = nn.Linear(in_features=paramArr2[1], out_features=1)

    def reset_hidden_state(self):
        self.hidden = (
            torch.zeros(self.n_layers, self.batch_size, self.hidden_dim).to(device),
            torch.zeros(self.n_layers, self.batch_size, self.hidden_dim).to(device)
        )

    def forward(self, x):
        x = self.conv(x)                      # (B, C, H, W)
        x = x.view(x.size(0), x.size(1), -1)  # (B, C, H*W)
        x = x.permute(0, 2, 1)                # (B, H*W, C)
        out, (hn, cn) = self.lstm(x, self.hidden)
        out = out.squeeze()[-1, :]
        out = self.linear(out)
        return out
[figure: model input/output shapes]
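Since the width axis is the one carrying time here, one common reshaping pattern (a sketch under that assumption, with made-up shapes; it is not a claim about what the model above should do) is to treat width as the LSTM's sequence dimension and fold channels and height into the per-step feature vector, rather than flattening H*W into the sequence:

import torch

x = torch.randn(8, 16, 10, 50)   # conv output: (batch, channels, height, width=time)
b, c, h, w = x.shape
x = x.permute(0, 3, 1, 2)        # (B, W, C, H)
x = x.reshape(b, w, c * h)       # (B, seq_len=W, input_size=C*H)

lstm = torch.nn.LSTM(input_size=c * h, hidden_size=64, batch_first=True)
out, (hn, cn) = lstm(x)          # out: (B, W, 64)

This way the LSTM consumes one column of the feature map per time step, and nothing along the height (frequency) axis is merged across different time steps.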

Pytorch many-to-many time series LSTM always predicts the mean

I want to create an LSTM model using PyTorch that takes multiple time series and creates predictions for all of them, a typical "many-to-many" LSTM network.
I am able to achieve what I want in Keras. I create a set of data with three variables which are simply linearly spaced with some Gaussian noise. Training the Keras model, I get a prediction 12 steps ahead that is reasonable.
When I try the same thing in PyTorch, the model always predicts the mean of the input data. This is confirmed by the loss during training: the model never seems to perform better than just predicting the mean.
TL;DR: How can I achieve the same thing in PyTorch as in the Keras example in the gist below?
Full working examples are available here https://gist.github.com/jonlachmann/5cd68c9667a99e4f89edc0c307f94ddb
The keras network is defined as
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_features))
model.compile(optimizer='adam', loss='mse')
and the pytorch network is
# Define the pytorch model
class torchLSTM(torch.nn.Module):
    def __init__(self, n_features, seq_length):
        super(torchLSTM, self).__init__()
        self.n_features = n_features
        self.seq_len = seq_length
        self.n_hidden = 100  # number of hidden states
        self.n_layers = 1    # number of LSTM layers (stacked)
        self.l_lstm = torch.nn.LSTM(input_size=n_features,
                                    hidden_size=self.n_hidden,
                                    num_layers=self.n_layers,
                                    batch_first=True)
        # according to pytorch docs LSTM output is
        # (batch_size, seq_len, num_directions * hidden_size)
        # when considering batch_first = True
        self.l_linear = torch.nn.Linear(self.n_hidden * self.seq_len, 3)

    def init_hidden(self, batch_size):
        # even with batch_first = True this remains same as docs
        hidden_state = torch.zeros(self.n_layers, batch_size, self.n_hidden)
        cell_state = torch.zeros(self.n_layers, batch_size, self.n_hidden)
        self.hidden = (hidden_state, cell_state)

    def forward(self, x):
        batch_size, seq_len, _ = x.size()
        lstm_out, self.hidden = self.l_lstm(x, self.hidden)
        # lstm_out (with batch_first = True) is
        # (batch_size, seq_len, num_directions * hidden_size)
        # for the following linear layer we want to keep batch_size and merge the rest
        # .contiguous() -> solves tensor compatibility error
        x = lstm_out.contiguous().view(batch_size, -1)
        return self.l_linear(x)
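For comparison, a PyTorch module that mirrors the Keras stack more closely (a sketch and an assumption about how to translate it, not a tested fix: two stacked LSTM layers with the last time step fed to a Linear layer; nn.LSTM uses tanh internally, so Keras's activation='relu' has no direct equivalent) could look like:

import torch
import torch.nn as nn

class TorchLSTM2(nn.Module):
    def __init__(self, n_features, n_hidden=100):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=n_hidden,
                            num_layers=2, batch_first=True)
        self.linear = nn.Linear(n_hidden, n_features)

    def forward(self, x):                  # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)              # out: (batch, seq_len, n_hidden)
        return self.linear(out[:, -1, :])  # predict from the last time step

Not passing a stored self.hidden also lets nn.LSTM start from a fresh zero state on every forward call instead of carrying state across batches.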

PyTorch multi-class: ValueError: Expected input batch_size (416) to match target batch_size (32)

I have created a multi-class classification neural network. Training and validation iterators were created with the BigBucketIterator method with fields {'text_normalized_tweet': TEXT, 'label': LABEL}, where:
TEXT = a tweet
LABEL = a float number (with 3 values: 0,1,2)
Below I execute a dummy example of my neural network:
import torch.nn as nn

class MultiClassClassifer(nn.Module):
    # define all the layers used in the model
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
        # Constructor
        super(MultiClassClassifer, self).__init__()
        # embedding layer
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # dense layer
        self.hiddenLayer = nn.Linear(embedding_dim, hidden_dim)
        # batch normalization layer
        self.batchnorm = nn.BatchNorm1d(hidden_dim)
        # output layer
        self.output = nn.Linear(hidden_dim, output_dim)
        # activation layer
        self.act = nn.Softmax(dim=1)  # 2d-tensor
        # initialize weights of embedding layer
        self.init_weights()

    def init_weights(self):
        initrange = 1.0
        self.embedding.weight.data.uniform_(-initrange, initrange)

    def forward(self, text, text_lengths):
        embedded = self.embedding(text)
        # packed sequence
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths, batch_first=True)
        tensor, batch_size = packed_embedded[0], packed_embedded[1]
        hidden_1 = self.batchnorm(self.hiddenLayer(tensor))
        return self.act(self.output(hidden_1))
Instantiate the model
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 64
OUTPUT_DIM = 3
model = MultiClassClassifer(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM)
When I call
text, text_lengths = batch.text_normalized_tweet
predictions = model(text, text_lengths).squeeze()
loss = criterion(predictions, batch.label)
it returns,
ValueError: Expected input batch_size (416) to match target batch_size (32).
model(text, text_lengths).squeeze() = torch.Size([416, 3])
batch.label = torch.Size([32])
I can see that the two objects have different sizes, but I have no clue how to fix this.
You may find the Google Colab notebook here
Shapes of each input/output tensor from my forward() method:
torch.Size([32, 10, 100]) #self.embedding(text)
torch.Size([320, 100]) #nn.utils.rnn.pack_padded_sequence(embedded, text_lengths, batch_first=True)
torch.Size([320, 64]) #self.batchnorm(self.hiddenLayer(tensor))
torch.Size([320, 3]) #self.act(self.output(hidden_1))
You shouldn't be using the squeeze function after the forward pass; that doesn't make sense.
After removing the squeeze function, as you can see, the shape of your final output is [320, 3] whereas the loss expects [32, 3]. One way to fix this is to average the embeddings you obtain for each word after self.embedding, as shown below:
def forward(self, text, text_lengths):
    embedded = self.embedding(text)
    embedded = torch.mean(embedded, dim=1, keepdim=True)
    packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths, batch_first=True)
    tensor, batch_size = packed_embedded[0], packed_embedded[1]
    hidden_1 = self.batchnorm(self.hiddenLayer(tensor))
    return self.act(self.output(hidden_1))
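A quick sanity check after the change (a usage sketch reusing the batch, model and criterion objects from the question):

text, text_lengths = batch.text_normalized_tweet
predictions = model(text, text_lengths)      # no .squeeze() here
print(predictions.shape, batch.label.shape)  # per the answer, these should now be [32, 3] and [32]
loss = criterion(predictions, batch.label)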

Visualize the output of a VGG16 model with a t-SNE plot?

I need to visualize the output of a VGG16 model which classifies 14 different classes.
I loaded the trained model and replaced the classifier layer with an Identity() layer, but it doesn't categorize the output.
Here is the snippet (the number of samples here is 1000 images):
import numpy
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

epoch = 800
PATH = 'vgg16_epoch{}.pth'.format(epoch)
checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']

class Identity(nn.Module):
    def __init__(self):
        super(Identity, self).__init__()

    def forward(self, x):
        return x

# replace the last classifier layer so the model outputs the 4096-d features
model.classifier._modules['6'] = Identity()
model.eval()

logits_list = numpy.empty((0, 4096))
targets = []

with torch.no_grad():
    for step, (t_image, target, classess, image_path) in enumerate(test_loader):
        t_image = t_image.cuda()
        target = target.cuda()
        target = target.data.cpu().numpy()
        targets.append(target)
        logits = model(t_image)
        print(logits.shape)
        logits = logits.data.cpu().numpy()
        print(logits.shape)
        logits_list = numpy.append(logits_list, logits, axis=0)
        print(logits_list.shape)

tsne = TSNE(n_components=2, verbose=1, perplexity=10, n_iter=1000)
tsne_results = tsne.fit_transform(logits_list)

target_ids = range(len(targets))
plt.scatter(tsne_results[:, 0], tsne_results[:, 1], c=target_ids, cmap=plt.cm.get_cmap("jet", 14))
plt.colorbar(ticks=range(14))
plt.legend()
plt.show()
Here is what this script produced: I am not sure why I have all colors for each cluster!
VGG16 outputs over 25k features to the classifier. I believe that's too much for t-SNE, so it's a good idea to include a new nn.Linear layer to reduce this number; t-SNE may then work better. In addition, I'd recommend two different ways to get the features from the model:
The best way to get them, regardless of the model, is by using the register_forward_hook method (a short hook sketch follows the FeatNet example below). You may find a notebook here with an example.
If you don't want to use the hook, I'd suggest this one. After loading your model, you may use the following class to extract the features:
class FeatNet(nn.Module):
    def __init__(self, vgg):
        super(FeatNet, self).__init__()
        # keep everything except the final classifier block
        self.features = nn.Sequential(*list(vgg.children())[:-1])

    def forward(self, img):
        # note: for torchvision's VGG16 this returns a (batch, 512, 7, 7) feature map
        return self.features(img)
Now, you just need to call FeatNet(img) to get the features.
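For the hook route mentioned above, a minimal sketch (assuming a torchvision-style VGG16 where classifier[6] is the last Linear layer; if that layer has already been replaced with Identity(), the hook can be registered on that module instead):

features = []

def hook(module, inputs, output):
    # inputs is a tuple; inputs[0] is the (batch, 4096) tensor entering classifier[6]
    features.append(inputs[0].detach().cpu())

handle = model.classifier[6].register_forward_hook(hook)
with torch.no_grad():
    _ = model(t_image)
handle.remove()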
To include the feature reducer, as I suggested before, you need to retrain your model doing something like:
class FeatNet(nn.Module):
    def __init__(self, vgg):
        super(FeatNet, self).__init__()
        self.features = nn.Sequential(*list(vgg.children())[:-1])
        self.feat_reducer = nn.Sequential(
            nn.Linear(25088, 1024),
            nn.BatchNorm1d(1024),
            nn.ReLU()
        )
        self.classifier = nn.Linear(1024, 14)

    def forward(self, img):
        x = self.features(img)
        x = torch.flatten(x, 1)  # flatten the feature map to (batch, 25088) before the Linear layer
        x_r = self.feat_reducer(x)
        return self.classifier(x_r)
Then, you can run your model and return x_r, that is, the reduced features. As I said, 25k features are too much for t-SNE. Another way to reduce this number is to use PCA instead of nn.Linear: you send the 25k features to PCA and then train t-SNE on the PCA output (a sketch of this route follows below). I prefer using nn.Linear, but you need to test which one gives you the better result.
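A sketch of the PCA route (assuming logits_list already holds one flattened feature vector per image; 50 components is an arbitrary choice):

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

pca = PCA(n_components=50)
reduced = pca.fit_transform(logits_list)    # (n_samples, 50)

tsne = TSNE(n_components=2, perplexity=10, n_iter=1000, verbose=1)
tsne_results = tsne.fit_transform(reduced)  # (n_samples, 2)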
