I'm currently trying to extend a model that is based on FairSeq/PyTorch. During training I need to train two encoders: one with the target sample, and the original one with the source sample.
So the current forward function looks like this:
def forward(self, src_tokens=None, src_lengths=None, prev_output_tokens=None, **kwargs):
    encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs)
    decoder_out = self.decoder(prev_output_tokens, encoder_out=encoder_out, **kwargs)
    return decoder_out
Based on this idea, I want something like this:
def forward_test(self, src_tokens=None, src_lengths=None, prev_output_tokens=None, **kwargs):
    encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs)
    decoder_out = self.decoder(prev_output_tokens, encoder_out=encoder_out, **kwargs)
    return decoder_out

def forward_train(self, src_tokens=None, src_lengths=None, prev_output_tokens=None, **kwargs):
    encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs)
    autoencoder_out = self.encoder(tgt_tokens, src_lengths=src_lengths, **kwargs)
    concat = some_concatination_func(encoder_out, autoencoder_out)
    decoder_out = self.decoder(prev_output_tokens, encoder_out=concat, **kwargs)
    return decoder_out
Is there any way to do this?
Edit:
These are the constraints that I have, since I need to extend FairseqEncoderDecoderModel:
@register_model('transformer_mass')
class TransformerMASSModel(FairseqEncoderDecoderModel):
    def __init__(self, encoder, decoder):
        super().__init__(encoder, decoder)
Edit 2:
The parameters passed to the forward function in Fairseq can be altered by implementing your own Criterion; see for example CrossEntropyCriterion, where sample['net_input'] is passed to the __call__ method of the model, which invokes the forward method.
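For illustration, here is a rough sketch of such a criterion, assuming a fairseq version where register_criterion and CrossEntropyCriterion can be imported as below; the tgt_tokens keyword is made up here and has to match whatever your model's forward expects:

from fairseq.criterions import register_criterion
from fairseq.criterions.cross_entropy import CrossEntropyCriterion


@register_criterion('cross_entropy_with_target_encoder')
class CrossEntropyWithTargetEncoderCriterion(CrossEntropyCriterion):
    def forward(self, model, sample, reduce=True):
        # Pass the target tokens alongside the usual net_input, so the model's
        # forward can run its second encoder on them during training.
        net_output = model(**sample['net_input'], tgt_tokens=sample['target'])
        loss, _ = self.compute_loss(model, net_output, sample, reduce=reduce)
        sample_size = sample['ntokens']  # simplification; see CrossEntropyCriterion for the real logic
        logging_output = {
            'loss': loss.data,
            'ntokens': sample['ntokens'],
            'nsentences': sample['target'].size(0),
            'sample_size': sample_size,
        }
        return loss, sample_size, logging_output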
First of all, you should always use and define forward, not some other method that you call on the torch.nn.Module instance.
Definitely do not overload eval() as shown by trsvchn, as it is the evaluation method defined by PyTorch (see here). This method puts the layers inside your model into evaluation mode (e.g. layer-specific changes such as inference behaviour for Dropout or BatchNorm).
Furthermore, you should invoke the model through its __call__ magic method. Why? Because hooks and other PyTorch-specific machinery are registered and run properly that way.
Secondly, do not use some external mode string variable as suggested by @Anant Mittal. That is what the training attribute in PyTorch is for; it is the standard way to tell whether the model is in eval mode or train mode.
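For reference, PyTorch's own mode switches toggle exactly that flag:

import torch

model = torch.nn.Linear(4, 2)
model.train()           # puts the module into training mode
print(model.training)   # True
model.eval()            # puts the module into evaluation mode
print(model.training)   # False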
That being said, you are best off doing it like this:
import torch


class Network(torch.nn.Module):
    def __init__(self):
        super().__init__()
        ...

    # You could split it into two functions, but both should be called by forward
    def forward(
        self, src_tokens=None, src_lengths=None, prev_output_tokens=None, **kwargs
    ):
        encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs)
        if not self.training:
            return self.decoder(prev_output_tokens, encoder_out=encoder_out, **kwargs)
        # tgt_tokens has to be made available here, e.g. passed in via kwargs
        autoencoder_out = self.encoder(tgt_tokens, src_lengths=src_lengths, **kwargs)
        concat = some_concatination_func(encoder_out, autoencoder_out)
        return self.decoder(prev_output_tokens, encoder_out=concat, **kwargs)
You could (and arguably should) split the above into two separate methods, but that's not too bad as the function is rather short and readable this way; see the sketch below. Just stick to PyTorch's way of handling things where easily possible rather than ad-hoc solutions. And no, there will be no problem with backpropagation; why would there be one?
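If you do prefer the split, here is a minimal sketch of one way to do it, still dispatching inside forward on the training flag (tgt_tokens and some_concatination_func are placeholders from the question, and the helper names are made up here):

class Network(torch.nn.Module):
    def forward(self, src_tokens=None, src_lengths=None, prev_output_tokens=None, **kwargs):
        # forward stays the single public entry point, so hooks and the
        # training/eval flag keep working as usual
        if self.training:
            return self._forward_train(src_tokens, src_lengths, prev_output_tokens, **kwargs)
        return self._forward_eval(src_tokens, src_lengths, prev_output_tokens, **kwargs)

    def _forward_eval(self, src_tokens, src_lengths, prev_output_tokens, **kwargs):
        encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs)
        return self.decoder(prev_output_tokens, encoder_out=encoder_out, **kwargs)

    def _forward_train(self, src_tokens, src_lengths, prev_output_tokens, tgt_tokens=None, **kwargs):
        encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs)
        autoencoder_out = self.encoder(tgt_tokens, src_lengths=src_lengths, **kwargs)
        concat = some_concatination_func(encoder_out, autoencoder_out)
        return self.decoder(prev_output_tokens, encoder_out=concat, **kwargs)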
By default, calling model() invokes the forward method, which is the train forward in your case, so you just need to define a new method for your test/eval path inside your model class, something like this:
Code:
class FooBar(nn.Module):
    """Dummy Net for testing/debugging."""

    def __init__(self):
        super().__init__()
        ...

    def forward(self, x):
        # here will be the train forward
        ...

    def evaltest(self, x):
        # here will be the eval/test forward
        ...
Examples:
model = FooBar() # initialize model
# train time
pred = model(x) # calls forward() method under the hood
# test/eval time
test_pred = model.evaltest(x)
Comment:
I would recommend splitting these two forward paths into 2 separate methods, because it is easier to debug and avoids possible problems when backpropagating.
Related
I am trying to create a data pipeline for U-Net for image segmentation. I came across the keras.utils.Sequence class, through which I can create a data pipeline, but I am unable to understand how this works.
Links for the code: Keras code, Source code
def __iter__(self):
    """Create a generator that iterate over the Sequence."""
    for item in (self[i] for i in range(len(self))):
        yield item
I would highly appreciate it if anyone could tell me how this works.
You don't need a generator. The Sequence class is there to manage that. You need to define a class inherited from tensorflow.keras.utils.Sequence and define the methods:
__init__, __getitem__, __len__. In addition, you can define the method on_epoch_end, which is called at the end of each epoch and is usually used to shuffle the sample indexes.
There is an example in the link you gave (Tensorflow Sequence).
Below is another example of Sequence.
Note that you can pass the data to the __init__ constructor, but you may as well read the data from files in the __getitem__ method, assuming you know where to read it, e.g. by passing the name of a directory or directories into the constructor. This is necessary if there is a lot of data.
from tensorflow import keras
import numpy as np


class SequenceExample(keras.utils.Sequence):

    def __init__(self, x_in, y_in, batch_size, shuffle=True):
        # Initialization
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.x = x_in
        self.y = y_in
        self.datalen = len(y_in)
        self.indexes = np.arange(self.datalen)
        if self.shuffle:
            np.random.shuffle(self.indexes)

    def __getitem__(self, index):
        # get batch indexes from shuffled indexes
        batch_indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        x_batch = self.x[batch_indexes]
        y_batch = self.y[batch_indexes]
        return x_batch, y_batch

    def __len__(self):
        # Denotes the number of batches per epoch
        return self.datalen // self.batch_size

    def on_epoch_end(self):
        # Updates indexes after each epoch
        self.indexes = np.arange(self.datalen)
        if self.shuffle:
            np.random.shuffle(self.indexes)
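As a usage sketch (the toy data and model here are made up just to show how the Sequence is consumed; with a recent tf.keras the Sequence can be passed straight to fit):

import numpy as np
from tensorflow import keras

x_train = np.random.rand(1000, 16)   # hypothetical inputs
y_train = np.random.rand(1000, 1)    # hypothetical targets

model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(16,)),
    keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

train_seq = SequenceExample(x_train, y_train, batch_size=32, shuffle=True)
model.fit(train_seq, epochs=2)  # Keras calls __len__, __getitem__ and on_epoch_end for you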
I'm trying to access model parameters using the internal ._parameters method. When I define the model as below, I get model parameters without any issue
model = nn.Linear(10, 10)
print(model._parameters)
However, when I use this method to get parameters of a model defined as a class, I get an empty OrderedDict().
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        return self.fc(x)


model = MyModel()
print(model._parameters)
Is there a solution to this using ._parameters?
NOTE: I understand that using internal methods is frowned upon.
There are three types of objects registered on an nn.Module: parameters stored inside _parameters, buffers inside _buffers, and sub-modules inside _modules. All three are private (indicated by the _ prefix), and as such they are not meant to be used by the end user.
The private nn.Module attribute _parameters is an OrderedDict containing only the parameters registered directly on the module ("parameters" as in nn.Parameter instances, not nn.Modules). That is why it is empty in your example: your only parameters live inside the nn.Linear sub-module. Have a look at the following module instead:
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.p = nn.Parameter(torch.rand(10))

    def forward(self, x):
        # use the directly registered parameter (there is no self.fc in this module)
        return x * self.p
>>> model = MyModel()
>>> print(model._parameters)
OrderedDict([('p', Parameter containing:
tensor([8.5576e-01, 1.4343e-01, 3.2866e-04, 9.4876e-01, 4.4837e-01, 9.7725e-02,
2.7249e-01, 6.7258e-01, 5.6823e-01, 4.0484e-01], requires_grad=True))])
I understand that using internal methods is frowned upon.
Do not use _parameters. You should instead use the appropriate API for this use case, which is nn.Module.parameters:
for p in model.parameters():
    print(p)
Each linear layer in the MyModel class is a module, and can be accessed using _modules. So you can access the parameters of your model using:
for module_key in model._modules.keys():
    for param_key in model._modules[module_key]._parameters:
        p = model._modules[module_key]._parameters[param_key]
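If you also want the parameter names without touching the private dicts, the public named_parameters API covers the same ground:

for name, p in model.named_parameters():
    # e.g. 'fc.weight' and 'fc.bias' for the MyModel from the question
    print(name, p.shape)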
I would like to save model weights to mlflow tracking using pytorch-lightning.
pytorch-lightning supports logging.
However, it seems that saving model weights as an artifact on mlflow is not supported.
At first, I planned to override the ModelCheckpoint class to do it, but I found it difficult because of the complex Mixin operations.
Does anybody know a simple way to accomplish this?
As @xela said, you can use the experiment object of the mlflow logger to log artifacts.
In case you want to frequently log model weights during training, you could extend ModelCheckpoint:
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.utilities import rank_zero_only


class MLFlowModelCheckpoint(ModelCheckpoint):
    def __init__(self, mlflow_logger, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.mlflow_logger = mlflow_logger

    @rank_zero_only
    def on_validation_end(self, trainer, pl_module):
        super().on_validation_end(trainer, pl_module)
        run_id = self.mlflow_logger.run_id
        self.mlflow_logger.experiment.log_artifact(run_id, self.best_model_path)
And then use it in your training code:
mlflow_logger = MLFlowLogger()
checkpoint_callback = MLFlowModelCheckpoint(mlflow_logger)
trainer = pl.Trainer(checkpoint_callback=checkpoint_callback, logger=mlflow_logger)
An alternative to @stecklin is to use the logger's after_save_checkpoint method.
You could extend MLFlowLogger like this:
class MLFlowLoggerCheckpointer(pl.loggers.MLFlowLogger):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)

    def after_save_checkpoint(self, model_checkpoint: pl.callbacks.ModelCheckpoint) -> None:
        """
        Called after the model checkpoint callback saves a new checkpoint.
        """
        self.experiment.log_artifact(
            self.run_id, model_checkpoint.best_model_path
        )
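And then use it the same way as a plain MLFlowLogger; the experiment name and monitored metric below are placeholders, and depending on your Lightning version the checkpoint callback is passed via callbacks= or checkpoint_callback=:

mlflow_logger = MLFlowLoggerCheckpointer(experiment_name="my-experiment")
checkpoint_callback = pl.callbacks.ModelCheckpoint(monitor="val_loss")
trainer = pl.Trainer(logger=mlflow_logger, callbacks=[checkpoint_callback])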
Save it directly in the experiment object?
https://pytorch-lightning.readthedocs.io/en/0.7.1/loggers.html#using-loggers
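For example, a minimal sketch along those lines (the experiment name and checkpoint path are placeholders; logger.experiment exposes the underlying mlflow client):

from pytorch_lightning.loggers import MLFlowLogger

mlflow_logger = MLFlowLogger(experiment_name="my-experiment")
# after training, log an existing checkpoint file as an artifact of the run
mlflow_logger.experiment.log_artifact(mlflow_logger.run_id, "checkpoints/best.ckpt")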
I'm trying to develop a layer in Keras which works with 3D tensors. To make it flexible, I would like to postpone the code that relies on the input's exact shape as much as possible.
My layer is overriding 5 methods:
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.layers import Layer


class MyLayer(Layer):
    def __init__(self, **kwargs):
        pass

    def build(self, input_shape):
        pass

    def call(self, inputs, verbose=False):
        second_dim = K.int_shape(inputs)[-2]
        # Do something with the second_dim

    def compute_output_shape(self, input_shape):
        pass

    def get_config(self):
        pass
And I'm using this layer like this:
input = Input(batch_shape=(None, None, 128), name='input')
x = MyLayer(name='my_layer')(input)
model = Model(input, x)
But I'm facing an error since second_dim is None. How can I develop a layer that relies on the dimensions of the input while being fine with them only being provided by the actual data and not by the input layer?
I ended up asking the same question differently, and I've got a perfect answer:
What is the right way to manipulate the shape of a tensor when there are unknown elements in it?
The gist of it is: don't handle the dimensions' concrete values directly. Use them by reference rather than by value, i.e. do not use K.int_shape; use K.shape instead, and compose the new shape with Keras operations:
shape = K.shape(x)
newShape = K.concatenate([
    shape[0:1],
    shape[1:2] * shape[2:3],
    shape[3:4]
])
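The resulting symbolic shape can then be fed to other backend ops; for instance (a sketch assuming x is a 4D tensor), collapsing the two middle dimensions:

# reshape (batch, a, b, c) -> (batch, a*b, c) without knowing a, b, c at graph-build time
x_reshaped = K.reshape(x, newShape)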
I was following this method
(https://discuss.pytorch.org/t/dynamic-parameter-declaration-in-forward-function/427) to dynamically assign parameters in forward function.
However, my parameter is not just a single weight tensor but an nn.Sequential.
When I implement the following:
class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # you need to register the parameter names earlier
        self.register_parameter('W_di', None)

    def forward(self, input):
        if self.W_di is None:
            self.W_di = nn.Sequential(
                nn.Linear(mL_n * 2, 1024),
                nn.ReLU(),
                nn.Linear(1024, self.hS)).to(device)
I get the following error.
TypeError: cannot assign 'torch.nn.modules.container.Sequential' as parameter 'W_di' (torch.nn.Parameter or None expected)
Is there any way that I can register nn.Sequential as a whole param? Thanks!
If you or other users still have this problem, one solution to consider is using nn.ModuleList instead of nn.Sequential.
While nn.Sequential is useful for defining a fixed sequence of layers in PyTorch, nn.ModuleList is a more flexible container that allows direct access and modification of individual layers within the list. This can be especially helpful when dealing with dynamic models or architectures that require more complex layer arrangements.
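A minimal sketch of that idea, with the layers created lazily on the first forward pass and kept in an nn.ModuleList so they are registered properly (the class and size names here are hypothetical stand-ins for the mL_n * 2 and self.hS values from the question; note that lazily created parameters must exist before you build the optimizer, e.g. by running one dummy forward pass first):

import torch
import torch.nn as nn


class LazyMLP(nn.Module):
    """Hypothetical example: builds its layers on the first forward pass."""

    def __init__(self, hidden_size=1024, out_size=10):
        super().__init__()
        self.hidden_size = hidden_size
        self.out_size = out_size
        self.layers = nn.ModuleList()  # registered container, initially empty

    def forward(self, x):
        if len(self.layers) == 0:
            # input width is taken from the data, so it does not have to be known up front
            self.layers.append(nn.Linear(x.size(-1), self.hidden_size).to(x.device))
            self.layers.append(nn.ReLU())
            self.layers.append(nn.Linear(self.hidden_size, self.out_size).to(x.device))
        for layer in self.layers:
            x = layer(x)
        return x


m = LazyMLP()
print(m(torch.randn(2, 8)).shape)  # torch.Size([2, 10])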
My gut feeling is that you cannot do it. Even in a static model declaration, nn.Module registers the parameters of every sub-module (e.g., nn.Conv2d or nn.Linear) in a nested way; that is, every kernel or bias is registered one by one and independently.
One workaround might be to introduce dynamic sub-modules. Here is my brief implementation; one can define the desired dynamic behavior inside DynamicLinear's forward function.
import torch
import torch.nn as nn


class DynamicLinear(nn.Module):
    def __init__(self):
        super(DynamicLinear, self).__init__()
        # you need to register the parameter names earlier
        self.register_parameter('W_di', None)

    def forward(self, x):
        if self.W_di is None:
            # dynamically define a linear function here, on the device of the input
            self.W_di = nn.Parameter(torch.ones(1, 1, device=x.device))
        return self.W_di @ x


class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.net = nn.Sequential(
            DynamicLinear(),
            nn.ReLU(),
            DynamicLinear())

    def forward(self, x):
        return self.net(x)


m = MyModule()
x = torch.ones(1, 1)
y = m(x)

# output: 1
print(y)