Build a PyTorch model that wraps around another PyTorch model

Is it possible to wrap a PyTorch model inside another PyTorch module? I could not do it the normal way, as in transfer learning (simply concatenating some more layers). To get the intended value for the next 'layer', I need to wait for the last layer of the first module to generate multiple outputs (say 100), and then use all of those outputs to compute the value for the next 'layer' (say, by taking their max). I tried to define the integrated model as something like the following:
import torch
import torch.nn as nn

class integrated(nn.Module):
    def __init__(self):
        super(integrated, self).__init__()

    def forward(self, x):
        device = torch.device('cpu')
        model = VAE(
            num_labels=10 if args.conditional else 0).to(device)
        model.load_state_dict(torch.load(r'...'))  # the first model is saved somewhere else beforehand
        temp = []
        for j in range(100):
            temp.append(model(x))  # collect each of the 100 outputs
        y = torch.stack(temp).max(dim=0).values  # e.g. take the element-wise max
        return y
I want to do this because the library I need to use requires the input itself to be a PyTorch module. Otherwise I could simply leave the last part outside of the module.

Yes you can definitely use a Pytorch module inside another Pytorch module. The way you are doing this in your example code is a bit unusual though, as external modules (VAE, in your case) are more often initialized in the __init__ function and then saved as attributes of the main module (integrated). Among other things, this avoids having to reload the sub-module every time you call forward.
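Here is a minimal sketch of that pattern. `Inner` is a hypothetical stand-in for the pretrained VAE, and its dropout layer supplies the randomness that makes repeated calls differ. The sub-module is created once, outside `forward`, and stored as an attribute. Because of this, its parameters appear in `parameters()` and `state_dict()` and move along with `.to(device)`:

```python
import torch
import torch.nn as nn


class Inner(nn.Module):
    """Hypothetical stand-in for the pretrained VAE."""

    def __init__(self):
        super().__init__()
        self.dropout = nn.Dropout(p=0.5)
        self.fc = nn.Linear(4, 3)

    def forward(self, x):
        return self.fc(self.dropout(x))


class Integrated(nn.Module):
    def __init__(self, inner, n_samples=100):
        super().__init__()
        # Assigning an nn.Module attribute registers it as a sub-module,
        # so it is built (and its weights loaded) only once, not on every forward.
        self.inner = inner
        self.n_samples = n_samples

    def forward(self, x):
        # Evaluate the wrapped model n_samples times, then reduce with an element-wise max.
        outs = [self.inner(x) for _ in range(self.n_samples)]
        return torch.stack(outs).max(dim=0).values
```

In practice you would load the weights into the inner model once before wrapping it (e.g. `inner.load_state_dict(torch.load(path))`, where `path` is your own checkpoint). You would then hand `Integrated(inner)` to the library.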
One other thing that looks a bit funny is your for loop over repeated invocations of model(x). If there is no randomness involved in model's evaluation, then you would only need a single call to model(x), since all 100 calls will give the same value. So assuming there is some randomness, you should consider whether you can get the desired effect by batching together 100 copies of x and using a single call to model with this batched input. This ultimately depends on additional information about why you are calling this function multiple times on the same input, but either way, using a single batched evaluation will be a lot faster than using many unbatched evaluations.
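The batching idea could look like the sketch below. `batched_max` is a hypothetical helper name. It assumes each row of a batch is processed independently, which holds for layers like `Linear` and `Dropout` but not, for example, for `BatchNorm` in training mode:

```python
import torch
import torch.nn as nn


def batched_max(model, x, n_samples=100):
    """Run `model` on n_samples copies of `x` in a single call and take the element-wise max."""
    batch_size = x.shape[0]
    # Stack the copies along the batch dimension: shape (n_samples * batch_size, ...)
    repeated = x.repeat(n_samples, *([1] * (x.dim() - 1)))
    out = model(repeated)  # one forward pass instead of n_samples passes
    # Split back into (n_samples, batch_size, ...) and reduce over the copies
    out = out.reshape(n_samples, batch_size, *out.shape[1:])
    return out.max(dim=0).values
```

With dropout active, each copy gets its own random mask, so this reproduces the effect of the 100 separate calls. In `eval()` mode, all copies are identical and the result simply equals `model(x)`.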


Example of doing simple prediction with PyTorch Lightning

I have an existing model where I load some pre-trained weights and then do prediction (one image at a time) in PyTorch. I am basically trying to convert it to a PyTorch Lightning module and am confused about a few things.
So currently, my __init__ method for the model looks like this:
# just creates the pytorch network
self.cuda(device=0) # assumes GPU and uses one. This is probably suboptimal
self.eval() # prediction mode
From what I can gather from the Lightning docs, I can do pretty much the same, except without the cuda() call. So something like:
self.freeze() # prediction mode
So, my first question: is this the correct way to use Lightning? How would Lightning know whether it needs to use the GPU? I am guessing this needs to be specified somewhere.
Now, for the prediction, I have the following setup:
def infer(self, frame):
    img = transform(frame)  # apply some transformation to the input
    img = torch.from_numpy(img).float().unsqueeze(0).cuda(device=0)
    with torch.no_grad():
        output = self.__call__(Variable(img)).data.cpu().numpy()
    return output
This is the bit that has me confused. Which functions do I need to override to make a Lightning-compatible prediction?
Also, at the moment the input comes as a numpy array. Is that possible with a Lightning module, or do things always have to go through some sort of DataLoader?
At some point I want to extend this model implementation to do training as well, so I want to make sure I do it right. Most examples focus on training models, so a simple example of just doing prediction at production time on a single image/data point would be useful.
I am using Lightning 0.7.5 with PyTorch 1.4.0 on GPU with CUDA 10.1.
LightningModule is a subclass of torch.nn.Module so the same model class will work for both inference and training. For that reason, you should probably call the cuda() and eval() methods outside of __init__.
Since it's just an nn.Module under the hood, you don't need to override any methods to perform inference once you've loaded your weights; simply call the model instance. Here's a toy example you can use:
import torch
import torchvision.models as models
from pytorch_lightning.core import LightningModule

class MyModel(LightningModule):
    def __init__(self):
        super().__init__()  # required before assigning sub-modules
        self.resnet = models.resnet18(pretrained=True, progress=False)

    def forward(self, x):
        return self.resnet(x)

model = MyModel().eval().cuda(device=0)
And then to actually run inference you don't need a method, just do something like:
with torch.no_grad():  # no autograd bookkeeping needed for inference
    for frame in video:
        img = transform(frame)
        img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
        output = model(img).cpu().numpy()
        # Do something with the output
The main benefit of PyTorch Lightning is that you can also use the same class for training by implementing training_step(), configure_optimizers() and train_dataloader() on that class. You can find a simple example of that in the PyTorch Lightning docs.
Even though the above answer suffices, note the following line:
img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
Both the model and the image have to be placed on the right GPU, which becomes a hassle on a multi-GPU inference machine.
To solve this, .predict was also recently introduced; see more at
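Independently of `.predict`, a plain-PyTorch way to reduce that hassle is to read the target device off the model's own parameters instead of hard-coding `cuda(0)`. Below is a minimal sketch; `to_model_device` is a hypothetical helper name, and it assumes the model has at least one parameter:

```python
import numpy as np
import torch
import torch.nn as nn


def to_model_device(model, array):
    """Convert a numpy array to a batched float tensor on whatever device `model` lives on."""
    device = next(model.parameters()).device  # assumes the model has parameters
    return torch.from_numpy(array).float().unsqueeze(0).to(device)


model = nn.Linear(4, 2).eval()  # toy model; could live on any GPU
with torch.no_grad():
    img = to_model_device(model, np.zeros(4, dtype=np.float32))
    output = model(img).cpu().numpy()
```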

Understanding Input Sequences of Unlimited Length for RNNs in Keras

I have been looking into a Keras implementation of a certain deep learning architecture when I came across a technicality that I could not grasp. In the code, the model has two inputs. The first is the normal input that goes through the graph (word_ids in the sample code below). The second is the length of that input (sequence_lengths in the sample code below), which seems to be used nowhere except in the inputs argument of the Keras Model instance.
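from keras.layers import Input, Embedding, Bidirectional, LSTM, Dense
from keras.models import Model

word_ids = Input(batch_shape=(None, None), dtype='int32')
word_embeddings = Embedding(input_dim=embeddings.shape[0],
                            output_dim=embeddings.shape[1])(word_ids)  # rest of this call was cut off; output_dim assumed
x = Bidirectional(LSTM(units=64, return_sequences=True))(word_embeddings)
x = Dense(64, activation='tanh')(x)
x = Dense(10)(x)
sequence_lengths = Input(batch_shape=(None, 1), dtype='int32')
model = Model(inputs=[word_ids, sequence_lengths], outputs=[x])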
I think this is done to make the network accept sequences of any length. My questions are as follows:
Is what I think correct?
If yes, then I feel like there is a bit of magic going on under the hood. Any suggestions on how to wrap one's head around this?
Does this mean that, using this method, one doesn't need to pad sequences (neither in training nor in prediction), and that Keras will somehow know how to pad them automatically?
Do you need to pass sequence_lengths as an input?
No, it's absolutely not necessary to pass the sequence lengths as inputs, whether you're working with fixed- or variable-length sequences.
I honestly don't understand why that model in the code uses this input if it's not sent to any of the model layers to be processed.
Is this really the complete model?
Why would one pass the sequence lengths as an input?
Well, maybe they want to perform some custom calculations with those. It might be an interesting option, but none of these calculations are present (or shown) in the code you posted. This model is doing absolutely nothing with this input.
How to work with variable sequence length?
For that, you've got two options:
Pad the sequences, as you mentioned, to a fixed size, and add Masking layers to the input (or use the mask_zero=True option in the Embedding layer).
Use None for the length dimension. This is done with one of these:
shape=(None,)
batch_shape=(batch_size, None)
PS: these shapes are for inputs that feed Embedding layers. An input that goes directly into a recurrent layer would have an additional last dimension for the input features.
When using the second option (length = None), you should process each batch separately, because you are not able to put all sequences with different lengths in the same numpy array. But there is no limitation in the model itself, and no padding is necessary in this case.
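Both options can be sketched framework-free with NumPy, assuming integer token ids where 0 is reserved for padding (which is what mask_zero=True expects). `pad_sequences_zero` and `batches_by_length` are hypothetical helper names; Keras also ships its own `pad_sequences` utility for option 1:

```python
from collections import defaultdict

import numpy as np


def pad_sequences_zero(seqs, maxlen):
    """Option 1: right-pad (or truncate) every sequence to maxlen with zeros."""
    out = np.zeros((len(seqs), maxlen), dtype="int32")
    for i, s in enumerate(seqs):
        s = s[:maxlen]
        out[i, :len(s)] = s
    return out


def batches_by_length(seqs):
    """Option 2: group sequences of equal length so each batch is a rectangular array."""
    groups = defaultdict(list)
    for s in seqs:
        groups[len(s)].append(s)
    # One (batch_size, length) array per distinct length; no padding needed
    return [np.array(group, dtype="int32") for _, group in sorted(groups.items())]


seqs = [[5, 3], [7, 1, 2], [4, 4]]
padded = pad_sequences_zero(seqs, maxlen=3)
batches = batches_by_length(seqs)
```

Each array in `batches` could then be fed to the model separately (e.g. with `train_on_batch`). This is what "process each batch separately" means in practice.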
How to work with "unlimited" length
The only way to work with unlimited length is using stateful=True.
In this case, every batch you pass will not be seen as "another group of sequences", but "additional steps of the previous batch".
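The idea behind stateful=True can be illustrated without Keras. A recurrent cell that keeps its hidden state between calls reaches the same final state whether a long sequence is fed all at once or chunk by chunk. The sketch below is a simplified NumPy analogy of that concept (a plain tanh RNN with hypothetical random weights), not Keras's implementation. In Keras you would additionally need a fixed batch size, and you would reset the states explicitly between independent sequences:

```python
import numpy as np


class StatefulRNN:
    """Minimal tanh RNN that carries its hidden state across calls."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.5, size=(n_in, n_hidden))
        self.U = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
        self.reset_states()

    def reset_states(self):
        self.h = None  # start fresh for a new, independent sequence

    def __call__(self, x):
        # x: (timesteps, n_in); the previous call's final state is the starting point
        h = np.zeros(self.U.shape[0]) if self.h is None else self.h
        for x_t in x:
            h = np.tanh(x_t @ self.W + h @ self.U)
        self.h = h
        return h


seq = np.random.default_rng(1).normal(size=(10, 3))
rnn = StatefulRNN(n_in=3, n_hidden=4)
full = rnn(seq)         # whole sequence in one call
rnn.reset_states()
rnn(seq[:6])            # first chunk
chunked = rnn(seq[6:])  # "additional steps of the previous batch"
```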

Creating your own Keras Optimizer

When creating a custom Keras Optimizer, the workhorse function is Optimizer.get_updates(). I was able to create a fixed-step optimizer, but I am not sure how to do things such as running averages, where I have to use values computed in previous calls of the function.
For instance, consider RMSprop. Isn't the accumulator being reset at each call of the function?
accumulators = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
self.weights = accumulators
How is RMSProp doing the running average when the accumulator is being reset at the beginning of each update call?
You are right that the accumulator is set to zero on each get_updates call. But this function is only called once, while the computational graph is built.
What is confusing is the use of symbolic functions. Because Keras uses symbolic representations, what happens in get_updates is that a symbolic update is generated, in lines 237-238:
new_a = self.rho * a + (1. - self.rho) * K.square(g)
self.updates.append(K.update(a, new_a))
These updates are then used while performing gradient descent. Symbolically, they say that whenever the updates are run (as updates to a shared variable), a is set to the value of new_a, which is computed from the previous value of a. This is what makes the accumulator a running average.
Note that multiple updates are built, one for each parameter, and then these symbolic updates are collected in a list that is returned to the caller.
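The build-once/run-many distinction can be mimicked in plain Python. In the sketch below, the function plays the role of get_updates: it creates the accumulators once and returns an update callable that is run at every training step. This is a NumPy analogy of RMSprop's update rule with assumed default values for `lr`, `rho` and `epsilon`, not Keras's actual code:

```python
import numpy as np


def get_updates(params, grads_fn, lr=0.001, rho=0.9, epsilon=1e-7):
    # Runs ONCE, like building the graph: the accumulators are zeroed only here.
    accumulators = [np.zeros_like(p) for p in params]

    def run_updates():
        # Runs at EVERY step, like evaluating the symbolic updates.
        grads = grads_fn(params)
        for p, g, a in zip(params, grads, accumulators):
            a[...] = rho * a + (1.0 - rho) * np.square(g)  # new_a uses the previous a
            p[...] = p - lr * g / (np.sqrt(a) + epsilon)

    return run_updates, accumulators


params = [np.array([1.0, -2.0])]
step, accumulators = get_updates(params, grads_fn=lambda ps: [np.ones_like(p) for p in ps])
for _ in range(3):
    step()
```

With a constant gradient of 1, the accumulator after three steps is 1 - 0.9**3 = 0.271. It was not reset to zero between calls.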

CUDA vs. DataParallel: Why the difference?

I have a simple neural network model, and I apply either cuda() or DataParallel() to it, like the following:
model = torch.nn.DataParallel(model).cuda()
model = model.cuda()
When I don't use DataParallel, rather simply transform my model to cuda(), I need to explicitly convert the batch inputs to cuda() and then give it to the model, otherwise it returns the following error.
torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.LongTensor)
But with DataParallel, the code works fine. Everything else is the same. Why does this happen? Why don't I need to explicitly move the batch inputs to the GPU with cuda() when I use DataParallel?
Because DataParallel accepts CPU inputs: its first step is to transfer the inputs to the appropriate GPUs.
Info source:
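One way to write code that works with or without DataParallel is to always move each batch to the model's device. For a plain module this move is required, while DataParallel would also accept the CPU tensor. Here is a minimal sketch using a hypothetical `nn.Linear` model:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)
if torch.cuda.device_count() > 1:
    # DataParallel scatters each input batch to the GPUs itself,
    # which is why CPU tensors are accepted in this case
    model = nn.DataParallel(model)
model = model.to(device)

batch = torch.randn(8, 4)  # e.g. a batch straight from a DataLoader (on CPU)
# A plain module needs the batch on its own device; the explicit move
# is harmless when DataParallel is used
output = model(batch.to(device))
```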

Tensorflow: Setting an array element with a sequence

I'm trying to train a CNN on my own image dataset, but when I pass the batch data and labels to the feed_dict I get the error ValueError: setting an array element with a sequence. From what I read here, this is a dimension issue, probably coming from my batch_label Tensor, but I couldn't figure out how to make it a one-hot Tensor (which is what my graph expects).
I uploaded the full code as a gist here:
TL;DR: You can't feed a tf.Tensor object (viz. batch_data and batch_labels in your gist) as the value for another tensor. (I believe the error message should be clearer about this in more recent versions of TensorFlow.)
Unfortunately, you can't currently use the feed/tf.placeholder() mechanism to pass the result of one TensorFlow graph to another. We are investigating ways to make this easier, since it is a common source of confusion and a frequent feature request. For your particular program, however, this should be easy to solve. Move the lines that create the input batches into the graph definition, and use those tensors in place of the placeholders. Your program will then look something like:
with graph.as_default():
    # Input data.
    filename_and_label_tensor = tf.train.string_input_producer(['train.txt'], shuffle=True)
    data, label = parse_csv(filename_and_label_tensor)
    tf_train_dataset, tf_train_labels = tf.train.batch([data, label], batch_size, num_threads=4)
    # Rest of the model construction goes here....
Typically, if you want to pass another dataset through the same model—e.g. for evaluation—it's easiest to make another copy of the graph (perhaps sharing the same tf.Variable objects).
