Run Pytorch stacked model on Colab TPU - pytorch

I am trying to run my model on a Colab multi-core TPU, but I really don't know how to do it. I tried this tutorial notebook, but I got some errors I can't fix, and I think there may be a simpler way to do it.
About my model:
class BERTModel(nn.Module):
    def __init__(self, ...):
        super().__init__()
        if ...:
            self.bert_model = XLMRobertaModel.from_pretrained(...)  # huggingface XLM-R
        elif ...:
            self.bert_model = others_model.from_pretrained(...)  # another huggingface model
        ...  # some other model parameters

    def forward(self, ...):
        bert_input = ...
        output = self.bert_model(bert_input)
        ...  # some function that processes the output

    def other_function(self, ...):
        # just does some processing on the output, like concatenating layer embeddings, and returns ...

class MAINModel(nn.Module):
    def __init__(self, ...):
        super().__init__()
        print('Using model 1')
        self.bert_model_1 = BERTModel(...)
        print('Using model 2')
        self.bert_model_2 = BERTModel(...)
        self.linear = nn.Linear(...)

    def forward(self, ...):
        bert_input = ...
        bert_output = self.bert_model(bert_input)
        linear_output = self.linear(bert_output)
        return linear_output
Can you please tell me how to run a model like mine on a Colab TPU? I used Colab Pro to make sure RAM is not a big problem. Thank you so much.

I would work off the examples here: https://github.com/pytorch/xla/tree/master/contrib/colab
Maybe start with a simpler model like this: https://github.com/pytorch/xla/blob/master/contrib/colab/mnist-training.ipynb
In the pseudocode you shared, there is no reference to the torch_xla library, which is required to use PyTorch on TPUs. I'd recommend starting with one of the working Colab notebooks in the directory I shared and then swapping out parts of the model with your own. For a model that runs on GPUs with native PyTorch, there are only a few (usually 3-4) places in the overall training code you need to modify to run it on TPUs; see here for a description of some of the changes. The other big change is to wrap the default dataloader with a ParallelLoader, as shown in the example MNIST Colab I shared. A rough sketch of the typical changes is below.
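This sketch only illustrates where the torch_xla calls usually go, assuming a standard training loop; MAINModel, MyDataset, compute_loss and the hyperparameters are placeholders, and the linked Colab notebooks are the authoritative reference:
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl
import torch_xla.distributed.xla_multiprocessing as xmp

def train_fn(index):
    device = xm.xla_device()               # 1. get the TPU core as a device
    model = MAINModel(...).to(device)      # 2. move the model to the XLA device
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
    train_loader = torch.utils.data.DataLoader(MyDataset(...), batch_size=8)

    model.train()
    for epoch in range(3):
        # 3. wrap the ordinary DataLoader with a ParallelLoader
        para_loader = pl.ParallelLoader(train_loader, [device])
        for batch in para_loader.per_device_loader(device):
            optimizer.zero_grad()
            loss = compute_loss(model, batch)  # placeholder for your loss computation
            loss.backward()
            xm.optimizer_step(optimizer)       # 4. XLA-aware optimizer step

# spawn one process per TPU core (8 on a Colab TPU)
xmp.spawn(train_fn, nprocs=8, start_method='fork')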
If you see a specific error in one of the Colabs, feel free to open an issue: https://github.com/pytorch/xla/issues

Related

ValueError: Tensor Tensor("dense_4/Sigmoid:0", shape=(?, 1025), dtype=float32) is not an element of this graph

Today I suddenly started getting this error for no apparent reason while running model.fit(). This used to work before; I am using TF 2.3.0, more specifically its Keras module.
The function is called on validation inside a generator, which is fed into model.predict().
Basically, I load a checkpoint, I resume training the network, and I make a prediction on validation.
The error keeps occurring even when training a model from scratch and erasing all the related data. It's as if something had been hardcoded somewhere, as I was able to run model.fit() up until a few hours ago.
I saw several solutions like THIS, but none of these variations really work for me, as they lead to more tricky error messages.
I even tried installing a different version of TF, thinking that this was due to some old version, but the error still occurs.
I will answer my own question, as this one was particularly tricky and none of the solutions I found on the internet worked for me, probably because they are outdated.
I'll write down just the relevant part to add in the code, feel free to add more technical explanations.
I like using args for passing variables, but it can work without:
import tensorflow as tf
from tensorflow.python.keras.backend import set_session
from tensorflow.keras.models import load_model
from generator import generator  # custom generator

def main(args):
    # open new session and define TF graph
    args.sess = tf.compat.v1.Session()
    args.graph = tf.compat.v1.get_default_graph()
    set_session(args.sess)
    # define training generator
    train_generator = generator(args.train_data)
    # load model
    args.model = load_model(args.model_path)
    args.model.fit(train_generator)
Then, in the model prediction function:
# In my specific case, the predict_output() function is
# called inside the generator function
def predict_output(args, x):
    with args.graph.as_default():
        set_session(args.sess)
        y = args.model.predict(x)
    return y

Serve online learning models with mlflow

It is not clear to me if one could use mlflow to serve a model that is evolving continuously based on its previous predictions.
I need to be able to query a model in order to make a prediction on a sample of data, which is the basic use of mlflow serve. However, I also want the model to be updated internally once it has seen that new data.
Is it possible, or does it need a feature request?
I think you should be able to do that by implementing a custom Python model or a custom flavor, as described in the documentation. In this case you need to create a class that inherits from mlflow.pyfunc.PythonModel and implement the predict method; inside that method you're free to do anything. Here is a simple example from the documentation:
import mlflow.pyfunc

class AddN(mlflow.pyfunc.PythonModel):
    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        return model_input.apply(lambda column: column + self.n)
and this model can then be saved and loaded again just like normal models:
# Construct and save the model
model_path = "add_n_model"
add5_model = AddN(n=5)
mlflow.pyfunc.save_model(path=model_path, python_model=add5_model)
# Load the model in `python_function` format
loaded_model = mlflow.pyfunc.load_model(model_path)
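For the online-learning part, the predict method can also update internal state. The following is only an illustrative sketch of that idea (the class, the running-mean logic and the input layout are my own assumptions, not an mlflow feature), and note that state mutated in memory by a served model is not automatically written back to the saved artifact:
import mlflow.pyfunc

class OnlineMeanModel(mlflow.pyfunc.PythonModel):
    # Hypothetical model that keeps a running mean of everything it has scored
    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def predict(self, context, model_input):
        # update internal state with the new data (simple running mean over column 0)
        for value in model_input.iloc[:, 0]:
            self.count += 1
            self.mean += (value - self.mean) / self.count
        # prediction: deviation of each sample from the running mean
        return model_input.iloc[:, 0] - self.mean

mlflow.pyfunc.save_model(path="online_mean_model", python_model=OnlineMeanModel())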

PyTorch - Save just the model structure without weights and then load and train it

I want to separate model structure authoring and training. The model author designs the model structure, saves the untrained model to a file, and then sends it to a training service, which loads the model structure and trains the model.
Keras has the ability to save the model config and then load it.
How can the same be accomplished with PyTorch?
You can write your own function to do that in PyTorch. Saving the weights is straightforward: you simply do torch.save(model.state_dict(), 'weightsAndBiases.pth').
For saving the model structure, you can do this:
(Assume you have a model class named Network, and you instantiate yourModel = Network())
model_structure = {'input_size': 784,
                   'output_size': 10,
                   'hidden_layers': [each.out_features for each in yourModel.hidden_layers],
                   'state_dict': yourModel.state_dict()  # if you want to save the weights
                   }

torch.save(model_structure, 'model_structure.pth')
Similarly, we can write a function to load the structure.
def load_structure(filepath):
    structure = torch.load(filepath)
    model = Network(structure['input_size'],
                    structure['output_size'],
                    structure['hidden_layers'])
    # model.load_state_dict(structure['state_dict'])  # if you had saved the weights as well
    return model

model = load_structure('model_structure.pth')
print(model)
Edit:
Okay, the above works when you have access to the source code for your class, or when the class is simple enough that you can define a generic class like this:
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self, input_size, output_size, hidden_layers, drop_p=0.5):
        ''' Builds a feedforward network with arbitrary hidden layers.

            Arguments
            ---------
            input_size: integer, size of the input layer
            output_size: integer, size of the output layer
            hidden_layers: list of integers, the sizes of the hidden layers
        '''
        super().__init__()
        # Input to a hidden layer
        self.hidden_layers = nn.ModuleList([nn.Linear(input_size, hidden_layers[0])])
        # Add a variable number of more hidden layers
        layer_sizes = zip(hidden_layers[:-1], hidden_layers[1:])
        self.hidden_layers.extend([nn.Linear(h1, h2) for h1, h2 in layer_sizes])
        self.output = nn.Linear(hidden_layers[-1], output_size)
        self.dropout = nn.Dropout(p=drop_p)

    def forward(self, x):
        ''' Forward pass through the network, returns the output logits '''
        for each in self.hidden_layers:
            x = F.relu(each(x))
            x = self.dropout(x)
        x = self.output(x)
        return F.log_softmax(x, dim=1)
However, that will only work for simple cases, so I suppose that's not what you intended.
One option is to define the model architecture in a separate .py file and import it along with the other necessities (if the model architecture is complex), or to define the whole model there and then.
Another option is converting your PyTorch model to ONNX and saving it; a rough sketch of that route is below.
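Since your actual model isn't specified, here is a minimal sketch of the ONNX route, using the generic Network class from above as a stand-in; the file name and dummy-input shape are placeholders:
import torch
import torch.onnx

model = Network(784, 10, [512, 256])   # stand-in for whatever model you want to ship
model.eval()

# ONNX export traces the model with a dummy input of the right shape
dummy_input = torch.randn(1, 784)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# The receiving side can then load and run it without the original Python class,
# e.g. with onnxruntime:
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx")
# outputs = session.run(None, {"input": dummy_input.numpy()})
Bear in mind that ONNX is primarily an inference format, so for the author/trainer split in the question the separate-.py-file approach is usually the more practical one.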
The other option is that, in TensorFlow, you can create a .pb file that defines both the architecture and the weights of the model; in PyTorch you would do something like this:
torch.save(model, filepath)
This will save the model object itself, as torch.save() is just a pickle-based save at the end of the day.
model = torch.load(filepath)
This, however, has limitations: your model class definition might not be picklable (possible with some complicated models).
Because this is such an iffy workaround, the answer you'll usually get is: no, you have to declare the class definition before loading the trained model, i.e. you need access to the model class source code.
Side notes:
An official answer by one of the core PyTorch devs on the limitations of loading a PyTorch model without the code:
We only save the source code of the class definition. We do not save beyond that (like the package sources that the class is referring to).
import foo
class MyModel(...):
    def forward(self, input):
        foo.bar(input)
Here the package foo is not saved in the model checkpoint.
There are limitations on robustly serializing Python constructs. For example, the default picklers cannot serialize lambdas. There are helper packages that can serialize more Python constructs than the standard library, but they still have limitations. dill is one such package.
Given these limitations, there is no robust way to have torch.load work without having the original source files.

example of doing simple prediction with pytorch-lightning

I have an existing model where I load some pre-trained weights and then do prediction (one image at a time) in PyTorch. I am trying to convert it to a PyTorch Lightning module and am confused about a few things.
So currently, my __init__ method for the model looks like this:
self._load_config_file(cfg_file)
# just creates the pytorch network
self.create_network()
self.load_weights(weights_file)
self.cuda(device=0) # assumes GPU and uses one. This is probably suboptimal
self.eval() # prediction mode
From what I can gather from the Lightning docs, I can do pretty much the same, except without the cuda() call. So something like:
self.create_network()
self.load_weights(weights_file)
self.freeze() # prediction mode
So, my first question is: is this the correct way to use Lightning? How would Lightning know if it needs to use the GPU? I am guessing this needs to be specified somewhere.
Now, for the prediction, I have the following setup:
def infer(self, frame):
    img = transform(frame)  # apply some transformation to the input
    img = torch.from_numpy(img).float().unsqueeze(0).cuda(device=0)
    with torch.no_grad():
        output = self.__call__(Variable(img)).data.cpu().numpy()
    return output
This is the bit that has me confused. Which functions do I need to override to make a lightning compatible prediction?
Also, at the moment, the input comes as a numpy array. Is that something that would be possible from the lightning module or do things always have to use some sort of a dataloader?
At some point, I want to extend this model implementation to do training as well, so I want to make sure I do it right. While most examples focus on training models, a simple example of just doing prediction at production time on a single image/data point would be useful.
I am using pytorch-lightning 0.7.5 with PyTorch 1.4.0 on a GPU with CUDA 10.1.
LightningModule is a subclass of torch.nn.Module so the same model class will work for both inference and training. For that reason, you should probably call the cuda() and eval() methods outside of __init__.
Since it's just an nn.Module under the hood, once you've loaded your weights you don't need to override any methods to perform inference; simply call the model instance. Here's a toy example you can use:
import torchvision.models as models
from pytorch_lightning.core import LightningModule
class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.resnet = models.resnet18(pretrained=True, progress=False)

    def forward(self, x):
        return self.resnet(x)

model = MyModel().eval().cuda(device=0)
And then to actually run inference you don't need a method; just do something like:
for frame in video:
    img = transform(frame)
    img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
    output = model(img).data.cpu().numpy()
    # Do something with the output
The main benefit of PyTorch Lightning is that you can also use the same class for training by implementing training_step(), configure_optimizers() and train_dataloader() on that class. You can find a simple example of that in the PyTorch Lightning docs, and a rough sketch is shown below.
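As an illustration only (not an official example; the random TensorDataset, the loss and the hyperparameters are placeholders), those three hooks could look like this on the same class:
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import torchvision.models as models
from pytorch_lightning.core import LightningModule

class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.resnet = models.resnet18(pretrained=True, progress=False)

    def forward(self, x):
        return self.resnet(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        return F.cross_entropy(logits, y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        # placeholder dataset: 100 random images with random labels
        x = torch.randn(100, 3, 224, 224)
        y = torch.randint(0, 1000, (100,))
        return DataLoader(TensorDataset(x, y), batch_size=8)

# The Trainer then handles device placement and the training loop, e.g.:
# from pytorch_lightning import Trainer
# Trainer(gpus=1, max_epochs=1).fit(MyModel())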
Even though the above answer suffices, note the following line:
img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
You have to put both the model and the image on the right GPU, which becomes a hassle on a multi-GPU inference machine.
To solve this, a .predict API was also added recently; see more at https://pytorch-lightning.readthedocs.io/en/stable/deploy/production_basic.html
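As a rough sketch of that route, based on the linked page (the dataloader here is a placeholder and the exact Trainer arguments depend on your Lightning version), the Trainer moves the model and each batch to the right device for you:
from pytorch_lightning import Trainer
from torch.utils.data import DataLoader

model = MyModel()  # the LightningModule sketched above
predict_loader = DataLoader(my_dataset, batch_size=8)  # placeholder dataset

trainer = Trainer(accelerator="gpu", devices=1)
predictions = trainer.predict(model, dataloaders=predict_loader)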

How to run predictions on image using a pretrained tensorflow model?

I have adapted this retrain.py script to use with several pretrained models.
After training is done, this generates a 'retrained_graph.pb', which I then read and try to use to run predictions on an image with this code:
def get_top_labels(image_data):
    '''
    Returns a list of labels and their probabilities
    image_data: content of image as string
    '''
    with tf.compat.v1.Session() as sess:
        softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
        predictions = sess.run(softmax_tensor, {'DecodeJpeg/contents:0': image_data})
        return predictions
This works fine for the inception_v3 model because it has a tensor called 'DecodeJpeg'; other models I'm using, such as inception_v4, mobilenet and inception_resnet_v2, don't.
My question is: can I add an op to the graph, like the one used in add_jpeg_decoding in the retrain.py script, so that I can afterwards use it for prediction?
Would it be possible to do something like this:
predictions = sess.run(softmax_tensor, {image_data_tensor: image_data}), where image_data_tensor is a variable that depends on which model I'm using?
I looked through stackoverflow and couldn't find a question that solves my problem, I'd really appreciate any help with this, thanks.
I need to at least know if it's possible.
Sorry for the repost; I got no views on my first one.
So after some research, I figured out a way; leaving an answer here in case someone needs it. What you need to do is the decoding yourself: get a tensor from the image using t = read_tensor_from_image_file (found here), then run your predictions using this piece of code:
start = time.time()
results = sess.run(output_layer_name,
                   {input_layer_name: t})
end = time.time()
return results
Usually input_layer_name = 'input:0' and output_layer_name = 'final_result:0'.
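For reference, here is a minimal sketch of what a read_tensor_from_image_file-style helper can look like, adapted from the common label_image.py approach; the image size, mean and std defaults are assumptions that depend on the model you retrained:
import tensorflow as tf

def read_tensor_from_image_file(file_name, input_height=299, input_width=299,
                                input_mean=0, input_std=255):
    # Build a tiny decoding graph and run it once to get a numpy array
    graph = tf.Graph()
    with graph.as_default():
        file_reader = tf.io.read_file(file_name)
        image = tf.image.decode_jpeg(file_reader, channels=3)
        image = tf.cast(image, tf.float32)
        image = tf.image.resize(tf.expand_dims(image, 0),
                                [input_height, input_width])
        normalized = tf.divide(tf.subtract(image, [input_mean]), [input_std])
    with tf.compat.v1.Session(graph=graph) as sess:
        return sess.run(normalized)

# t = read_tensor_from_image_file('some_image.jpg')
# then feed t as {input_layer_name: t} in sess.run as shown above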
