How do I replace keras.utils.multi_gpu_model for training with multiple gpus - tensorflow2.x

As of TensorFlow 2.4, tensorflow.keras.utils.multi_gpu_model has been removed. I am looking for a way to replace this simple snippet so I can keep training on multiple GPUs.
from tensorflow.keras.models import load_model

model = load_model("my_model.h5")
if gpus > 1:
    from tensorflow.keras.utils import multi_gpu_model
    model = multi_gpu_model(model, gpus=gpus)
Where model is a loaded model that can be used to train or make predictions on multiple gpus.

One way to train on multiple GPUs is to use a distribution strategy. The one I found that works pretty much as a drop-in replacement is MirroredStrategy:
import tensorflow as tf
from tensorflow.keras.models import load_model

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = load_model("my_model.h5")
This way, any model created or loaded inside this scope has its variables mirrored across the available GPUs, so training and prediction run on all of them.
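As a fuller sketch of the drop-in replacement (my_model.h5, x_train and y_train are placeholders for your own files and data), compile the model inside the scope as well so the optimizer variables are mirrored, and then call fit as usual:
import tensorflow as tf
from tensorflow.keras.models import load_model

strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables (including the optimizer state created by compile)
    # must be created inside the scope so they are mirrored across GPUs.
    model = load_model("my_model.h5")
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# fit/predict can be called outside the scope; each batch is split
# across the available GPUs automatically.
model.fit(x_train, y_train, batch_size=64, epochs=5)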

Related

Loading a GPU trained BERTopic model on CPU?

I trained a BERTopic model on a GPU, and now for visualization purposes I want to load it on a CPU.
But when I tried to do that I got:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
When I tried the suggested fix, I got the same error:
topic_model = torch.load(args.model, map_location=torch.device('cpu'))
I saw a fix that suggests saving the model without its embedding model, but I don't want to retrain and resave unless it's the last option. I would also love it if someone could explain what this embedding model is and what's going on under the hood.
When you want to save the BERTopic model without the embedding model, you can run the following:
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
from sentence_transformers import SentenceTransformer
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
# Train the model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
topic_model = BERTopic(embedding_model=embedding_model)
topics, probs = topic_model.fit_transform(docs)
# Save the model without the embedding model
topic_model.save("my_model", save_embedding_model=False)
This should prevent any issues with GPU/CPU if you are not using any of the cuML sub-models in BERTopic.
I saw a fix that suggests saving the model without its embedding model, but I don't want to retrain and resave unless it's the last option. I would also love it if someone could explain what this embedding model is and what's going on under the hood.
The embedding model is typically a pre-trained model that does not actually learn from your input data. There are ways to make it learn during training, but that requires a custom component in BERTopic. In other words, when you use a pre-trained embedding model, there is no problem removing it when saving the topic model, as it never needed to be re-trained in the first place.
So we would first save our topic model in our GPU environment without the embedding model:
topic_model.save("my_model", save_embedding_model=False)
Then, we load in our saved BERTopic model in our CPU environment and then pass the pre-trained embedding model:
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
topic_model = BERTopic.load("my_model", embedding_model=embedding_model)
You can learn more about the role of the embedding model in the BERTopic documentation.
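As a small usage sketch (new_docs is just a placeholder), once the model is re-loaded on the CPU machine with the embedding model passed in as above, it can embed and assign topics to new documents as usual:
# Assign topics to unseen documents on the CPU-only machine
new_docs = ["a placeholder document about space travel and astronomy"]
topics, probs = topic_model.transform(new_docs)
print(topic_model.get_topic(topics[0]))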

What is the difference between torchvision.models.resnet and torch.hub.load?

There are two methods for using ResNet in PyTorch.
Method 1:
import torch
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
model.eval()
Method 2:
import torch
from torchvision import models

net = models.resnet50(pretrained=True)
Do they load the same model? If not, what is the difference?
The only difference between your models, if you load them that way, is the number of layers, since you're loading resnet18 with Torch Hub and resnet50 with torchvision.models (and thus also different pretrained weights). They behave differently; you can read more about that in this paper.
Torch Hub also lets you publish pretrained models in your repository, but since you're loading it from 'pytorch/vision:v0.10.0' (which is the same repository from which Models is loading the neural networks), there should be no difference between:
model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
and
model = models.resnet18(pretrained=True)
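If you want to verify this, a quick sketch (assuming both calls resolve to the same pretrained checkpoint) is to compare the two state dicts key by key:
import torch
from torchvision import models

hub_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
tv_model = models.resnet18(pretrained=True)

# Compare every parameter and buffer tensor in the two state dicts
hub_sd, tv_sd = hub_model.state_dict(), tv_model.state_dict()
print(all(torch.equal(hub_sd[k], tv_sd[k]) for k in hub_sd))  # expected: True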

Are PyTorch-trained models transferable between GPUs and TPUs?

After using a GPU for some time to train a PyTorch model, can I use the saved weights to continue training my model on a TPU?
Yes, if you saved your GPU-trained model with, say
torch.save(model.state_dict(), 'model.pt')
you can load it again for use on a TPU (using https://github.com/pytorch/xla) in a separate program run with
import torch_xla.utils.serialization as xser
model.load_state_dict(xser.load('model.pt'))
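To actually continue training, the model with the restored weights then has to be moved to the TPU device. A minimal sketch, assuming torch_xla is installed and model is already constructed with the same architecture as before:
import torch_xla.core.xla_model as xm
import torch_xla.utils.serialization as xser

device = xm.xla_device()                      # the TPU device for this process
model.load_state_dict(xser.load('model.pt'))  # restore the GPU-trained weights
model = model.to(device)                      # training now continues on the TPU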

Using pretrained models in Pytorch for Semantic Segmentation, then training only the fully connected layers with our own dataset

I am learning PyTorch and trying to understand how the library works for semantic segmentation.
What I've understood so far is that we can use a pre-trained model in PyTorch. I've found an article that used such a model in .eval() mode, but I have not been able to find any tutorial on training one on my own dataset. I have a very small dataset and need transfer learning to get results. My goal is to train only the FC layers with my own data. How is that achievable in PyTorch without complicating the code with OOP or many .py files? I have had a hard time figuring out such repos on GitHub, as I am not the most proficient person when it comes to OOP. I have been using Keras for deep learning until recently, where everything is easy and straightforward. Do I have the same options in PyTorch?
I appreciate any guidance on this. I need to run a piece of code that performs semantic segmentation, and I am really confused about many of the steps I need to take.
Assume you start with a pretrained model called model. All of this occurs before you pass the model any data.
First, find the layers you want to train by inspecting all of them with model.children(). Running this command shows every block and layer:
list(model.children())
Suppose you have now found the layers that you want to fine-tune (your FC layers, as you describe) and that they are the last 5. The plan is to keep everything except those last 5 layers, freeze it by setting requires_grad to False so it doesn't update during training, and then attach fresh trainable layers on top. Start by looking at the last 5 layers:
list(model.children())[-5:]
Store those layers so you can refer back to them later:
layer_list = list(model.children())[-5:]
Rebuild the model without them using nn.Sequential:
import torch.nn as nn
model_small = nn.Sequential(*list(model.children())[:-5])
Set requires_grad params to False:
for param in model_small.parameters():
    param.requires_grad = False
You now have a model called model_small that contains every layer except the ones you want to train. Reattach replacements for the layers you removed; being freshly created, they have requires_grad set to True by default, so training will only update the weights of those layers.
# Example layer sizes only; match them to your own architecture and class count
model_small.avgpool_1 = nn.AdaptiveAvgPool2d(output_size=(1, 1))
model_small.flatten = nn.Flatten()
model_small.lin1 = nn.Linear(in_features=2048, out_features=512)
model_small.logits = nn.Linear(in_features=512, out_features=num_classes)
model_small.softmax = nn.Softmax(dim=1)
model = model_small.to(device)
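Since the original question is about semantic segmentation, here is a minimal sketch of the same idea using torchvision's pretrained fcn_resnet50 (num_classes and the learning rate are placeholders, not values from the question): freeze the pretrained weights, swap in a fresh final classifier layer, and give the optimizer only the trainable parameters.
import torch
import torch.nn as nn
from torchvision import models

# Pretrained segmentation model; only the new head will be trained
model = models.segmentation.fcn_resnet50(pretrained=True)

# Freeze every pretrained parameter
for param in model.parameters():
    param.requires_grad = False

# Replace the last classifier layer with a fresh one for your classes;
# newly created layers have requires_grad=True by default
num_classes = 3  # placeholder for your dataset
model.classifier[4] = nn.Conv2d(512, num_classes, kernel_size=1)

# Optimize only the parameters that are still trainable
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)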

Show model layout / design (with all connections) in Keras

I get major differences when testing a Keras LSTM model right after training compared to when I load that trained model from an .h5 file (accuracy of the former is always > 0.85 but of the latter is always < 0.2, i.e. a random guess).
However, I checked the weights and they are identical, and the sparse layout Keras gives me via plot_model is also the same, but since this only provides a rough overview:
Is there a way to show the full layout of a Keras model (especially the node connections)?
If you're using the TensorFlow backend, apart from plot_model, you can also use the keras.callbacks.TensorBoard callback to visualize the whole graph in TensorBoard. Example:
callback = keras.callbacks.TensorBoard(log_dir='./graph',
                                       histogram_freq=0,
                                       write_graph=True,
                                       write_images=True)
model.fit(..., callbacks=[callback])
Then run tensorboard --logdir ./graph from the same directory.
This is a quick shortcut, but you can go even further with that.
For example, add TensorFlow code to define (or load) the model within a custom tf.Graph instance, like this:
from keras.layers import LSTM
import tensorflow as tf

my_graph = tf.Graph()
with my_graph.as_default():
    # All ops / variables in the LSTM layer are created as part of our graph
    x = tf.placeholder(tf.float32, shape=(None, 20, 64))
    y = LSTM(32)(x)
... after which you can list all graph nodes with their dependencies, evaluate any variable, display the graph topology, and so on, to compare the models.
Personally, I think the simplest way is to set up your own session. It works in all cases with minimal patching:
import tensorflow as tf
from keras import backend as K
sess = tf.Session()
K.set_session(sess)
...
# Now can evaluate / access any node in this session, e.g. `sess.graph`
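For example, once the session is set, you can dump every operation in the graph together with the tensors that feed it, which makes the node connections explicit and lets you compare the freshly trained model against the one loaded from the .h5 file (the print format here is just illustrative):
# List every op and the tensors that feed into it
for op in sess.graph.get_operations():
    print(op.name, '<-', [t.name for t in op.inputs])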
