Loading a GPU-trained BERTopic model on CPU? - pytorch

I trained a BERTopic model on a GPU, and now I want to load it on a CPU for visualization purposes.
But when I tried to do that, I got:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
When I tried the suggested fix, I got the same error:
topic_model = torch.load(args.model, map_location=torch.device('cpu'))
I saw a fix that suggests saving the model without its embedding model, but I don't want to retrain and resave unless that's the last option. I would also love it if someone could explain what this embedding model is and what's going on under the hood.

When you want to save the BERTopic model without the embedding model, you can run the following:
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
from sentence_transformers import SentenceTransformer
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
# Train the model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
topic_model = BERTopic(embedding_model=embedding_model)
topics, probs = topic_model.fit_transform(docs)
# Save the model without the embedding model
topic_model.save("my_model", save_embedding_model=False)
This should prevent any issues with GPU/CPU if you are not using any of the cuML sub-models in BERTopic.
As for what this embedding model is and what's going on under the hood: the embedding model is typically a pre-trained model that does not learn from your input data; it is only used to convert documents into embeddings. There are ways to make it learn during training, but that requires a custom component in BERTopic. Because the pre-trained model itself never changes, it is safe to leave it out when saving the topic model: there is nothing in it that would need retraining, and you can simply pass the same pre-trained model again when loading.
So, in the GPU environment, we would first save our topic model without the embedding model:
topic_model.save("my_model", save_embedding_model=False)
Then, we load in our saved BERTopic model in our CPU environment and then pass the pre-trained embedding model:
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
topic_model = BERTopic.load("my_model", embedding_model=embedding_model)
You can learn more about the role of the embedding model in the BERTopic documentation.
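Once the model is loaded on the CPU this way, the visualizations work as usual, since they only need the fitted topic representations rather than a GPU. A minimal sketch, assuming the model was loaded as topic_model above:
# Inspect the fitted topics and build an interactive visualization
print(topic_model.get_topic_info().head())
fig = topic_model.visualize_topics()
fig.write_html("topics.html")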

Related

Why some weights of GPT2Model are not initialized?

I am using the GPT2 pre-trained model for a research project and when I load the pre-trained model with the following code,
from transformers.models.gpt2.modeling_gpt2 import GPT2Model
gpt2 = GPT2Model.from_pretrained('gpt2')
I get the following warning message:
Some weights of GPT2Model were not initialized from the model checkpoint at gpt2 and are newly initialized: ['h.0.attn.masked_bias', 'h.1.attn.masked_bias', 'h.2.attn.masked_bias', 'h.3.attn.masked_bias', 'h.4.attn.masked_bias', 'h.5.attn.masked_bias', 'h.6.attn.masked_bias', 'h.7.attn.masked_bias', 'h.8.attn.masked_bias', 'h.9.attn.masked_bias', 'h.10.attn.masked_bias', 'h.11.attn.masked_bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
From my understanding, this says that the weights of the layers above are not initialized from the pre-trained model. But the attention layers ('attn') are central to GPT2, and if we cannot get their actual weights from the pre-trained model, then what is the point of using a pre-trained model?
I really appreciate it if someone could explain this to me and tell me how I can fix this.
The masked_bias was added by the Hugging Face community as a speed improvement compared to the original implementation. It should not negatively impact performance, as the original weights are loaded properly. Check this PR for further information.
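If you want to reassure yourself, you can check that the learned attention weights did come from the checkpoint; in the transformers versions that print this warning, only the constant masked_bias buffer is newly created. A small sketch:
from transformers import GPT2Model
gpt2 = GPT2Model.from_pretrained('gpt2')
# The learned QKV projection weights come from the checkpoint
print(gpt2.h[0].attn.c_attn.weight.shape)
# masked_bias is just a constant used to implement the causal attention mask
print(gpt2.h[0].attn.masked_bias)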

Make sure BERT model does not load pretrained weights?

I want to make sure my BertModel does not load pre-trained weights. I am using the auto classes (Hugging Face), which load the model automatically.
My question is: how do I load a BERT model without pre-trained weights?
Use AutoConfig instead of AutoModel:
from transformers import AutoConfig, AutoModel
# Build only the configuration, then instantiate the model from it
config = AutoConfig.from_pretrained('bert-base-uncased')
model = AutoModel.from_config(config)
This sets up the model architecture without loading the pre-trained weights.
See the Transformers documentation for AutoConfig and AutoModel.from_config.
If the model is already loaded with pre-trained weights, you can re-initialize it with the init_weights method of the PreTrainedModel class (Huggingface documentation).
Alternatively, you can load the model with pre-trained weights, iterate over the model parameters, and re-set them randomly with whatever initialization technique you prefer.
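A minimal sketch of that last suggestion; the specific initializers (xavier_uniform_ for matrices, zeros_ for vectors) are just an illustrative choice:
import torch
from transformers import AutoModel
model = AutoModel.from_pretrained('bert-base-uncased')
# Overwrite every parameter in place with a fresh random initialization
for name, param in model.named_parameters():
    if param.dim() > 1:
        torch.nn.init.xavier_uniform_(param)
    else:
        torch.nn.init.zeros_(param)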

Compatibility between keras and tf.keras models

I am interested in training a model in tf.keras and then loading it with keras. I know this is not highly-advised, but I am interested in using tf.keras to train the model because
it is easier to build input pipelines with tf.keras
I want to take advantage of the tf.data API
and I am interested in loading it with keras because
I want to use Core ML to deploy the model to iOS.
I want to use coremltools to convert my model for iOS, and coremltools only works with keras, not tf.keras.
I have run into a few road-blocks, because not all of the tf.keras layers can be loaded as keras layers. For instance, I've had no trouble with a simple DNN, since all of the Dense layer parameters are the same between tf.keras and keras. However, I have had trouble with RNN layers, because tf.keras has an argument time_major that keras does not have. My RNN layers have time_major=False, which is the same behavior as keras, but keras sequential layers do not have this argument.
My solution right now is to save the tf.keras model in a json file (for the model structure) and delete the parts of the layers that keras does not support, and also save an h5 file (for the weights), like so:
import json

model = ...  # model trained with tf.keras

# Save the architecture as JSON, stripping the arguments keras does not support
model_json = model.to_json()
with open('path_to_model_json.json', 'w') as json_file:
    json_ = json.loads(model_json)
    layers = json_['config']['layers']
    for layer in layers:
        if layer['class_name'] == 'SimpleRNN':
            del layer['config']['time_major']
    json.dump(json_, json_file)

# Save the weights separately
model.save_weights('path_to_my_weights.h5')
Then, I use the coremltools converter to convert from keras to Core ML, like so:
import coremltools
import tensorflow as tf
from keras.utils import CustomObjectScope
from keras.initializers import glorot_uniform

with CustomObjectScope({'GlorotUniform': glorot_uniform()}):
    coreml_model = coremltools.converters.keras.convert(
        model=('path_to_model_json', 'path_to_my_weights.h5'),
        input_names=...,   # inputs
        output_names=...,  # outputs
        class_labels=...,  # labels
        custom_conversion_functions={"GlorotUniform": tf.keras.initializers.glorot_uniform},
    )
coreml_model.save('my_core_ml_model.mlmodel')
My solution appears to be working, but I am wondering if there is a better approach? Or, is there imminent danger in this approach? For instance, is there a better way to convert tf.keras models to coreml? Or is there a better way to convert tf.keras models to keras? Or is there a better approach that I haven't thought of?
Any advice on the matter would be greatly appreciated :)
Your approach seems good to me!
In the past, when I had to convert a tf.keras model to a keras model, I did the following:
Train the model in tf.keras
Save only the weights: tf_model.save_weights("tf_model.hdf5")
Rebuild the same architecture using keras layers
Load the weights by layer name in keras: keras_model.load_weights("tf_model.hdf5", by_name=True)
This seemed to work for me, as shown in the sketch below. Since I was using an out-of-the-box architecture (DenseNet169), it took very little work to replicate the tf.keras network in keras.
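A minimal sketch of that weight-transfer step; build_keras_model is a hypothetical helper that rebuilds the same architecture in keras with layer names identical to the tf.keras ones:
import keras
# Rebuild the architecture with keras layers, keeping the original layer names
keras_model = build_keras_model()
# Transfer the tf.keras weights, matched layer by layer via the names
keras_model.load_weights("tf_model.hdf5", by_name=True)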

LSTM model weights to train data for text classification

I built an LSTM model for text classification using Keras. Now I have new data to train on. Instead of appending it to the original data and retraining the model from scratch, I thought of continuing training from the saved model weights, i.e. updating the existing weights with the new data.
However, irrespective of how much data I train on, the model does not predict the correct classification (even if I give it the same sentence for prediction). What could be the reason?
Kindly help me.
Are you using the following to save the trained model?
model.save('model.h5')
model.save_weights('model_weights.h5')
And the following to load it?
from keras.models import load_model
model = load_model('model.h5')          # Load the architecture and weights
model.load_weights('model_weights.h5')  # Set the weights (load_weights works in place, so don't reassign model)
# train on new data
model.compile(...)
model.fit(...)
The model loaded this way is exactly the same as the model that was saved. If you are doing this and the predictions are still wrong, then something must be different about the data compared with what the model was originally trained on (for example, the tokenization or label encoding).
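A minimal sketch of continuing training on new data; new_texts, new_labels, tokenizer, and max_len are hypothetical names, and the key point is that the new data must go through exactly the same preprocessing as the original training data:
from keras.models import load_model
from keras.preprocessing.sequence import pad_sequences
model = load_model('model.h5')  # architecture + weights + optimizer state
# Reuse the ORIGINAL tokenizer and sequence length; fitting a new tokenizer
# would map words to different indices and the predictions would look random
new_seqs = tokenizer.texts_to_sequences(new_texts)
new_X = pad_sequences(new_seqs, maxlen=max_len)
model.fit(new_X, new_labels, epochs=3)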

Train, Save and Load a Tensorflow Model

I am following this to train a GAN model on the MNIST dataset, and I want to save the model and restore it for further prediction. After getting some understanding of saving and importing a TensorFlow model, I am able to save and restore some of the input and output variables, but for this network I can only save the model after certain iterations and I am not able to get predictions from it.
Did you refer to this guide? It explains very clearly how to load and save tensorflow models in all possible formats.
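For the TF1-style graph in the GAN example, the usual pattern is tf.train.Saver; a minimal sketch, where the generator output tensor and feed values are placeholders that depend on your graph:
import tensorflow as tf
# ... build the GAN graph (generator, discriminator, losses) first ...
saver = tf.train.Saver()  # must be created after the graph variables exist
# Training: periodically checkpoint all graph variables
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run your GAN training steps here ...
    saver.save(sess, './gan_model.ckpt')
# Later: restore the variables and run the generator for prediction
with tf.Session() as sess:
    saver.restore(sess, './gan_model.ckpt')
    # samples = sess.run(generator_output, feed_dict={z_placeholder: noise})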
If you are new to ML, I'd recommend you give Keras a try first, since it is much easier to use. See https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model; pretty much you can use:
model.save('my_model.h5')
to save your model to disk, and:
from keras.models import load_model
model = load_model('my_model.h5')
to load your model and make predictions.
