I am trying to save a KerasClassifier (wrapper) into final_model.h5:
validator = GridSearchCV(estimator=clf, param_grid=param_grid)
grid_result = validator.fit(train_images, train_labels)
best_estimator = grid_result.best_estimator_
And then I want to reuse the model
from keras.models import load_model
loaded_model = load_model("final_model.h5")
But it seems that loaded_model is now a Sequential object. In other words, it is different from a KerasClassifier object such as best_estimator.
I want to reuse methods such as score, which is available in KerasClassifier but not in the Sequential model. What should I do?
Also, I would like to know how to continue the training process where it left off with final_model.h5. What can I do next?

Yes, in the end you saved the Keras model as HDF5, not the KerasClassifier, which is just an adapter for use with scikit-learn.
But you don't really need the KerasClassifier instance. What you want is the score function, which in Keras is called evaluate, so just call model.evaluate(X, Y). This returns a list containing first the loss and then any metrics your model was compiled with (most likely accuracy).
To continue training the model, just load it and call model.fit with the new training set.
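As a minimal sketch of both steps (scoring with evaluate, then resuming training), assuming TensorFlow's bundled Keras and a tiny made-up dataset; the data, layer sizes and epoch counts are illustrative assumptions, not taken from the question:

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Toy data (assumption): 64 samples, 4 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("int32")

# A small model compiled with an accuracy metric, so evaluate() returns [loss, accuracy].
model = keras.Sequential([
    keras.layers.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, verbose=0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "final_model.h5")
    model.save(path)

    # Reload the plain Keras model (not the scikit-learn wrapper).
    loaded = keras.models.load_model(path)

# Equivalent of "score": the loss first, then the compiled metrics.
loss, acc = loaded.evaluate(X, y, verbose=0)

# Continue training from the saved state.
loaded.fit(X, y, epochs=2, verbose=0)
```

If a scikit-learn style score is still needed, the accuracy returned by evaluate plays that role here.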


Loading a GPU trained BERTopic model on CPU?

I trained a BERTopic model on a GPU, and now for visualization purposes I want to load it on a CPU.
But when I tried to do that I got:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
When I tried the suggested fix, I got the same error:
topic_model = torch.load(args.model, map_location=torch.device('cpu'))
I saw a fix that suggests saving the model without its embedding model, but I don't want to retrain and resave unless it's the last option. I would also love it if someone could explain what this embedding model is and what's going on under the hood.
When you want to save the BERTopic model without the embedding model, you can run the following:
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
from sentence_transformers import SentenceTransformer
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
# Train the model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
topic_model = BERTopic(embedding_model=embedding_model)
topics, probs = topic_model.fit_transform(docs)
# Save the model without the embedding model
topic_model.save("my_model", save_embedding_model=False)
This should prevent any issues with GPU/CPU if you are not using any of the cuML sub-models in BERTopic.
The embedding model is typically a pre-trained model that does not learn from the input data. There are options to make it learn during training, but that requires a custom component in BERTopic. So when you use a pre-trained model, there is no problem removing it when saving the topic model, as there is no need to re-train it.
Concretely, we would first save our topic model in our GPU environment without the embedding model:
topic_model.save("my_model", save_embedding_model=False)
Then, we load in our saved BERTopic model in our CPU environment and then pass the pre-trained embedding model:
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
topic_model = BERTopic.load("my_model", embedding_model=embedding_model)
You can learn more about the role of the embedding model here.

Fitting a Gensim Fasttext pretrained model to my text

I have a pretrained fastText model. I have loaded it into my notebook and want to fit it to my free-form text in order to train an ML classifier.
import pandas as pd
from sklearn.model_selection import train_test_split
from gensim.models import FastText
import pickle
import numpy as np
from numpy.linalg import norm
from gensim.utils import tokenize
model_2 = FastText.load(model_path + 'itsm_fasttext_embeddings_100_dim.model')
tokens = list()
def get_column_vector(model, list_corpus):
    for i in list_corpus:
        svec = np.zeros(100)
        tok_sent = list(tokenize(i))
        count = 0
        for word in tok_sent:
            vec = model.wv[word]
            norm_vec = norm(vec)
            if (norm_vec > 0):
                vec = np.multiply(vec, (1/norm_vec))
                svec = np.add(svec, vec)
                count += 1
        if (count > 0):
            averaged_vec = np.multiply(svec, (1/count))
            # collect one averaged vector per row of text
            tokens.append(averaged_vec)
    return tokens
list_corpus = df["freeformtext_col"].tolist()
# lst = array of vectors for each row of free form text
lst = get_column_vector(model_2, list_corpus)
x_text_train, x_text_test, y_train, y_test = train_test_split(lst, y, test_size=0.2, random_state=42)
model_2.fit(x_text_train, y_train, validation_split=0.1, shuffle=True)
I get the error of
AttributeError Traceback (most recent call last)
Input In [59], in <cell line: 1>()
----> 1 model_2.fit(x_text_train, y_train, validation_split=0.1,
AttributeError: 'FastText' object has no attribute 'fit'
Other documentation showing the initial training of fastText uses a fit function.
I am having trouble finding documentation from others who have taken a pre-trained Gensim fastText model and fit it to their text data in order to ultimately train a classifier.
The Gensim FastText implementation offers no .fit() method. (I also don't see any such method in Facebook's Python wrapper of its original C++ FastText implementation. Even in its supervised-classification mode, it has its own train_supervised() method rather than a scikit-learn-style fit() method.)
If you saw some online example using such a method, it must have been using a different FastText implementation - so you should consult the full details of that other example to see which library they were using.
I don't know of any good online examples showing how to 'fine-tune' a pretrained FastText model to a smaller set of new texts, much less any demonstrating benefits, gotchas, & rules-of-thumb for performing such an operation.
If you did see an online example suggesting such an approach, & demonstrating some benefits over other less-complicated approaches, then that source-of-inspiration would also be the model to follow - or to mention/link when trying to debug their approach. Without someone's full working examples as a guide/template, you're in improvised-innovation mode.
Note you don't have to start with someone else's pre-trained model. You can train your own FastText models with your own training texts – and for many domains, & tasks, this could work better than a generic model trained from public sources like Wikipedia texts or large web crawls.
And when you do, you have the option of simply using FastText in its base unsupervised mode – as a way to featurize text – and then passing those FastText-modeled features to some other explicit classifier (such as the many classifiers in scikit-learn with .fit() methods).
FastText's own -supervised mode builds a different kind of model that combines the word-training with the classification-training. A general FastText language model you find online is unlikely to be a specific -supervised mode model, unless it is explicitly declared to be one. If it's a standard unsupervised model, there's no straightforward way to adapt it into a -supervised model. And if it is already a -supervised model, it will have already been trained for someone else's fixed set of known-labels.

tf.keras how to save ModelCheckPoint object

ModelCheckpoint can be used to save the best model based on a specific monitored metric. So it obviously has information about the best metric value stored within its object. If you train on Google Colab, for example, your instance can be killed without warning and you would lose this information after a long training session.
I tried to pickle the ModelCheckpoint object, so that I can reuse the same object when I bring my notebook back, but got:
TypeError: can't pickle _thread.lock objects
Is there a good way to do this? You can try to reproduce it with:
chkpt_cb = tf.keras.callbacks.ModelCheckpoint('model.{epoch:02d}-{val_loss:.4f}.h5',
with open('chkpt_cb.pickle', 'wb') as f:
    pickle.dump(chkpt_cb, f, protocol=pickle.HIGHEST_PROTOCOL)
If the callback object should not be pickled (because of the thread lock issue, and because it is not advisable anyway), I can pickle this instead:
best = chkpt_cb.best
This stores the best value of the monitored metric that the callback has seen. It is a float, which you can pickle and reload next time, and then do this:
chkpt_cb.best = best # if chkpt_cb is a brand new object you create when colab killed your session.
This is my own setup:
# All paths should be on Google Drive, I omitted it here for simplicity.
chkpt_cb = tf.keras.callbacks.ModelCheckpoint(filepath='model.{epoch:02d}-{val_loss:.4f}.h5',
if os.path.exists('chkpt_cb.best.pickle'):
    with open('chkpt_cb.best.pickle', 'rb') as f:
        best = pickle.load(f)
    chkpt_cb.best = best
def save_chkpt_cb():
    with open('chkpt_cb.best.pickle', 'wb') as f:
        pickle.dump(chkpt_cb.best, f, protocol=pickle.HIGHEST_PROTOCOL)
save_chkpt_cb_callback = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: save_chkpt_cb()
)
history = model.fit_generator(generator=train_data_gen,
callbacks=[chkpt_cb, save_chkpt_cb_callback])
So even if your Colab session gets killed, you can still retrieve the last best metric value, inform your new instance about it, and continue training as usual. This especially helps when you re-compile a stateful optimizer, which may cause a regression in the loss/metric, and you don't want to save the models from the first few epochs.
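The same save-and-restore idea can be sketched without TensorFlow at all. This version swaps pickle for JSON (a deliberate substitution, since the value is a plain float) and uses a SimpleNamespace as a stand-in for the ModelCheckpoint object; the helper names and the file name are hypothetical:

```python
import json
import os
import tempfile
from types import SimpleNamespace

def save_best(callback, path):
    """Write the callback's current best value to disk."""
    with open(path, "w") as f:
        json.dump({"best": float(callback.best)}, f)

def restore_best(callback, path):
    """Load a previously saved best value into a fresh callback, if one exists."""
    if not os.path.exists(path):
        return False
    with open(path) as f:
        callback.best = json.load(f)["best"]
    return True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chkpt_cb.best.json")

    # Session 1: the callback has seen a best val_loss of 0.1234.
    old_cb = SimpleNamespace(best=0.1234)
    save_best(old_cb, path)

    # Session 2: a brand new callback starts from infinity, then is restored.
    new_cb = SimpleNamespace(best=float("inf"))
    restored = restore_best(new_cb, path)
```

With a real callback, save_best would be called at the end of each epoch and restore_best once right after the callback is created.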
I think you might be misunderstanding the intended usage of the ModelCheckpoint object. It is a callback that periodically gets called during training at a particular phase. The ModelCheckpoint callback in particular gets called after every epoch (if you keep the default period=1) and saves your model to disk in the filename you specify to the filepath argument. The model is saved in the same way described here. Then if you want to load that model later, you can do something like
from keras.models import load_model
model = load_model('my_model.h5')
Other answers on SO provide nice guidance and examples for continuing training from a saved model, for example: Loading a trained Keras model and continue training. Importantly, the saved H5 file stores everything about your model that is needed to continue training.
As suggested in the Keras documentation, you should not use pickle to serialize your model. Simply register the ModelCheckpoint callback with your 'fit' function:
chkpt_cb = tf.keras.callbacks.ModelCheckpoint('model.{epoch:02d}-{val_loss:.4f}.h5',
model.fit(x_train, y_train,
Your model will be saved in an H5 file named as you have it, with the epoch number and validation loss automatically formatted for you. For example, your saved file for the 5th epoch with a validation loss of 0.0023 would be named model.05-0.0023.h5, and since you set save_best_only=True, the model will only be saved if the monitored value is better than that of the previously saved one, so you don't pollute your directory with a bunch of unneeded model files.
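The placeholders in the file name use ordinary Python format-string syntax, so the resulting name can be previewed with str.format. This only mimics the naming and does not involve Keras; the epoch and val_loss values are the ones from the example above:

```python
# Same template as the filepath passed to ModelCheckpoint.
filepath = "model.{epoch:02d}-{val_loss:.4f}.h5"

# Epoch 5 with a validation loss of 0.0023.
name = filepath.format(epoch=5, val_loss=0.0023)
print(name)  # model.05-0.0023.h5
```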

Keras Init Sequential model layers by Model layers

I'm trying to build an app using transfer learning. I want to use VGG16, so I've done something like this:
vgg16_model = keras.applications.vgg16.VGG16()
but I want to transfer the layers from VGG16 to my model:
model = Sequential(layers=vgg16_model.layers)
(I've seen this here), but it leads to this error:
TypeError: The added layer must be an instance of class Layer. Found:
How can I init my Sequential model by vgg16 layers?
Thanks in advance.
Try this:
vgg = VGG16()
model = Sequential()
model.add(...) # add additional layers
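One possible way to flesh this out, as a sketch rather than the answer's exact code: copy the VGG16 layers one at a time into a new Sequential model and then add a new head. The small input shape, weights=None (to avoid downloading weights), the frozen layers and the 10-class Dense head are all assumptions made for illustration:

```python
from tensorflow import keras

# Convolutional base only, randomly initialized so nothing is downloaded (assumption).
vgg = keras.applications.VGG16(weights=None, include_top=False, input_shape=(32, 32, 3))

model = keras.Sequential()
model.add(keras.Input(shape=(32, 32, 3)))

# Reuse every VGG16 layer except its own input layer.
for layer in vgg.layers[1:]:
    layer.trainable = False  # freeze the transferred layers
    model.add(layer)

# Hypothetical new classification head.
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(10, activation="softmax"))
```

In real transfer learning the base would normally be created with pretrained weights (for example weights="imagenet") instead of weights=None.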

Show model layout / design (with all connections) in Keras

I see major differences when testing a Keras LSTM model right after training it, compared to when I load that trained model from a .h5 file (the accuracy of the former is always > 0.85, but that of the latter is always < 0.2, i.e. a random guess).
However, I checked the weights and they are identical, and the sparse layout Keras gives me via plot_model is also the same. But since this only gives a rough overview:
Is there a way to show the full layout of a Keras model (especially node connections)?
If you're using tensorflow backend, apart from plot_model, you can also use keras.callbacks.TensorBoard callback to visualize the whole graph in tensorboard. Example:
callback = keras.callbacks.TensorBoard(log_dir='./graph',
model.fit(..., callbacks=[callback])
Then run tensorboard --logdir ./graph from the same directory.
This is a quick shortcut, but you can go even further with that.
For example, add tensorflow code to define (load) the model within custom tf.Graph instance, like this:
from keras.layers import LSTM
import tensorflow as tf
my_graph = tf.Graph()
with my_graph.as_default():
    # All ops / variables in the LSTM layer are created as part of our graph
    x = tf.placeholder(tf.float32, shape=(None, 20, 64))
    y = LSTM(32)(x)
... after which you can list all graph nodes with their dependencies, evaluate any variable, display the graph topology and so on, in order to compare the models.
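Listing graph nodes with their dependencies could look roughly like the sketch below. It assumes TensorFlow 2, where the placeholder lives under tf.compat.v1, and it uses a plain matmul instead of an LSTM layer to stay small; the op names x, w and y are chosen for the example:

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Ops created here belong to `graph`, not to the default graph.
    x = tf.compat.v1.placeholder(tf.float32, shape=(None, 4), name="x")
    w = tf.constant([[1.0], [2.0], [3.0], [4.0]], name="w")
    y = tf.matmul(x, w, name="y")

# Map each op name to the names of the ops that feed into it.
edges = {op.name: [t.op.name for t in op.inputs] for op in graph.get_operations()}

for name, inputs in edges.items():
    print(name, "<-", inputs)
```

Building such a mapping for both the freshly trained and the reloaded model gives two dictionaries that can be compared directly.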
Personally, I think the simplest way is to set up your own session. It works in all cases with minimal patching:
import tensorflow as tf
from keras import backend as K
sess = tf.Session()
# Now can evaluate / access any node in this session, e.g. `sess.graph`
