Load trained model on another machine - fastai, torch, huggingface - nlp

I am using fastai with PyTorch to fine-tune XLMRoberta from huggingface.
I've trained the model, and everything works fine on the machine where I trained it.
But when I try to load the model on another machine, I get an OSError - Not Found - No such file or directory pointing to .cache/torch/transformers/. The issue is the path of a vocab_file.
I've used fastai's Learner.export to export the model to a .pkl file, but I don't believe the issue is related to fastai, since I found the same issue appearing in flairNLP.
It appears that the path to the cache folder, where the vocab_file is stored during training, is embedded in the .pkl file:
The error comes from transformers' XLMRobertaTokenizer.__setstate__:
def __setstate__(self, d):
    self.__dict__ = d
    self.sp_model = spm.SentencePieceProcessor()
    self.sp_model.Load(self.vocab_file)
which tries to load the vocab_file using the path stored in the pickle.
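A quick way to see this (a minimal sketch, assuming transformers is installed and the model has been downloaded once):

import pickle
from transformers import XLMRobertaTokenizer

tok = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
print(tok.vocab_file)  # an absolute path into the local download cache, e.g. under .cache/torch/transformers/
restored = pickle.loads(pickle.dumps(tok))  # __setstate__ reloads sp_model from that same absolute path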
I've tried patching this method using:
from types import MethodType

import sentencepiece as spm
from transformers import XLMRobertaTokenizer

pretrained_model_name = "xlm-roberta-base"
vocab_file = XLMRobertaTokenizer.from_pretrained(pretrained_model_name).vocab_file

def _setstate(self, d):
    self.__dict__ = d
    self.sp_model = spm.SentencePieceProcessor()
    self.sp_model.Load(vocab_file)

XLMRobertaTokenizer.__setstate__ = MethodType(_setstate, XLMRobertaTokenizer(vocab_file))
That successfully loaded the model, but it caused other problems, such as missing model attributes.
Can someone please explain why the path is embedded inside the file? Is there a way to configure it without re-exporting the model, or, if it has to be re-exported, how can it be configured dynamically using fastai, torch and huggingface?

I faced the same error. I had fine-tuned XLMRoberta on a downstream classification task with fastai version 1.0.61, and I'm loading the model inside Docker.
I'm not sure why the path is embedded, but I found a workaround. Posting for future readers who might be looking for one, since retraining is usually not an option.
I created /home/<username>/.cache/torch/transformers/ inside the Docker image:
RUN mkdir -p /home/<username>/.cache/torch/transformers
Then I copied the files that were reported as not found in Docker from my local /home/<username>/.cache/torch/transformers/ into the image:
COPY filename /home/<username>/.cache/torch/transformers/filename
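To figure out exactly which files to COPY, you can first list the local cache (a minimal sketch; the cache path is the one named in the error message):

import os

cache = os.path.expanduser('~/.cache/torch/transformers')
for name in sorted(os.listdir(cache)):
    print(name)  # copy the files matching the names the error complains about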

Related

Setting up Visual Studio Code to run models from Hugging Face

I am trying to import models from Hugging Face and use them in Visual Studio Code.
I installed transformers, tensorflow, and torch.
I have tried looking at multiple tutorials online but have found nothing.
I am trying to run the following code:
from transformers import pipeline
classifier = pipeline('sentiment-analysis')
result = classifier("I hate it when I'm sitting under a tree and an apple hits my head.")
print(result)
However, I get the following error:
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Traceback (most recent call last):
File "c:\Users\user\Desktop\Artificial Intelligence\transformers\Workshops\workshop_3.py", line 4, in <module>
classifier = pipeline('sentiment-analysis')
File "C:\Users\user\Desktop\Artificial Intelligence\transformers\src\transformers\pipelines\__init__.py", line 702, in pipeline
framework, model = infer_framework_load_model(
File "C:\Users\user\Desktop\Artificial Intelligence\transformers\src\transformers\pipelines\base.py", line 266, in infer_framework_load_model
raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model distilbert-base-uncased-finetuned-sst-2-english with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSequenceClassification'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForSequenceClassification'>, <class 'transformers.models.distilbert.modeling_distilbert.DistilBertForSequenceClassification'>, <class 'transformers.models.distilbert.modeling_tf_distilbert.TFDistilBertForSequenceClassification'>).
I have already searched online for ways to set up transformers to use in Visual Studio Code but nothing is helping.
Do you know how to fix this error? If you know how to successfully use models from Hugging Face in my code, it would be appreciated.
This question is a little less about Hugging Face itself and likely more about installation and the installation steps you took (and potentially your program's access to the cache folder where the models are automatically downloaded).
From what I am seeing, either:
1/ your program is unable to access the model, or
2/ your program is throwing specific value errors in a bit of an edge case.
If 1/ Take a look here: https://huggingface.co/docs/transformers/installation#cache-setup
Notice that the docs walk through where the pre-trained models are downloaded. Check that the model was downloaded to C:\Users\username\.cache\huggingface\hub (with your own username on your computer, of course).
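A quick way to check, as a minimal sketch (the default cache path is the one from the docs; adjust if you've set HF_HOME or TRANSFORMERS_CACHE):

from pathlib import Path

cache = Path.home() / ".cache" / "huggingface" / "hub"
if cache.exists():
    for p in sorted(cache.iterdir()):
        print(p.name)  # downloaded models show up here
else:
    print(f"No cache directory at {cache}")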
Second, if for some reason there is an issue with downloading, you can try downloading the model manually and running in offline mode (this is more to get it up and running): https://huggingface.co/docs/transformers/installation#offline-mode
Third, if it is downloaded, do you have the right permissions to access the .cache? (Try running your program, if it is a program that you trust, from Windows Terminal as an administrator.) There are various ways to do this; find one that you're comfortable with. Here are a couple of hints from Stack Overflow/StackExchange: Opening up Windows Terminal with elevated privileges, from within Windows Terminal, or this: https://superuser.com/questions/1560049/open-windows-terminal-as-admin-with-winr
If 2/ I have seen people bring up very specific issues about values not being found (not the same as yours, but similar) where the issue was solved by installing PyTorch, because some models only exist as PyTorch models. You can see the full response from #YokoHono here: Transformers model from Hugging-Face throws error that specific classes couldn't be loaded
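On top of that, the warning at the top of your traceback suggests pinning the model explicitly. Doing so, with the exact name and revision from the warning, removes any ambiguity about which model the pipeline tries to load (it still requires a working torch or tensorflow installation):

from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    revision="af0f99b",
)
print(classifier("I hate it when I'm sitting under a tree and an apple hits my head."))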

Unable to save keras model in databricks

I am saving a Keras model
model.save('model.h5')
in Databricks, but the model is not saving.
I have also tried saving to /tmp/model.h5 as mentioned here, but the model is still not saving.
The saving cell executes, but when I load the model it says no model.h5 file is available.
When I do this:
dbfs_model_path = 'dbfs:/FileStore/models/model.h5'
dbutils.fs.cp('file:/tmp/model.h5', dbfs_model_path)
or try loading the model:
tf.keras.models.load_model("file:/tmp/model.h5")
I get the error message java.io.FileNotFoundException: File file:/tmp/model.h5 does not exist
The problem is that Keras is designed to work only with local files, so it doesn't understand URIs, such as dbfs:/, or file:/. So you need to use local paths for saving & loading operations, and then copy files to/from DBFS (unfortunately /dbfs doesn't play well with Keras because of the way it works).
The following code works just fine. Note that dbfs:/ or file:/ are used only in the calls to the dbutils.fs commands - Keras stuff uses the names of local files.
create model & save locally as /tmp/model-full.h5:
from tensorflow.keras.applications import InceptionV3
model = InceptionV3(weights="imagenet")
model.save('/tmp/model-full.h5')
copy data to DBFS as dbfs:/tmp/model-full.h5 and check it:
dbutils.fs.cp("file:/tmp/model-full.h5", "dbfs:/tmp/model-full.h5")
display(dbutils.fs.ls("/tmp/model-full.h5"))
copy file from DBFS as /tmp/model-full2.h5 & load it:
dbutils.fs.cp("dbfs:/tmp/model-full.h5", "file:/tmp/model-full2.h5")
from tensorflow import keras
model2 = keras.models.load_model("/tmp/model-full2.h5")
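As a quick sanity check (a sketch on my part, assuming model and model2 from the steps above are still in memory), the reloaded model should produce the same outputs as the original:

import numpy as np

# InceptionV3's default input shape is 299x299x3
x = np.random.rand(1, 299, 299, 3).astype("float32")
assert np.allclose(model.predict(x), model2.predict(x))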

Importing Tensorboard for maskRcnn(Matterport - Mask RCNN)

I am currently trying to implement Mask RCNN by following the Matterport repo. I have a doubt regarding the implementation of TensorBoard.
The dataset is similar to the COCO dataset. Inside model.py, under def train, TensorBoard is mentioned as
callbacks = [keras.callbacks.TensorBoard(log_dir=self.log_dir, histogram_freq=0,
                                         write_graph=True, write_images=False)]
But what else should I mention to use TensorBoard? When I try to run TensorBoard, it says the log file was not found. I know there is something I am missing somewhere! Please help me out!
In your model.train() call, make sure you pass the custom_callbacks=callbacks parameter.
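A minimal sketch of that call (dataset_train, dataset_val, config and model are assumed to be set up as in the repo's examples; custom_callbacks is Matterport's own train() parameter):

import keras

callbacks = [keras.callbacks.TensorBoard(log_dir=model.log_dir, histogram_freq=0,
                                         write_graph=True, write_images=False)]
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=30,
            layers='heads',
            custom_callbacks=callbacks)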
If you specified these parameters exactly like this, then your issue is that you are not opening the logs directory properly.
Open a terminal (inside Anaconda/PyCharm, or a separate one) and pass the absolute path, to make sure it works:
tensorboard --logdir=my_absolute_path/logs/

Loading pretrained FastAI models in Kaggle kernels without using internet

I am trying to load a densenet121 model in a Kaggle kernel without switching on the internet.
I have done the required steps, such as adding the pre-trained weights to my input directory and moving them to '.cache/torch/checkpoints/'. It still does not work and throws a gaierror.
The following is the code snippet:
!mkdir -p /tmp/.cache/torch/checkpoints
!cp ../input/fastai-pretrained-models/densenet121-a639ec97.pth /tmp/.cache/torch/checkpoints/densenet121-a639ec97.pth
learn_cd = create_cnn(data_cd, models.densenet121, metrics=[error_rate, accuracy], model_dir=Path('../kaggle/working/models'), path=Path('.')).to_fp16()
I have been struggling with this for a long time. Any help would be immensely appreciated.
The input path "../input/" in a Kaggle kernel is read-only. Create a folder in "/kaggle/working" instead and copy the model weights there. Example below:
import os

if not os.path.exists('/root/.cache/torch/hub/checkpoints/'):
    os.makedirs('/root/.cache/torch/hub/checkpoints/')
!mkdir '/kaggle/working/resnet34'
!cp '/root/.cache/torch/hub/checkpoints/resnet34-333f7ec4.pth' '/kaggle/working/resnet34/resnet34.pth'
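For the densenet121 case in the question itself, the direction is reversed: the weights from the input dataset need to land where torch actually looks. A minimal sketch (an assumption on my part: Kaggle kernels run as root, so the cache lives under /root/.cache rather than /tmp/.cache):

import os
import shutil

# older torch looks in /root/.cache/torch/checkpoints/,
# newer versions in /root/.cache/torch/hub/checkpoints/
cache_dir = '/root/.cache/torch/checkpoints/'
os.makedirs(cache_dir, exist_ok=True)
shutil.copy('../input/fastai-pretrained-models/densenet121-a639ec97.pth',
            os.path.join(cache_dir, 'densenet121-a639ec97.pth'))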

Inspecting tensorflow's .data, .meta, and .index

I have been trying, for a couple of weeks now, to use multiple neural networks that I've found on GitHub. Most of the time these repos contain a folder with .meta, .index, and .data files. I first want to inspect these neural networks using TensorBoard (or any other tool), and then use them properly.
So far I have tried converting these files to .pb and then using that file in TensorBoard. But of course this approach has not worked.
I have made some assumptions in this process:
1) I'm running the latest TensorFlow (py3) Docker container on macOS.
2) I'm assuming that merely inspecting a file does not require the hardware that the network itself might need.
For converting these files to .pb, I've used the following code:
import tensorflow as tf

meta_path = '/Users/emiliovazquez/Documents/Fall2019/cs594/Final/models/triviaqa-unfiltered-shared-norm/best-weights/best-202000.meta'  # Your .meta file

with tf.Session() as sess:
    # Restore the graph
    saver = tf.train.import_meta_graph(meta_path)

    # Output nodes; this list must be built after the graph is imported,
    # otherwise the default graph is still empty
    output_node_names = [n.name for n in tf.get_default_graph().as_graph_def().node]

    # Load weights (tf.train.latest_checkpoint expects the checkpoint directory)
    saver.restore(sess, tf.train.latest_checkpoint('/Users/emiliovazquez/Documents/Fall2019/cs594/Final/models/triviaqa-unfiltered-shared-norm/best-weights'))

    # Freeze the graph
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess,
        sess.graph_def,
        output_node_names)

    # Save the frozen graph
    with open('./output_graph.pb', 'wb') as f:
        f.write(frozen_graph_def.SerializeToString())
To inspect the generated .pb file I've used this repo and made the appropriate changes to run on the latest TensorFlow version.
However, after running this second Python file, the process exits with an error: the OS did not find the file specified. I tried both relative and absolute paths inside the container.
Please let me know what information I'm missing, what tool I should use, or whether the given approach is correct.
It'd be better if you showed your Dockerfiles, but from what your question shows, you haven't copied the Python files into the Docker container. If the Python file has been copied, then you haven't specified the path to the output file correctly. Since this is running in Docker, you can't use an absolute path from your computer; you'll have to use a relative path so that it works both on your machine and in Docker.
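One way to sidestep this (a sketch under the assumption that you launch the container yourself, as the question's "latest TensorFlow (py3) docker container" suggests) is to mount the working directory and keep every path relative to it:

docker run --rm -v "$PWD":/work -w /work tensorflow/tensorflow:latest-py3 python freeze_to_pb.py

Here freeze_to_pb.py is a placeholder name for the conversion script above; with the mount in place, the relative ./output_graph.pb ends up in the same directory on the host and in the container.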
