Assert downloading Pytorch models (model.pth does not exist) - pytorch

I am having a problem trying to use LayoutParser, but I believe it is actually an issue with the PyTorch checkpoint download. I get an assert that model.pth does not exist.
The cache directory contains model.pth?dl=1 and model.pth.lock.
It looks like the download creates a file called model.pth?dl=1 and a .lock file, but it fails to finish by renaming it to model.pth. This seems to happen with all LayoutParser pretrained models.
Anyone seen this?
Thank you
Peter
The assert is thrown from this call with any of the pretrained models.
The full code to reproduce it is:
import layoutparser as lp

if __name__ == "__main__":
    model = lp.Detectron2LayoutModel(
        config_path="lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
        label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.1],
    )

Related

Load trained model on another machine - fastai, torch, huggingface

I am using fastai with pytorch to fine tune XLMRoberta from huggingface.
I've trained the model and everything is fine on the machine where I trained it.
But when I try to load the model on another machine I get OSError - Not Found - No such file or directory pointing to .cache/torch/transformers/. The issue is the path of a vocab_file.
I've used fastai's Learner.export to export the model as a .pkl file, but I don't believe the issue is related to fastai, since I found the same issue appearing in flairNLP.
It appears that the path to the cache folder, where the vocab_file is stored during the training, is embedded in the .pkl file:
The error comes from transformers' XLMRobertaTokenizer __setstate__:
def __setstate__(self, d):
    self.__dict__ = d
    self.sp_model = spm.SentencePieceProcessor()
    self.sp_model.Load(self.vocab_file)
which tries to load the vocab_file using the path from the file.
I've tried patching this method using:
pretrained_model_name = "xlm-roberta-base"
vocab_file = XLMRobertaTokenizer.from_pretrained(pretrained_model_name).vocab_file

def _setstate(self, d):
    self.__dict__ = d
    self.sp_model = spm.SentencePieceProcessor()
    self.sp_model.Load(vocab_file)

XLMRobertaTokenizer.__setstate__ = MethodType(_setstate, XLMRobertaTokenizer(vocab_file))
That successfully loaded the model, but it caused other problems, such as missing model attributes.
Can someone please explain why the path is embedded inside the file? Is there a way to configure it without re-exporting the model, and if it has to be re-exported, how can it be configured dynamically using fastai, torch, and huggingface?
I faced the same error. I had fine-tuned XLMRoberta on a downstream classification task with fastai version 1.0.61, and I'm loading the model inside Docker.
I'm not sure why the path is embedded, but I found a workaround. Posting for future readers who might be looking for one, since retraining is usually not possible.
I created /home/<username>/.cache/torch/transformers/ inside the Docker image:
RUN mkdir -p /home/<username>/.cache/torch/transformers
Then I copied the files (which were not found in Docker) from my local /home/<username>/.cache/torch/transformers/ into the same path in the image:
COPY filename /home/<username>/.cache/torch/transformers/filename
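As to why the path ends up inside the .pkl at all: pickle serializes an object's `__dict__`, and the tokenizer stores `vocab_file` as an absolute path, so the machine-specific path travels with the export. A minimal illustration (the `Tok` class here is a hypothetical stand-in for the real tokenizer, and the path is made up):

```python
import pickle

class Tok:
    # Hypothetical stand-in for XLMRobertaTokenizer: it keeps the
    # vocab path as a plain instance attribute.
    def __init__(self, vocab_file):
        self.vocab_file = vocab_file

t = Tok("/home/alice/.cache/torch/transformers/sentencepiece.bpe.model")
restored = pickle.loads(pickle.dumps(t))

# The absolute path from the training machine survives the round trip,
# which is why unpickling on another machine looks for that exact file.
print(restored.vocab_file)
```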

Dataset Labeled as not found or Corrupt, but the dataset is not corrupt

I have been trying to use this GitHub repo (https://github.com/AntixK/PyTorch-VAE) and load the CelebA dataset using the config file provided. Specifically, in vae.yaml I have set the path to the unzipped CelebA dataset (https://www.kaggle.com/jessicali9530/celeba-dataset) that I downloaded to my computer. Every time I run the program, I keep getting these errors:
File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/celeba.py", line 67, in __init__
    ' You can use download=True to download it')
RuntimeError: Dataset not found or corrupted. You can use download=True to download it
AttributeError: 'VAEXperiment' object has no attribute '_lazy_train_dataloader'
I have tried to download the dataset, but nothing changes. So I have no idea why the program is not running.
The run.py calls the experiment.py which uses this dataloader to retrieve the information:
def train_dataloader(self):
    transform = self.data_transforms()

    if self.params['dataset'] == 'celeba':
        dataset = CelebA(root=self.params['data_path'],
                         split="train",
                         transform=transform,
                         download=False)
    else:
        raise ValueError('Undefined dataset type')

    self.num_train_imgs = len(dataset)
    return DataLoader(dataset,
                      batch_size=self.params['batch_size'],
                      shuffle=True,
                      drop_last=True)
The config file supplies the path passed as root. What I did was upload a few .jpg files to Google Colab, and when I run the command stated in the GitHub repo, python run.py -c config/vae.yaml, it states that the dataset is not found or corrupt. I have tried this on my Linux machine and the same error occurs, even when I used the downloaded and unzipped files. I have gone further and changed self.params['data_path'] to the actual path, and that still does not work. Any ideas what I can do?
My pytorch version is 1.6.0.
There are two issues I have faced. Below is my solution. It is not official, but it works for me; hopefully a future PyTorch version will fix it.
Issue 1: Dataset not found or corrupted.
When I checked the file celeba.py in the torchvision library, I found this line:
if ext not in [".zip", ".7z"] and not check_integrity(fpath, md5):
    return False
This part makes self._check_integrity() return False, and the program produces the error message we got.
Solution: you can skip this check by wrapping it in an "if False" block:
if False:
    if ext not in [".zip", ".7z"] and not check_integrity(fpath, md5):
        return False
Issue 2: celeba.py downloads the dataset if you choose download=True, but two of the downloaded files are broken: "list_landmarks_align_celeba.txt" and "list_attr_celeba.txt".
You need to find them somewhere else, download them, and replace the broken ones.
Hope these solutions help!

Model shows google.protobuf.message.decodeerror: error parsing message

I am using the facenet model. When I do classifier training it shows this message, but the image alignment process with the same model works fine.
def load_model(model):
    # Check if the model is a model directory (containing a metagraph and a checkpoint file)
    # or if it is a protobuf file with a frozen graph
    model_exp = os.path.expanduser(model)
    if os.path.isfile(model_exp):
        print('Model filename: %s' % model_exp)
        with gfile.FastGFile(model_exp, 'rb') as f:
            graph_def = tf.GraphDef()
            print("Graph def value: ", graph_def)
            print(type(graph_def))
            graph_def.ParseFromString(f.read())
            tf.import_graph_def(graph_def, name='')
Can anyone help me to clear this issue?
The above code works fine locally; the issue only occurs on the Heroku server.
In the above code, the print statements output:
Graph def value:
<class 'tensorflow.core.framework.graph_pb2.GraphDef'>
The error is because model-serving support does not work on Heroku's free tier. You would need a paid Heroku account with the machine-learning dependencies, or another deployment platform that supports serving TensorFlow models.

Importing Tensorboard for maskRcnn(Matterport - Mask RCNN)

I am currently trying to implement Mask RCNN by following the Matterport repo. I have a doubt regarding the implementation of TensorBoard.
The dataset is similar to the COCO dataset. Inside model.py, under def train, TensorBoard is set up as:
callbacks = [keras.callbacks.TensorBoard(log_dir=self.log_dir, histogram_freq=0, write_graph=True, write_images=False)]
But what else should I specify to use TensorBoard? When I try to run TensorBoard, it says the log file was not found. I know I am missing something somewhere! Please help me out!
In your model.train(), ensure you pass the callbacks via the custom_callbacks parameter.
If you specified these parameters exactly like this, then your issue is that you are not pointing TensorBoard at the logs directory properly.
Open a terminal (inside Anaconda/PyCharm, or a separate Python terminal) and pass the absolute path (to make sure it works):
tensorboard --logdir=my_absolute_path/logs/

Tensorflow keras - How to avoid erroring out when loading h5 model if model is not present

I am writing an application which trains machine learning models ad-hoc, when I try to fetch the model like so:
model = tf.keras.models.load_model('./models/model.h5')
I get an error:
Unable to open file (unable to open file: name = 'models/model.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
In some special cases, however, the model might not be present on disk, at which point it should be created, trained, and saved for later use. What would be the right approach to check whether a model is present? I could use built-in Python functionality to check if the file exists, but it seems obvious to me that load_model should have a parameter that makes it return None instead of throwing an error when the file is not present.
The Python way of checking whether the file exists is the right way to go.
This may be personal, but it's not obvious that None should be returned: when you open a file, the file is expected to exist.
You can:
import os.path

if os.path.isfile(fname):
    model = load_model(fname)
else:
    model = createAndTrainModel()
Or you can (catching OSError rather than using a bare except, so unrelated errors still surface):
try:
    model = load_model(fname)
except OSError:
    model = createAndTrainModel()
I prefer the first.
