I was trying to use https://github.com/microsoft/unilm/tree/master/layoutlm for document classification, but I kept getting "OSError: Unable to load weights from pytorch checkpoint file."
Can someone help me run and work with LayoutLM?
Configuration/Versions:
Windows 10
Python - 3.6.5
huggingface-transformers - 3.1.0
pytorch - 1.5.0
tensorflow - 2.3.1
command to run the code:
python run_classification.py --data_dir C:\Users\Downloads\unilm-master\unilm-master\layoutlm\examples\classification\data --model_type layoutlm --output_dir C:\Users\Downloads\unilm-master\unilm-master\layoutlm\examples\classification\data --do_eval --model_name_or_path
I believe the problem is with the --model_name_or_path argument. I have tried the command above, and I have also tried downloading the pytorch_model.bin file for LayoutLM and passing its path as the argument, but to no avail:
C:\Users\Downloads\unilm-master\unilm-master\layoutlm\examples\classification\model\pytorch_model.bin
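As a sanity check on the checkpoint file itself, independent of run_classification.py, loading it directly with torch can distinguish a corrupted download from a transformers-version problem; a minimal sketch, using the path above:

import torch

# If this raises the same OSError, the .bin file itself is truncated or
# corrupted (e.g. an interrupted download), not a transformers-version issue.
ckpt_path = r"C:\Users\Downloads\unilm-master\unilm-master\layoutlm\examples\classification\model\pytorch_model.bin"
state_dict = torch.load(ckpt_path, map_location="cpu")
print(len(state_dict), "tensors loaded")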
Could it also be caused by a discrepancy between the installed transformers version and the version LayoutLM supports (transformers 3.1.0 vs. 2.0.0)?
Can someone help me get up to speed with LayoutLM?
Help is appreciated.
I am getting an error while loading spaCy's 'en_core_web_sm' in a Databricks notebook. I have seen a lot of other questions about the same error, but they were of no help.
The code is as follows
!python -m spacy download en_core_web_sm

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")

# Process a test document
text = "This is a test document"
doc = nlp(text)
I get the error "OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory"
The details of the installation are:
Python - 3.8.10
spaCy version 3.3
It simply does not work. I tried validating the installation (python -m spacy validate):
ℹ spaCy installation:
/databricks/python3/lib/python3.8/site-packages/spacy

NAME             SPACY     VERSION
en_core_web_sm   >=2.2.2   3.3.0    ✔
But the error still remains.
I am not sure if this message is relevant:
/databricks/python3/lib/python3.8/site-packages/spacy/util.py:845: UserWarning: [W094] Model 'en_core_web_sm' (2.2.5) specifies an under-constrained spaCy version requirement: >=2.2.2. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.3.0,<3.4.0
warnings.warn(warn_msg)
There is also this message when installing 'en_core_web_sm':
"Defaulting to user installation because normal site-packages is not writeable"
Any help will be appreciated
I suspect that you have a cluster with autoscaling, and when it scaled up, the new nodes didn't have that module installed. Another possible reason is that a cluster node was terminated by the cloud provider and the cluster manager pulled in a new node.
To prevent such situations I would recommend using a cluster init script, as described in the following answer - it guarantees that the module is installed even on new nodes. The content of the script is really simple:
#!/bin/bash
pip install spacy
python -m spacy download en_core_web_sm
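As a quick check that the model package itself is importable on the driver, you can also load it as a module, which bypasses spaCy's name lookup (assuming the download above succeeded):

import en_core_web_sm  # installed by `python -m spacy download en_core_web_sm`

nlp = en_core_web_sm.load()
print(nlp("This is a test document"))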
I am new to tensorboard with Pytorch.
I followed the code at https://pytorch.org/docs/stable/tensorboard.html, and it generates a runs folder accordingly.
In the command line, I run tensorboard against that folder (tensorboard --logdir=runs). But at http://localhost:6006/ it does not load properly: the page keeps loading and shows no data.
I tried tensorboard dev upload --logdir 'runs' as well, but it still shows nothing; no event file is found there.
I am using tensorboard 2.7.0, torch 1.7, cuda 11.0.
Can anyone give me some help? Thanks a lot!!
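For comparison, here is a minimal writer along the lines of the linked tutorial; if the dashboard stays empty even for this, the problem is the logdir path rather than the training code. Note that events may not reach disk until flush() or close() is called:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/minimal_test")  # event file goes under runs/
for step in range(100):
    writer.add_scalar("loss", 1.0 / (step + 1), step)
writer.flush()  # ensure events are on disk before starting tensorboard
writer.close()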
I recently updated my PyTorch version to 1.6.0 on my local machine to use its mixed-precision training. Since then I have been encountering this issue; I tried the solution mentioned here, but it still throws the error below.
RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /opt/conda/conda-bld/pytorch_1591914880026/work/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 4, but the maximum supported version for reading is 3. Your PyTorch installation may be too old.
Link to reproduce: https://www.kaggle.com/rohitsingh9990/error-reproducing-code?scriptVersionId=37468859
Any help will be appreciated, thanks in advance.
In torch.__version__ == 1.6.0, save with the legacy (non-zip) serializer:
torch.save(model_.state_dict(), 'best_model.pth.tar', _use_new_zipfile_serialization=False)
Then in torch.__version__ == 1.5.1:
torch.load('best_model.pth.tar', map_location='cpu')
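Put together, the round trip looks like this (a sketch; model_ is whatever nn.Module was trained):

import torch

# In the torch 1.6.0 environment: save in the legacy format so that
# older torch versions can read it (note the leading underscore in the
# keyword name).
torch.save(model_.state_dict(), 'best_model.pth.tar',
           _use_new_zipfile_serialization=False)

# In the torch 1.5.1 environment: load and restore as usual.
state_dict = torch.load('best_model.pth.tar', map_location='cpu')
model_.load_state_dict(state_dict)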
Would you help me achieve reproducible results with TensorFlow 1.15 without restarting the Python kernel? And why do the outputs in TF 2.0 and TF 1.15 differ with absolutely identical parameters and dataset? Is it possible to achieve identical output?
More details:
I tried to interpret model results in TF 2.0 with:
import numpy as np
import shap

# df3 is the input DataFrame, model is the trained Keras model
background = df3.iloc[np.random.choice(df3.shape[0], 100, replace=False)]
explainer = shap.DeepExplainer(model, background)
I received the error:
`get_session` is not available when using TensorFlow 2.0.
Following an SO topic, I tried to enable TF 1 compatibility in TF 2.0 by putting this at the top of my code:
import tensorflow.compat.v1 as tf
But the error appeared again.
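For completeness, that workaround is usually quoted with an extra call that actually switches the runtime to TF1 behavior, not just the import:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # switch from eager TF2 execution to TF1-style graphs/sessions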
Following the advice of many users, I downgraded TF2 to TF 1.15. That solved the problem and the shap module now interprets the results, but:
1) to make results reproducible I now have to change tf.random.set_seed(7) to tf.random.set_random_seed(7) and restart the Python kernel every time! In TF2 I didn't have to restart the kernel.
2) the prediction results have changed, especially the economical efficiency (that is, TF 1.15 wrongly classifies more important samples than TF 2.0).
TF 2:
Accuracy: 94.95%, Economical efficiency = 64%
TF 1:
Accuracy: 94.85%, Economical efficiency = 56%
The code of the model is here
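On the kernel-restart issue in point 1, the usual TF1 workaround is to reset the default graph and reseed everything at the top of each run instead of restarting; a sketch, assuming plain TF 1.15 with Keras:

import random
import numpy as np
import tensorflow as tf  # TF 1.15

def reset_seeds(seed=7):
    # Rebuild the default graph so the graph-level seed takes effect
    # again without restarting the kernel.
    tf.keras.backend.clear_session()
    tf.reset_default_graph()
    tf.set_random_seed(seed)  # graph-level seed in TF1
    np.random.seed(seed)
    random.seed(seed)

reset_seeds()  # call before (re)building the model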
First, results differ not only between TF1 and TF2, but also between TF 2.0 and TF 2.2. This probably depends on different internal parameters in the packages.
Second, TensorFlow 2 works with DeepExplainer with the following versions:
import tensorflow
import pandas as pd
import keras
import xgboost
import numpy
import shap
print(tensorflow.__version__)
print(pd.__version__)
print(keras.__version__)
print(xgboost.__version__)
print(numpy.__version__)
print(shap.__version__)
output:
2.2.0
0.24.2
2.3.1
0.90
1.17.5
0.35.0
But you may face some difficulties in updating the libraries.
On Python 3.5, running TF 2.2, you will hit the error 'DLL load failed: The specified module could not be found'.
It can definitely be solved by installing the newer Visual C++ package. See: https://github.com/tensorflow/tensorflow/issues/22794#issuecomment-573297027
Link to download the package: https://support.microsoft.com/ru-ru/help/2977003/the-latest-supported-visual-c-downloads
On Python 3.7 you will not find a shap 0.35.0 wheel (.whl); there is only a tar.gz, which fails with the error "Install visual c++ package", and installing that package doesn't help.
Instead, download shap 0.35.0 for Python 3.7 from https://anaconda.org/conda-forge/shap/files, run an Anaconda shell, and type: conda install -c conda-forge C:\shap-0.35.0-py37h3bbf574_0.tar.bz2
Although TensorFlow is installed, PyCharm does not realize that the module tf.keras exists.
Hovering over keras shows the following text: "Cannot find reference 'keras' in '__init__.py'"
Why?
You are likely using a PyCharm version older than 2019.3.
PyCharm versions older than 2019.3 do not resolve TensorFlow 2.0's keras module.
You can find the answer to your question here: Unable to import Keras (from TensorFlow 2.0) in PyCharm
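If upgrading PyCharm is not an option, the import style suggested in the linked question is usually resolved by older versions as well:

import tensorflow as tf
from tensorflow import keras  # resolved by older PyCharm, unlike the tf.keras attribute

model = keras.Sequential([
    keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1),
])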