Error: cc.fr.300.bin cannot be opened for loading - azure-machine-learning-service

I am using Azure Machine Learning and Azure Databricks.
In Azure Databricks I have a script.py written with the %%writefile cell magic (%%writefile script.py).
In this script I tried to load cc.fr.300.bin that is saved as a model in Azure Machine Learning.
I did this:
import os
import fasttext

fr_model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'cc.fr.300.bin')
fr_model = fasttext.load_model(fr_model_path)
But I get this error:
File "/structure/azureml-app/script.py", line 134, in init
fr_model = fasttext.load_model(fr_model_path)
File "/azureml-envs/azureml_d7.../lib/python3.6/site-packages/fasttext/FastText.py", line 441, in load_model
return _FastText(model_path=path)
File "/azureml-envs/azureml_d7.../lib/python3.6/site-packages/fasttext/FastText.py", line 98, in __init__
self.f.loadModel(model_path)
ValueError: /var/azureml-app/azureml-models/test/1/cc.fr.300.bin cannot be opened for loading!
What can I do?

cc.fr.300.bin may be an old-format file; the latest fasttext cannot load it.
You can try the following:
pip uninstall fasttext
pip install fasttext==0.6.0
Refer to https://pypi.org/project/fasttext/0.6.0/
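Independently of the fasttext version, it is also worth confirming that the deployed model file is actually at the path the script builds, since the traceback resolves it under azureml-models/test/1/ where the file may sit in a sub-folder. A minimal diagnostic sketch for the init() of the scoring script (an assumption, not the asker's actual code, reusing the same AZUREML_MODEL_DIR layout as in the question):

import os
import fasttext

def init():
    global fr_model
    model_dir = os.getenv('AZUREML_MODEL_DIR')
    # Print everything deployed with the service, to spot a wrong relative
    # path (e.g. an extra sub-folder added when the model was registered).
    for root, _, files in os.walk(model_dir):
        for name in files:
            print(os.path.join(root, name))
    fr_model_path = os.path.join(model_dir, 'cc.fr.300.bin')
    if not os.path.isfile(fr_model_path):
        raise FileNotFoundError("Expected model file at " + fr_model_path)
    fr_model = fasttext.load_model(fr_model_path)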

Related

HuggingFace | ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on

Not always, but occasionally this error appears when running my code.
At first I doubted it was a connectivity issue and suspected a caching issue instead, as discussed in an older GitHub issue.
Clearing the cache didn't help at runtime:
$ rm ~/.cache/huggingface/transformers/*
Traceback references:
NLTK also reports Error loading stopwords: <urlopen error [Errno -2] Name or service not known>.
The last two lines of the traceback point to cached_path and get_from_cache.
Cache (before clearing):
$ cd ~/.cache/huggingface/transformers/
(sdg) me#PF2DCSXD:~/.cache/huggingface/transformers$ ls
16a2f78023c8dc511294f0c97b5e10fde3ef9889ad6d11ffaa2a00714e73926e.cf2d0ecb83b6df91b3dbb53f1d1e4c311578bfd3aa0e04934215a49bf9898df0
16a2f78023c8dc511294f0c97b5e10fde3ef9889ad6d11ffaa2a00714e73926e.cf2d0ecb83b6df91b3dbb53f1d1e4c311578bfd3aa0e04934215a49bf9898df0.json
16a2f78023c8dc511294f0c97b5e10fde3ef9889ad6d11ffaa2a00714e73926e.cf2d0ecb83b6df91b3dbb53f1d1e4c311578bfd3aa0e04934215a49bf9898df0.lock
4029f7287fbd5fa400024f6bbfcfeae9c5f7906ea97afcaaa6348ab7c6a9f351.723d8eaff3b27ece543e768287eefb59290362b8ca3b1c18a759ad391dca295a.h5
4029f7287fbd5fa400024f6bbfcfeae9c5f7906ea97afcaaa6348ab7c6a9f351.723d8eaff3b27ece543e768287eefb59290362b8ca3b1c18a759ad391dca295a.h5.json
4029f7287fbd5fa400024f6bbfcfeae9c5f7906ea97afcaaa6348ab7c6a9f351.723d8eaff3b27ece543e768287eefb59290362b8ca3b1c18a759ad391dca295a.h5.lock
684fe667923972fb57f6b4dcb61a3c92763ad89882f3da5da9866baf14f2d60f.c7ed1f96aac49e745788faa77ba0a26a392643a50bb388b9c04ff469e555241f
684fe667923972fb57f6b4dcb61a3c92763ad89882f3da5da9866baf14f2d60f.c7ed1f96aac49e745788faa77ba0a26a392643a50bb388b9c04ff469e555241f.json
684fe667923972fb57f6b4dcb61a3c92763ad89882f3da5da9866baf14f2d60f.c7ed1f96aac49e745788faa77ba0a26a392643a50bb388b9c04ff469e555241f.lock
c0c761a63004025aeadd530c4c27b860ec4ecbe8a00531233de21d865a402598.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b
c0c761a63004025aeadd530c4c27b860ec4ecbe8a00531233de21d865a402598.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b.json
c0c761a63004025aeadd530c4c27b860ec4ecbe8a00531233de21d865a402598.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b.lock
fc674cd6907b4c9e933cb42d67662436b89fa9540a1f40d7c919d0109289ad01.7d2e0efa5ca20cef4fb199382111e9d3ad96fd77b849e1d4bed13a66e1336f51
fc674cd6907b4c9e933cb42d67662436b89fa9540a1f40d7c919d0109289ad01.7d2e0efa5ca20cef4fb199382111e9d3ad96fd77b849e1d4bed13a66e1336f51.json
fc674cd6907b4c9e933cb42d67662436b89fa9540a1f40d7c919d0109289ad01.7d2e0efa5ca20cef4fb199382111e9d3ad96fd77b849e1d4bed13a66e1336f51.lock
Code:
from transformers import pipeline, set_seed
generator = pipeline('text-generation', model='gpt2') # Error
set_seed(42)
Traceback:
2022-03-03 10:18:06.803989: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-03-03 10:18:06.804057: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[nltk_data] Error loading stopwords: <urlopen error [Errno -2] Name or
[nltk_data] service not known>
2022-03-03 10:18:09.216627: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-03-03 10:18:09.216700: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-03-03 10:18:09.216751: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (PF2DCSXD): /proc/driver/nvidia/version does not exist
2022-03-03 10:18:09.217158: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-03 10:18:09.235409: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
All model checkpoint layers were used when initializing TFGPT2LMHeadModel.
All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
Traceback (most recent call last):
File "/home/me/miniconda3/envs/sdg/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/me/miniconda3/envs/sdg/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/mnt/c/Users/me/Documents/GitHub/project/foo/bar/__main__.py", line 26, in <module>
nlp_setup()
File "/mnt/c/Users/me/Documents/GitHub/project/foo/bar/utils/Modeling.py", line 37, in nlp_setup
generator = pipeline('text-generation', model='gpt2')
File "/home/me/miniconda3/envs/sdg/lib/python3.8/site-packages/transformers/pipelines/__init__.py", line 590, in pipeline
tokenizer = AutoTokenizer.from_pretrained(
File "/home/me/miniconda3/envs/sdg/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 463, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "/home/me/miniconda3/envs/sdg/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 324, in get_tokenizer_config
resolved_config_file = get_file_from_repo(
File "/home/me/miniconda3/envs/sdg/lib/python3.8/site-packages/transformers/file_utils.py", line 2235, in get_file_from_repo
resolved_file = cached_path(
File "/home/me/miniconda3/envs/sdg/lib/python3.8/site-packages/transformers/file_utils.py", line 1846, in cached_path
output_path = get_from_cache(
File "/home/me/miniconda3/envs/sdg/lib/python3.8/site-packages/transformers/file_utils.py", line 2102, in get_from_cache
raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Failed Attempts
I closed my IDE and bash terminal, ran wsl.exe --shutdown in PowerShell, then relaunched the IDE and bash terminal; same error.
Disconnecting from / switching to a different VPN.
Clearing the cache: $ rm ~/.cache/huggingface/transformers/*
Make sure you are not loading a tokenizer with an empty path. That solved it for me.
I saw an answer on GitHub which you can try:
pass force_download=True to from_pretrained, which will override the cache and re-download the files.
Link: https://github.com/huggingface/transformers/issues/8690 (by patil-suraj)
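For reference, a minimal sketch of that suggestion applied to the snippet above, assuming a PyTorch backend (with TensorFlow, TFAutoModelForCausalLM would be the analogous class):

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, set_seed

# force_download=True tells from_pretrained to ignore the local cache and
# fetch the files from the Hub again.
tokenizer = AutoTokenizer.from_pretrained('gpt2', force_download=True)
model = AutoModelForCausalLM.from_pretrained('gpt2', force_download=True)

generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
set_seed(42)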
Since I am working in a conda venv and using Poetry for handling dependencies, I needed to re-install torch - a dependency for Hugging Face 🤗 Transformers.
First, install torch:
PyTorch's website lets you choose your exact setup/specification for the install. In my case, the command was
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
Then add to Poetry:
poetry add torch
Both take ages to process. Runtime was back to normal :)

OSError: loading an h5-saved model in TensorFlow Keras after updating the environment in Anaconda on Windows with Python 3.7

I am receiving an OSError (without any other text) from h5py when loading an h5 model created with Keras/TensorFlow, after updating my environment or when working with an up-to-date environment.
I trained some models with Keras and TF in the older versions, and also with keras-tf v1.15, and saved them using model.save('filename.h5'). Afterwards I was able to load them and keep working with them, first using keras.load_model and now tensorflow.keras.models.load_model, without any problems, apart from some warnings that my TF version was not compiled to use the AVX2 instructions and so on.
The version installed is tensorflow 1.15, using pip install tensorflow-cpu, and it seems to work well; my environment is Anaconda3-2020.02-Windows-x86_64, installed from the Anaconda binaries on Windows.
After trying to change the packages to tensorflow-mkl, and needing to update my environment because of environment conflicts (which show up even with a fresh install of Anaconda), the OSError raised by h5py appears.
Using the default environment packages from the Anaconda binary with tf-cpu seems to work fine, including when cloning the environment. When updating the environment with conda update --all, the error is raised with either tf-cpu or tf-mkl.
The version of h5py in both cases is '2.10.0', and the error is the following:
Traceback (most recent call last):
File "C:\Users\Oscar\bwSyncAndShare\OPT_PV22WP_intern\pv2wp_control\SIM\Sim_future.py", line 88, in <module>
model = load_model(pathfile_model)
File "C:\Users\Oscar\anaconda3\envs\optimizer2\lib\site-packages\tensorflow_core\python\keras\saving\save.py", line 142, in load_model
isinstance(filepath, h5py.File) or h5py.is_hdf5(filepath))):
File "C:\Users\Oscar\anaconda3\envs\optimizer2\lib\site-packages\h5py\_hl\base.py", line 44, in is_hdf5
return h5f.is_hdf5(filename_encode(fname))
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5f.pyx", line 156, in h5py.h5f.is_hdf5
OSError
Has anyone had this problem?
I have tried training a model with the updated environment and saving it; when loading it I get the same error.
Updating to tf-cpu v2.3.1 with the base environment and then loading also works.
Creating a new env with conda create -n name python==3.7.x anaconda and then installing TF doesn't work.
I think some other library is causing the problem, but I cannot figure out which one.
I used hd5 instead of h5 as the file extension, and that solved the problem.
I can load my deep model in Colab, but when I try to load that model on my PC I can't.
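Before loading, it can also help to check whether the file is even recognised as HDF5, since that is exactly where the traceback fails; a minimal diagnostic sketch (the path below is a hypothetical placeholder, not the asker's actual file):

import os
import h5py
from tensorflow.keras.models import load_model

pathfile_model = r'C:\path\to\filename.h5'   # hypothetical path, replace with your own

print(os.path.isfile(pathfile_model))   # the file must exist and be readable
print(h5py.is_hdf5(pathfile_model))     # the call that raises the bare OSError above

model = load_model(pathfile_model)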

How to download en_core_web_sm in Orange3?

I want to use spaCy inside a Python script in the Orange3 tool, but I get the error Can't find model 'en_core_web_sm' when using nlp = spacy.load("en_core_web_sm"). How can I install this model in Orange3? Btw, I am using Orange3 as a standalone tool, not inside an Anaconda Jupyter notebook.
Error:
Traceback (most recent call last):
File "", line 1, in
File "", line 2, in
File "C:\Users\saif\AppData\Local\Orange\lib\site-packages\spacy\__init__.py", line 30, in load
return util.load_model(name, **overrides)
File "C:\Users\saif\AppData\Local\Orange\lib\site-packages\spacy\util.py", line 169, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Thanks in advance
It appears that spaCy wants you to download their models using their command-line interface. However, if you're working in an environment where all you can do is write scripts and run them (which I assume is the case with Orange3), you can import the function that spaCy uses internally to download and install models and call it directly.
from spacy.cli.download import download
download('en_core_web_sm')
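Once the download has run, loading should succeed in the same script; a minimal end-to-end sketch:

import spacy
from spacy.cli.download import download

download('en_core_web_sm')            # installs the model package
nlp = spacy.load('en_core_web_sm')    # now resolves without the E050 error
doc = nlp("Orange3 can now use spaCy.")
print([token.text for token in doc])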

Error training TensorFlow Object Detection API using a Google Cloud VM [ImportError: No module named 'tensorflow.python.eager']

I am training the TensorFlow Object Detection API using the typical steps on a Google Cloud VM. After configuring all the dependencies, when I try to run the train.py script, the error [ImportError: No module named 'tensorflow.python.eager'] pops up. I already trained using the same steps on my local PC without any errors. I couldn't find any solution related to this error.
System info: gcloud VM; TensorFlow-GPU 1.3.0; Python 3.5; CUDA 8.0 / cuDNN 6.0.
script running command:
$ python3 train.py --logtostderr --train_dir=training/ --pipeline_config_path=ssd_mobilenet_v1_lap.config
Error:
Traceback (most recent call last):
File "train.py", line 49, in <module>
from object_detection import trainer
File "/usr/local/lib/python3.5/dist-packages/object_detection-0.1-py3.5.egg/object_detection/trainer.py", line 33, in <module>
from deployment import model_deploy
File "/home/ragulh28/project/models/research/slim/deployment/model_deploy.py", line 106, in <module>
from tensorflow.python.eager import context
ImportError: No module named 'tensorflow.python.eager'
This issue is caused by a dependency on the new TF Eager API that some of the newer models in slim use. They require the latest version of TensorFlow, which is why the module is not being found.
Our apologies for the inconvenience. As a workaround, could you try checking out an older version of the Tensorflow Object Detection API? This commit should be a good candidate.
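As a quick sanity check (just a diagnostic sketch, not part of the official steps), you can confirm whether the installed TensorFlow provides the module that the newer slim code imports:

import tensorflow as tf

print(tf.__version__)   # the question's VM has TensorFlow-GPU 1.3.0

try:
    # the import that fails inside slim's model_deploy.py
    from tensorflow.python.eager import context
    print("tensorflow.python.eager is available")
except ImportError:
    print("tensorflow.python.eager is missing; upgrade TF or check out an older Object Detection API commit")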

Load pkl (using joblib or pickle) generates KeyError: 120

I am trying to load a pkl file (on a Windows machine) using joblib.
So my code is:
from sklearn.externals import joblib
output = joblib.load("file.pkl")
I get this error:
File "cleaning.py", line 97, in <module>
output = joblib.load('file.pkl')
File "C:\Users\me\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 459, in load
obj = unpickler.load()
File "C:\Users\me\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1039, in load
dispatch[key[0]](self)
KeyError: 120
I also tried using pickle, this way:
import pickle
with open('file.pkl', 'r') as input:
output = pickle.load(input)
But I got this other error:
File "cleaning.py", line 94, in <module>
output = pickle.load(input)
_pickle.UnpicklingError: invalid load key, 'x'.
Could anyone help me?
I have already searched on Stack Overflow but I didn't find any solution that works for me.
Thanks
Try upgrading scikit-learn to 0.18.1.
pip install scikit-learn==0.18.1
This worked for me after upgrading from the default Anaconda version (0.17).
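A minimal sketch to confirm the upgrade took effect and retry the load (same file name as in the question):

import sklearn
from sklearn.externals import joblib   # as in the original snippet

print(sklearn.__version__)             # should report 0.18.1 after the upgrade
output = joblib.load('file.pkl')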