spaCy es_core_news_sm model not loading

I'm trying to use spaCy for POS tagging in Spanish. I have checked the official documentation and read various posts on Stack Overflow, but neither has worked for me.
I have Python 3.7 and spaCy 2.2.4 installed, and I'm running my code from a Jupyter notebook.
So, as the documentation suggests, I tried the following.
From my terminal:
python -m spacy download en_core_web_sm
This gave the result:
Download and installation successful
Then in my jupyter notebook:
import spacy
nlp = spacy.load("es_core_news_sm")
And I got the following error:
ValueError: [E173] As of v2.2, the Lemmatizer is initialized with an instance of Lookups containing the lemmatization tables. See the docs for details: https://spacy.io/api/lemmatizer#init
Additionally, I tried:
import spacy
nlp = spacy.load("es_core_news_sm")
And this gave me a different error:
OSError: Can't find model 'es_core_news_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory
Could you please help me solve this error?

You downloaded the English model. In order to use the Spanish model, you have to download it first: python -m spacy download es_core_news_sm

After downloading the right model, you can import it as follows:
import spacy
import es_core_news_sm
nlp = es_core_news_sm.load()
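To check that the right model now loads end to end, here is a minimal POS-tagging sketch (the sample sentence is just an illustration):
import spacy

# Load the Spanish small model and print each token with its part-of-speech tag
nlp = spacy.load("es_core_news_sm")
doc = nlp("Esto es una frase de prueba.")
for token in doc:
    print(token.text, token.pos_)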

Related

ModuleNotFoundError: No module named 'seg'

I tried to follow the example from https://spacy.io/universe/project/spacy-sentence-segmenter to create a sentence segmenter, but encountered the following error: ModuleNotFoundError: No module named 'seg'.
spaCy is already installed. I didn't find any information about which module provides this 'seg'. Could anyone help? Thanks.
from seg.newline.segmenter import NewLineSegmenter
import spacy
nlseg = NewLineSegmenter()
nlp = spacy.load('en')
nlp.add_pipe(nlseg.set_sent_starts, name='sentence_segmenter', before='parser')
doc = nlp(my_doc_text)
Sentence Segmenter is a third-party module that is separate from your spaCy installation. You need to install it separately:
pip install spacyss
You can find more information on the project's GitHub page.
Try to install the module using pip or pip3: pip3 install segmentation
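If you'd rather avoid the extra dependency, a newline-based segmenter can be written as a plain pipeline component. Here is a minimal sketch, assuming spaCy v2.x (where add_pipe accepts a function) and the en_core_web_sm model:
import spacy

def newline_segmenter(doc):
    # Mark the token following each newline as a sentence start;
    # the parser respects boundaries that are set before it runs
    for token in doc[:-1]:
        if token.text == "\n":
            doc[token.i + 1].is_sent_start = True
    return doc

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(newline_segmenter, name="sentence_segmenter", before="parser")
doc = nlp("First line.\nSecond line.")
print([sent.text for sent in doc.sents])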

Error while importing 'en_core_web_sm' for spacy in Azure Databricks

I am getting an error while loading 'en_core_web_sm' for spaCy in a Databricks notebook. I have seen a lot of other questions about the same issue, but they are of no help.
The code is as follows:
import spacy
!python -m spacy download en_core_web_sm
from spacy import displacy
nlp = spacy.load("en_core_web_sm")
# Process
text = ("This is a test document")
doc = nlp(text)
I get the error "OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory"
The details of installation are
Python - 3.8.10
spaCy version 3.3
It simply does not work. Running python -m spacy validate gives the following:
ℹ spaCy installation:
/databricks/python3/lib/python3.8/site-packages/spacy
NAME SPACY VERSION
en_core_web_sm >=2.2.2 3.3.0 ✔
But the error still remains.
I'm not sure if this message is relevant:
/databricks/python3/lib/python3.8/site-packages/spacy/util.py:845: UserWarning: [W094] Model 'en_core_web_sm' (2.2.5) specifies an under-constrained spaCy version requirement: >=2.2.2. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.3.0,<3.4.0
warnings.warn(warn_msg)
There is also this message when installing 'en_core_web_sm':
"Defaulting to user installation because normal site-packages is not writeable"
Any help will be appreciated.
Ganesh
I suspect that you have a cluster with autoscaling, and when autoscaling happened, the new nodes didn't have that module installed. Another reason could be that a cluster node was terminated by the cloud provider and the cluster manager pulled in a new node.
To prevent such situations, I would recommend using a cluster init script, as described in the following answer; it will guarantee that the module is installed even on new nodes. The content of the script is really simple:
#!/bin/bash
pip install spacy
python -m spacy download en_core_web_sm
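A variation on the same script is to install the model as a regular pip package by pointing at its release wheel, which pins the model version explicitly; the exact version and URL below are assumptions and should be matched to your spaCy version:
#!/bin/bash
pip install spacy==3.3.0
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0-py3-none-any.whl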

Error while loading vector from Glove in Spacy

I am facing the following attribute error when loading a GloVe model.
Code used to load the model:
nlp = spacy.load('en_core_web_sm')
tokenizer = spacy.load('en_core_web_sm', disable=['tagger','parser', 'ner', 'textcat'])
nlp.vocab.vectors.from_glove('../models/GloVe')
I get the following attribute error when trying to load the GloVe model:
AttributeError: 'spacy.vectors.Vectors' object has no attribute 'from_glove'
I have tried searching on Stack Overflow and elsewhere but can't seem to find the solution. Thanks!
From pip list:
spacy 3.1.4
spacy-legacy 3.0.8
en-core-web-sm 3.1.0
Use spacy init vectors to load vectors from word2vec/glove text format into a new pipeline: https://spacy.io/api/cli#init-vectors
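As a sketch of that v3 workflow (the GloVe file name and output directory are assumptions; adjust them to your paths):
python -m spacy init vectors en ../models/GloVe/glove.6B.300d.txt ./glove_vectors
Then load the resulting pipeline, which is a blank 'en' model carrying the converted vectors:
import spacy
nlp = spacy.load("./glove_vectors")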
spaCy 3.1.4 does not have the from_glove feature; it was removed in v3.
I was able to use nlp.vocab.vectors.from_glove() in spaCy 2.2.4.
If you want, you can change your spaCy version by running
!pip install spacy==2.2.4 in your Jupyter cell.

spaCy loading model fails

I am trying to load the spaCy model de_core_news_sm without any success. Since our company policy seems to block the python -m spacy download de_core_news_sm command, I downloaded the model manually and used pip install on the local tar.gz archive, which worked out well.
However, calling nlp = spacy.load("de_core_news_sm") in my code throws the following exception:
Exception has occurred: ValueError
[E149] Error deserializing model. Check that the config used to create the
component matches the model being loaded.
File "pipes.pyx", line 642, in
spacy.pipeline.pipes.Tagger.from_disk.load_model
I have no idea how to deal with this. Does anybody know what to do?
Run python -m spacy validate to check whether the model you downloaded is compatible with the version of spaCy you have installed. This kind of error happens when the versions aren't compatible. (Probably one is v2.1 and the other is v2.2.)
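If validate reports a mismatch, you can install a model archive whose version matches your spaCy install directly from the spacy-models releases page, the same way you installed the local tar.gz; the exact version below is an assumption for a v2.2 install and should be adjusted:
pip install https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-2.2.5/de_core_news_sm-2.2.5.tar.gz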

Unable to load 'en' from spacy in jupyter notebook

I run the following lines of code in a jupyter notebook:
import spacy
nlp = spacy.load('en')
And get following error:
Warning: no model found for 'en_default'
Only loading the 'en' tokenizer.
I am using python 3.5.3, spacy 1.9.0, and jupyter notebook 5.0.0.
I downloaded spacy using conda install spacy and python3 spacy install en.
I am able to import spacy and load 'en' from my terminal but not from a jupyter notebook.
Based on the answer in your comments, it seems fairly clear that the Python interpreters used by Jupyter and by your terminal are not the same, and therefore likely do not share libraries.
I would recommend re-running the installation, or specifically installing the en model into the correct interpreter. Replace the path below with the full path to your Jupyter kernel's Python, if it differs:
//anaconda/envs/capstone/bin/python -m spacy download en
That should be enough. Let me know if there are any issues.
You can also download the en language model from within the Jupyter notebook:
import sys
!{sys.executable} -m spacy download en
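To confirm which interpreter the notebook is actually using, and compare it against the one you installed spaCy into from the terminal, a quick check:
import sys
print(sys.executable)  # should match the Python you ran 'spacy download' with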
