How to Change Model Pocket Sphinx - python-3.x

I have set up PocketSphinx on Linux and am trying to use a custom language model, which I generated with this tool: http://www.speech.cs.cmu.edu/tools/lmtool-new.html
My code works fine with the provided model, but when I feed it my custom model I get the following error:
_pocketsphinx.new_Decoder(*args) RuntimeError: new_Decoder returned -1
The sample code I used is as follows:
import os
from os import environ, path
from pocketsphinx import LiveSpeech
from sphinxbase import *
pocketsphinx_dir = os.path.dirname(__file__)
print(pocketsphinx_dir)
MODELDIR = "./myModel/model"
MODELDIR1 = "./myModel"
speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(MODELDIR, 'en-us/en-us'),
    lm=os.path.join(MODELDIR1, '2506.lm'),
    dic=os.path.join(MODELDIR1, '2506.dic')
)
for phrase in speech:
    print(phrase)
Also note that I already tried using absolute paths, as suggested in this answer, but that did not help in my case.
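One possible cause of a decoder that fails to initialize is a model path that does not resolve to what the decoder expects, so it may be worth ruling that out first. The following is a minimal sketch of such a check, not a confirmed fix for this error. The helper name missing_model_paths is hypothetical, and it assumes that hmm points to an acoustic model directory while lm and dic point to regular files.

```python
import os


def missing_model_paths(hmm, lm, dic):
    """Return a description of every model path that is missing on disk.

    Assumption: the acoustic model (hmm) is a directory, while the
    language model (lm) and the dictionary (dic) are regular files.
    """
    paths = {"hmm": hmm, "lm": lm, "dic": dic}
    exists = {
        "hmm": os.path.isdir(hmm),
        "lm": os.path.isfile(lm),
        "dic": os.path.isfile(dic),
    }
    # Keep only the entries whose check failed.
    return [f"{name}: {paths[name]}" for name in paths if not exists[name]]
```

Calling it with the same three paths that are passed to LiveSpeech, and printing the result before constructing the decoder, shows whether any of them fails to resolve from the current working directory. An empty list means the paths exist, in which case the contents of the generated files would be the next thing to inspect.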

Related

Using Arabert model with SpaCy

spaCy doesn't support the Arabic language, but can I use spaCy with the pretrained AraBERT model?
Is it possible to modify this code so it can accept bert-large-arabertv02 instead of en_core_web_lg?
!python -m spacy download en_core_web_lg
import spacy
nlp = spacy.load("en_core_web_lg")
Here is how we can load AraBERT v02:
from arabert.preprocess import ArabertPreprocessor
from transformers import AutoTokenizer, AutoModelForMaskedLM
model_name="aubmindlab/bert-large-arabertv02"
arabert_prep = ArabertPreprocessor(model_name=model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
spaCy actually does support Arabic, though only at an alpha level, which basically just means tokenization support (see here). That's enough for loading external models or training your own, though, so in this case you should be able to load this like any HuggingFace model - see this FAQ.
In this case this would look like:
import spacy
nlp = spacy.blank("ar") # blank Arabic pipeline
# create the config with the name of your model
# values omitted will get default values
config = {
    "model": {
        "@architectures": "spacy-transformers.TransformerModel.v3",
        "name": "aubmindlab/bert-large-arabertv02"
    }
}
nlp.add_pipe("transformer", config=config)
nlp.initialize() # XXX don't forget this step!
doc = nlp("فريك الذرة لذيذة")
print(doc._.trf_data) # all the Transformer output is stored here
I don't speak Arabic, so I can't check the output thoroughly, but that code ran and produced an embedding for me.

catalogue.RegistryError: [E893] Could not find function 'Custom_Candidate_Gen.v1' in function registry 'misc'

I am currently building a spaCy pipeline with custom NER, Entity Linker, and Textcat components. For my Entity Linker component, I have modified the candidate_generator() to suit my use case. I used the ner_emersons demo project for reference. The following is my custom_functions code.
import spacy
from functools import partial
from pathlib import Path
from typing import Iterable, Callable
from spacy.training import Example
from spacy.tokens import DocBin
from spacy.kb import Candidate, KnowledgeBase, get_candidates
@spacy.registry.misc("Custom_Candidate_Gen.v1")
def create_candidates():
    return custom_get_candidates


def custom_get_candidates(kb, span):
    return kb.get_alias_candidates(span.text.lower())


@spacy.registry.readers("MyCorpus.v1")
def create_docbin_reader(file: Path) -> Callable[["Language"], Iterable[Example]]:
    return partial(read_files, file)


def read_files(file: Path, nlp: "Language") -> Iterable[Example]:
    # we run the full pipeline and not just nlp.make_doc to ensure we have entities and sentences
    # which are needed during training of the entity linker
    with nlp.select_pipes(disable="entity_linker"):
        doc_bin = DocBin().from_disk(file)
        docs = doc_bin.get_docs(nlp.vocab)
        for doc in docs:
            yield Example(nlp(doc.text), doc)
After training my entity linker and adding my textcat component to the pipeline, I am getting the following error:
catalogue.RegistryError: [E893] Could not find function 'Custom_Candidate_Gen.v1' in function registry 'misc'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.
Available names: spacy.CandidateGenerator.v1, spacy.EmptyKB.v1, spacy.KBFromFile.v1, spacy.LookupsDataLoader.v1, spacy.ngram_range_suggester.v1, spacy.ngram_suggester.v1
Why isn't my custom Candidate Generator getting registered?
Your options for having custom code loaded and registered when you load a model:
import this code directly in your script before loading the model
package it with your model with spacy package --code and load the model from the installed package name (rather than the directory)
provide this code in a separate package that uses entry points in setup.cfg to register the methods (which works fine, but wouldn't be my first choice in this situation)
See:
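The reason the first option works can be illustrated with a toy registry. This is only a sketch of the general decorator-registration pattern and is not spaCy's implementation: the names register, lookup, load_custom_code, and is_registered are hypothetical. The point it shows is that a decorated function is only added to a registry when the code containing the decorator actually runs, which for a module means when it is imported.

```python
from typing import Callable, Dict

# Toy registry for illustration only; this is NOT spaCy's implementation.
_REGISTRY: Dict[str, Callable] = {}


def register(name: str):
    """Return a decorator that stores the decorated function under `name`."""
    def decorator(func):
        _REGISTRY[name] = func
        return func
    return decorator


def lookup(name: str) -> Callable:
    """Fetch a registered function, failing if it was never registered."""
    if name not in _REGISTRY:
        raise KeyError(f"Could not find function {name!r}")
    return _REGISTRY[name]


def load_custom_code():
    """Stands in for `import custom_functions`: running it executes the decorator."""
    @register("Custom_Candidate_Gen.v1")
    def create_candidates():
        return lambda kb, span: []
    return create_candidates


def is_registered(name: str) -> bool:
    try:
        lookup(name)
        return True
    except KeyError:
        return False


# Before the custom code has run, the name is unknown to the registry.
registered_before = is_registered("Custom_Candidate_Gen.v1")
load_custom_code()
# After it has run, the lookup succeeds.
registered_after = is_registered("Custom_Candidate_Gen.v1")
```

In the real pipeline, the equivalent of load_custom_code() would be importing the custom_functions module in the script before the model is loaded.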

Azure ML model deployment fail: Module not found error

I'm trying to deploy a model locally using Azure ML before deploying to AKS. I have a custom script that I want to import into my entry script (scoring script), but it's saying it is not found.
Here is the error:
Here's my entry script with the custom script import on line 1:
import rake_refactored as rake
from operator import itemgetter
import pandas as pd
import datetime
import re
import operator
import numpy as np
import json
import os


# Called when the deployed service starts
def init():
    global stopword_path
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # For multiple models, it points to the folder containing all deployed models (./azureml-models)
    stopword_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'models/SmartStoplist.txt')
    # load models


def preprocess(df):
    df = rake.prepare_data(df)
    text = rake.process_response(df, "RESPNS")
    return text


# Use model to make predictions
def predict(df):
    text = preprocess(df)
    return rake.extract_keywords(stopword_path, text)


def run(data):
    try:
        # Find the data property of the JSON request
        df = pd.read_json(json.loads(data))
        prediction = predict(df)
        # Serialize the prediction to a JSON string
        return json.dumps(prediction)
    except Exception as e:
        return str(e)
And here is my model artifact directory in Azure ML showing that it is in the same directory as the entry script (rake_score.py).
What am I doing wrong? I had a similar issue before with a sklearn package that I was able to add to the pip-package list when I built the environment, but my custom script isn't a pip package.
I could not find rake_refactored in any documentation or elsewhere online.
You can try the steps below to install and import RAKE instead.
Using pip
pip install rake-nltk
Directly from the repository
git clone https://github.com/csurfer/rake-nltk.git
python rake-nltk/setup.py install
Sample Code:
from rake_nltk import Rake
# Uses stopwords for English from NLTK, and all punctuation characters by
# default
r = Rake()
# Extraction given the text.
r.extract_keywords_from_text(<text to process>)
# Extraction given the list of strings where each string is a sentence.
r.extract_keywords_from_sentences(<list of sentences>)
# To get keyword phrases ranked highest to lowest.
r.get_ranked_phrases()
# To get keyword phrases ranked highest to lowest with scores.
r.get_ranked_phrases_with_scores()
Refer - https://github.com/csurfer/rake-nltk
In order to access my custom script in my scoring script I needed to explicitly define the source directory in my inference configuration:
from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(
    environment=env,
    entry_script="rake_score.py",
    source_directory='./models'
)
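The underlying import behavior can be reproduced locally with the standard library alone. The sketch below is not Azure ML code and makes no claim about how the service arranges its files; it only shows that Python can import a helper script once the directory containing it is on the module search path. The module name rake_helper_demo is a hypothetical stand-in for the custom script.

```python
import importlib
import os
import sys
import tempfile


def can_import(module_name):
    """Return True if module_name can be imported right now."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False


# Hypothetical stand-in for the custom helper script.
MODULE_NAME = "rake_helper_demo"

# Write the helper into a fresh directory that Python does not know about.
source_dir = tempfile.mkdtemp()
with open(os.path.join(source_dir, MODULE_NAME + ".py"), "w") as fh:
    fh.write("def prepare_data(df):\n    return df\n")

found_before = can_import(MODULE_NAME)  # directory is not on sys.path yet
sys.path.insert(0, source_dir)          # make the directory importable
found_after = can_import(MODULE_NAME)
```

Here found_before is False and found_after is True, which mirrors the situation above: the entry script can only import the custom script once the directory holding both is made available to the interpreter.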

Sphinx link to python standard library documentation

In order to be able to reference standard python documentation I have added to my config file the following:
import os
import sys
sys.path.insert(0, 'C:/ProgramData/Anaconda3/lib/site-packages')
sys.path.insert(0, os.path.abspath('../..'))
master_doc = 'index'
extensions = ['sphinx.ext.intersphinx']
intersphinx_mapping = {'python': ('https://docs.python.org/3.6', None)}
I have an rst file that references a Python function like this:
See :py:func:`io.open`.
When the documentation is built, it correctly recognizes io.open as an external function and creates a link, and hovering over the link shows the message "(in Python 3.6)", so I believe it is at least partly working.
However, the link that it uses is:
file:///C:/mylib/docs/build/python/library/io.html#io.open
instead of:
https://docs.python.org/3.6/library/io.html#io.open
Am I missing something extra in the config? What am I doing wrong?

Why the import "from tensorflow.train import Feature" doesn't work

This is probably a beginner question about Python module importing, but I can't understand why the following is valid:
> import tensorflow as tf
> f = tf.train.Feature()
> from tensorflow import train
> f = train.Feature()
But the following statement causes an error:
> from tensorflow.train import Feature
ModuleNotFoundError: No module named 'tensorflow.train'
Can somebody please explain why it doesn't work this way? My goal is to use a shorter notation in the code, like this:
> example = Example(
    features=Features(feature={
        'x1': Feature(float_list=FloatList(value=feature_x1.ravel())),
        'x2': Feature(float_list=FloatList(value=feature_x2.ravel())),
        'y': Feature(int64_list=Int64List(value=label))
    })
)
The TensorFlow version is 1.7.0.
Solution
Replace
from tensorflow.train import Feature
with
from tensorflow.core.example.feature_pb2 import Feature
Explanation
Remarks about TensorFlow's Aliases
In general, you have to remember that, for example:
from tensorflow import train
is actually an alias for
from tensorflow.python.training import training
You can easily check the real module name by printing the module. For the current example you will get:
from tensorflow import train
print (train)
<module 'tensorflow.python.training.training' from ....
Your Problem
In TensorFlow 1.7, you can't use from tensorflow.train import Feature, because the from clause needs an actual module name (and not an alias). Since train is only an alias, you get a ModuleNotFoundError (a subclass of ImportError).
By doing
from tensorflow import train
print (train.Feature)
<class 'tensorflow.core.example.feature_pb2.Feature'>
you'll get the complete path of Feature. Now you can use that import path, as shown in the solution above.
Note
In TensorFlow 1.9.0, from tensorflow.train import Feature will work, because tensorflow.train is an actual package, which you can therefore import. (This is what I see in my installed TensorFlow 1.9.0, as well as in the documentation, but not in the GitHub repository. It must be generated somewhere.)
Info about the path of the modules
You can find the complete module path in the docs. Every module has a "Defined in" section. See image below (taken from Module: tf.train):
I would advise against importing Feature (or any other object) from the non-public API, which is inconvenient (you have to figure out where Feature is actually defined), verbose, and subject to change in future versions.
As an alternative, I would suggest simply defining
import tensorflow as tf
Feature = tf.train.Feature
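The difference between an attribute alias and a real submodule can be reproduced without TensorFlow. The following sketch builds two in-memory modules with the standard library; fakepkg and fakepkg_impl are hypothetical stand-ins for tensorflow and the internal module that defines Feature, and the example only demonstrates Python's import rules, not TensorFlow's actual layout.

```python
import importlib
import sys
import types

# Hypothetical stand-ins: "fakepkg" plays the role of tensorflow and
# "fakepkg_impl" the role of the internal module that defines Feature.
impl = types.ModuleType("fakepkg_impl")


class Feature:
    pass


impl.Feature = Feature

pkg = types.ModuleType("fakepkg")
pkg.__path__ = []   # mark it as a package that has no submodules on disk
pkg.train = impl    # "train" is only an attribute, i.e. an alias

sys.modules["fakepkg"] = pkg
sys.modules["fakepkg_impl"] = impl

# Attribute access works, just like `from tensorflow import train`.
from fakepkg import train
feature_via_alias = train.Feature

# Treating the alias as a submodule fails, because no module with the
# dotted name "fakepkg.train" exists for the import system to find.
try:
    importlib.import_module("fakepkg.train")
    alias_import_failed = False
except ModuleNotFoundError:
    alias_import_failed = True

# Binding a local name to the public attribute gives the short notation.
ShortFeature = pkg.train.Feature
```

The last line corresponds to the Feature = tf.train.Feature suggestion: it relies only on attribute access, so it works regardless of whether train is a real submodule or an alias.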
