Phone is mising in the acoustic model - cmusphinx

When I use pocketsphinx_continuous -infile my_audio -dict my_dictionary -jsgf my_jsgf,
I get these errors:
ERROR: "dict.c", line 195: Line 1: Phone 'I' is mising in the acoustic model; word 'Bismi' ignored
ERROR: "dict.c", line 195: Line 2: Phone 'I' is mising in the acoustic model; word 'Bismi(2)' ignored
ERROR: "dict.c", line 195: Line 3: Phone 'A' is mising in the acoustic model; word 'Lahi' ignored
ERROR: "dict.c", line 195: Line 4: Phone 'A' is mising in the acoustic model; word 'Lahi(2)' ignored
ERROR: "dict.c", line 195: Line 5: Phone 'HI' is mising in the acoustic model; word 'Rahmani' ignored
ERROR: "dict.c", line 195: Line 6: Phone 'HI' is mising in the acoustic model; w
ERROR: "fsg_search.c", line 141: The word 'wa' is missing in the dictionary
But I get good accuracy when I test my data; the result file shows:
TOTAL Words: 420 Correct: 420 Errors: 0
TOTAL Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
TOTAL Insertions: 0 Deletions: 0 Substitutions: 0
How can I solve these problems?
Thank you so much

You need to edit the dictionary to match the acoustic model's phoneset, or use an acoustic model that matches your dictionary. The acoustic model is selected with the -hmm option. The fsg_search error means the grammar word 'wa' also needs a dictionary entry.
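For example, if the acoustic model is the default US English one (an assumption), its phoneset has no bare 'I', 'A', or 'HI' phones, so every entry must be transcribed with phones the model actually contains, and every word used in the grammar needs an entry. A hypothetical fragment of a corrected dictionary:

bismi B IH S M IY
bismi(2) B IY S M IY
lahi L AA HH IY
wa W AA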

Related

ValueError: Dimension 0 in both shapes must be equal, but are 0 and 512 when using tensorflow 2.2

I am trying to run the PFE model. It works well when I run eval_lfw with tensorflow 2.1 and tensorflow 1.x, but when I try to run it with tensorflow 2.2 or later, I get this error:
ValueError: Node 'gradients/UncertaintyModule/fc_log_sigma_sq/BatchNorm/cond/FusedBatchNorm_1_grad/FusedBatchNormGrad' has an _output_shapes attribute inconsistent with the GraphDef for output #3: Dimension 0 in both shapes must be equal, but are 0 and 512. Shapes are [0] and [512].
It happens while the model is loading, when it calls saver = tf.compat.v1.train.import_meta_graph(meta_file, clear_devices=True, import_scope=scope) to import the meta file of my model.
To reproduce the error: download https://drive.google.com/drive/folders/10RnChjxtSAUc1lv7jbm3xkkmhFYyZrHP?usp=sharing
and run eval_lfw with parameters --model_dir pretrained/PFE_sphere64_msarcface_am --dataset_path data/Dataset --protocol_path ./proto/pairs_dataset.txt
Thank you for your help
Traceback (most recent call last):
File "/home/jordan/Bureau/Probabilistic-Face-Embeddings_new/evaluation/eval_lfw.py", line 78, in <module>
main(args)
File "/home/jordan/Bureau/Probabilistic-Face-Embeddings_new/evaluation/eval_lfw.py", line 51, in main
network.load_model(args.model_dir)
File "/home/jordan/Bureau/Probabilistic-Face-Embeddings_new/network.py", line 169, in load_model
saver = tf.compat.v1.train.import_meta_graph(meta_file, clear_devices=True, import_scope=scope)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1462, in import_meta_graph
**kwargs)[0]
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1486, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 799, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 501, in _import_graph_def_internal
raise ValueError(str(e))
ValueError: Node 'gradients/UncertaintyModule/fc_log_sigma_sq/BatchNorm/cond/FusedBatchNorm_1_grad/FusedBatchNormGrad' has an _output_shapes attribute inconsistent with the GraphDef for output #3: Dimension 0 in both shapes must be equal, but are 0 and 512. Shapes are [0] and [512].

Compare each element of CSV file to every element of a different CSV file, and find the most similar elements

I have two CSV files which I need to compare. The first one is called SAP.csv, and the second is SAPH.csv.
SAP.csv has these cells:
Notification Description
5000000001 Detailed Inspection of Masts (2100mm) (3
5000000002 Ceremonial Awnings-Survey and Load Test
5000000003 HPA-Carry out 4000 hour service routine
5000000004 UxE 8 in Number Temperature Probs for C
5000000005 Overhaul valves
...while SAPH.csv has these cells:
Notification Description
4000000015 Detailed Inspection of Masts (2100mm) (3
4000000016 Ceremonial Awnings-Survey and Load Test
4000000017 HPA-Carry out 8000 hour service routine
4000000018 UxE 8 in Number Temperature Probs for C
4000000019 Represerve valves
4000000020 STW System
They are similar, but some lines, like the fourth (HPA-Carry out 4000 hour service routine vs. HPA-Carry out 8000 hour service routine), are slightly different.
I want to compare each value of SAP.csv against every value of SAPH.csv, and, using cosine similarity, find the most similar lines, so that the output would look something like this (the similarity percentages here are just examples, not what they would actually be):
Description
Detailed Inspection of Masts (2100mm) (3 - 100%
Ceremonial Awnings-Survey and Load Test - 100%
HPA-Carry out 4000 hour service routine - 85%
UxE 8 in Number Temperature Probs for C - 90%
Overhaul valves - 0%
Post-answer edit
runfile('C:/Users/andrew.stillwell2/.spyder-py3/Estimating Test.py', wdir='C:/Users/andrew.stillwell2/.spyder-py3')
Traceback (most recent call last):
File "", line 1, in <module>
runfile('C:/Users/andrew.stillwell2/.spyder-py3/Estimating Test.py', wdir='C:/Users/andrew.stillwell2/.spyder-py3')
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/andrew.stillwell2/.spyder-py3/Estimating Test.py", line 31, in <module>
similarity_score = similar(job, description) # Get their similarity
File "C:/Users/andrew.stillwell2/.spyder-py3/Estimating Test.py", line 14, in similar
similarity = 1-textdistance.Cosine(qval=2).distance(a, b)
File "C:\ProgramData\Anaconda3\lib\site-packages\textdistance\algorithms\base.py", line 173, in distance
return self.maximum(*sequences) - self.similarity(*sequences)
File "C:\ProgramData\Anaconda3\lib\site-packages\textdistance\algorithms\base.py", line 176, in similarity
return self(*sequences)
File "C:\ProgramData\Anaconda3\lib\site-packages\textdistance\algorithms\token_based.py", line 175, in __call__
return intersection / pow(prod, 1.0 / len(sequences))
ZeroDivisionError: float division by zero
2nd edit, following the solution above
So the original request had just two outputs: Description and Similarity score.
Description comes from SAP
Similarity comes from the textdistance calc
Can the solution be amended to output the following?
Notification (this is a 10-digit number which is in the SAP file)
Description (as it currently is)
Similarity (as it currently is)
Notification (this number comes from the SAPH file and would be the one which provides the similarity score)
So an example row of output would look like this:
80000115360 Additional Materials FWD Rope Guard 86.24% 7123456789
This would be along columns A, B, C, D
A and B come from SAP
C is calculated
D comes from SAPH
Edit 3
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/andrew.stillwell2/.spyder-py3/Est Test 2.py", line 16, in <module>
SAP = pd.read_csv('H:\Documents/Python/Import into Python/SAP/SAP.csv', dtype={'Notification':'string'})
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 429, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1122, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1853, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 490, in pandas._libs.parsers.TextReader.__cinit__
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\common.py", line 2017, in pandas_dtype
dtype))
TypeError: data type 'string' not understood
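Note: the 'string' extension dtype was only added in pandas 1.0, so an older pandas raises exactly this TypeError. On any version the Notification column can be kept as text with the built-in str dtype; a minimal sketch, with an illustrative path:

SAP = pd.read_csv('SAP.csv', dtype={'Notification': str})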
Post edit 4 - 25/10/20
Hi, so I am getting the same error as before, I think:
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/andrew.stillwell2/.spyder-py3/Est Test 2.py", line 16, in <module>
SAP = pd.read_csv('H:\Documents/Python/Import into Python/SAP/SAP.csv', dtype={'Notification':'string'}, delimiter=",", engine="python")
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 435, in _read
data = parser.read(nrows)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1139, in read
ret = self._engine.read(nrows)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 2421, in read
data = self._convert_data(data)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 2487, in _convert_data
clean_conv, clean_dtypes)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1705, in _convert_to_ndarrays
cvals = self._cast_types(cvals, cast_type, c)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1808, in _cast_types
copy=True, skipna=True)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 623, in astype_nansafe
dtype = pandas_dtype(dtype)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\common.py", line 2017, in pandas_dtype
dtype))
TypeError: data type 'string' not understood
I picked up on your point about the delimiter, so I uploaded a CSV file to repl.it, and it looks as though "," is the delimiter.
I have therefore altered the code to suit, and when I ran it on repl.it, it worked.
This is the code I am using:
import textdistance
import pandas as pd

def similar(a, b): # adapted from here: https://stackoverflow.com/a/63838615/8402369
    similarity = 1 - textdistance.Cosine(qval=2).distance(a, b)
    return similarity * 100

# Read the CSVs
SAP = pd.read_csv('H:\Documents/Python/Import into Python/SAP/SAP.csv', dtype={'Notification':'string'}, delimiter=",", engine="python")
SAPH = pd.read_csv('H:\Documents/Python/Import into Python/SAP/SAP_History.csv', dtype={'Notification':'string'}, delimiter=",", engine="python")

# Create a pandas dataframe to store the output. The column 'Description' is populated with the values of SAP['Description']
scores = pd.DataFrame(SAP['Description'], columns=['Notification (SAP)', 'Description', 'Similarity', 'Notification (SAPH)'])

# Temporary variables to store the highest similarity score and the matching description
highest_score = 0
desc = 0

# Iterate through SAP['Description']
for job in SAP['Description']:
    highest_score = 0 # Reset highest_score in each iteration
    for description in SAPH['Description']: # Iterate through SAPH['Description']
        similarity_score = similar(job, description) # Get their similarity
        if similarity_score > highest_score: # If the similarity is higher than the already saved one, update highest_score with the new values
            highest_score = similarity_score
            desc = str(description)
            if similarity_score == 100: # If it's a perfect match, don't bother continuing to search
                break
    # Update the dataframe 'scores' with highest_score and the other values
    print(SAPH['Description'][SAPH['Description'] == desc])
    scores['Notification (SAP)'][scores['Description'] == job] = SAP['Notification'][SAP['Description'] == job]
    scores['Similarity'][scores['Description'] == job] = f'{highest_score}%'
    scores['Notification (SAPH)'][scores['Description'] == job] = SAPH['Notification'][SAPH['Description'] == desc]

print(scores)

# Output it to Scores.csv without the index column
with open('./Scores.csv', 'w') as file:
    file.write(scores.__repr__())
This is being run in Spyder (Python 3.7).
@George_Pipas's answer to this question demonstrates an example using the textdistance library (I'm paraphrasing part of his answer here):
A solution is to work with the textdistance library. I will provide an example of Cosine Similarity:
import textdistance
1-textdistance.Cosine(qval=2).distance('Apple', 'Appel')
and we get:
0.5
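To see where the 0.5 comes from: with qval=2 each string is split into bigrams, 'Apple' into Ap, pp, pl, le and 'Appel' into Ap, pp, pe, el. Two of the four bigrams match, so the cosine similarity is 2/√(4·4) = 0.5, and the distance is 1 − 0.5 = 0.5.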
So, we can create a similarity finding function:
def similar(a, b):
    similarity = 1 - textdistance.Cosine(qval=2).distance(a, b)
    return similarity
This outputs a number closer to 1 the more similar a and b are, and closer to 0 the less similar they are. So if a == b, the output will be 1, but if a != b, the output will be less than 1.
To get percentages, you just need to multiply the output by 100, like this:
def similar(a, b): # adapted from here: https://stackoverflow.com/a/63838615/8402369
    similarity = 1 - textdistance.Cosine(qval=2).distance(a, b)
    return similarity * 100
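For instance:

similar('Apple', 'Appel') # 50.0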
CSV files can be read pretty easily with pandas:
import pandas as pd

# Read the CSVs
SAP = pd.read_csv('SAP.csv')
SAPH = pd.read_csv('SAPH.csv')
We create another pandas dataframe to store the results we'll compute in:
# Create a pandas dataframe to store the output. The column 'SAP' is populated with the values of SAP['Description']
scores = pd.DataFrame({'SAP': SAP['Description']}, columns = ['SAP', 'SAPH', 'Similarity'])
Now, we iterate through SAP['Description'] and SAPH['Description'], compare each element against each other element, compute their similarity, and save the highest to scores.
# Temporary variable to store both the highest similarity score and the 'SAPH' value the score was computed with
highest_score = {"score": 0, "description": ""}

# Iterate through SAP['Description']
for job in SAP['Description']:
    highest_score = {"score": 0, "description": ""} # Reset highest_score at each iteration
    for description in SAPH['Description']: # Iterate through SAPH['Description']
        similarity_score = similar(job, description) # Get their similarity
        if similarity_score > highest_score['score']: # If the similarity is higher than the already saved one, update highest_score with the new values
            highest_score['score'] = similarity_score
            highest_score['description'] = description
            if similarity_score == 100: # If it's a perfect match, don't bother continuing to search
                break
    # Update the dataframe 'scores' with highest_score
    scores['SAPH'][scores['SAP'] == job] = highest_score['description']
    scores['Similarity'][scores['SAP'] == job] = highest_score['score']
Here's a breakdown:
A temporary variable, highest_score, is created to store the highest score computed so far, together with the 'SAPH' value it was computed with.
Now we iterate through SAP['Description'] and, within that loop, through SAPH['Description']. This allows us to compare each value of SAP['Description'] (job) to every value of SAPH['Description'] (description).
While iterating through SAPH['Description'], we:
Compute the similarity score of both job and description
If it's higher than the saved score in highest_score, we update highest_score accordingly; otherwise we continue
If similarity_score is equal to 100, we know that it's a perfect match, and don't have to keep looking. We break the loop in this case.
Outside of the SAPH['Description'] loop, now that we've compared job to each element of SAPH['Description'] (or found a perfect match), we save the values to scores.
This repeats for every element of SAP['Description'].
Here's what scores looks like when it's finished:
SAP SAPH Similarity
0 Detailed Inspection of Masts (2100mm) (3 Detailed Inspection of Masts (2100mm) (3 100
1 Ceremonial Awnings-Survey and Load Test Ceremonial Awnings-Survey and Load Test 100
2 HPA-Carry out 4000 hour service routine HPA-Carry out 8000 hour service routine 94.7368
3 UxE 8 in Number Temperature Probs for C UxE 8 in Number Temperature Probs for C 100
4 Overhaul valves Represerve valves 53.4522
And after outputting it to a CSV file with this:
# Output it to Scores.csv without the index column (0, 1, 2, 3... far left in scores above). Remove index=False if you want to keep the index column.
scores.to_csv('Scores.csv', index=False)
...Scores.csv looks like this:
SAP,SAPH,Similarity
Detailed Inspection of Masts (2100mm) (3,Detailed Inspection of Masts (2100mm) (3,100
Ceremonial Awnings-Survey and Load Test,Ceremonial Awnings-Survey and Load Test,100
HPA-Carry out 4000 hour service routine,HPA-Carry out 8000 hour service routine,94.73684210526315
UxE 8 in Number Temperature Probs for C,UxE 8 in Number Temperature Probs for C,100
Overhaul valves,Represerve valves,53.45224838248488
Note that textdistance and pandas are required libraries for this. Install them, if you don't have them already, with:
pip install textdistance pandas
Notes:
You can round the percent by replacing f'{highest_score}%' with this: f'{round(highest_score, NUMBER_OF_PLACES_TO_ROUND_TO)}%'
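For instance, rounding to two places produces values like the 86.24% in the example row above:

scores['Similarity'][scores['Description'] == job] = f'{round(highest_score, 2)}%'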
EDIT (for the problems encountered that are mentioned in the comments):
Here is an error-catching version of the similarity function:
def similar(a, b): # adapted from here: https://stackoverflow.com/a/63838615/8402369
    try:
        similarity = 1 - textdistance.Cosine(qval=2).distance(a, b)
        return similarity * 100
    except ZeroDivisionError:
        print('There was an error. Here are the values of a and b that were passed')
        print(f'a: {repr(a)}')
        print(f'b: {repr(b)}')
        exit()
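The ZeroDivisionError above typically means one of the two strings was empty (or a NaN from a blank CSV cell), so its q-gram profile has zero length. A minimal sketch of a guard, assuming blank or missing descriptions should simply score 0:

def similar(a, b): # variant of the function above; treating blanks as 0% similar is an assumption
    if not isinstance(a, str) or not a.strip() or not isinstance(b, str) or not b.strip():
        return 0 # empty or missing descriptions cannot be compared
    return (1 - textdistance.Cosine(qval=2).distance(a, b)) * 100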

invalid literal for int() with base 10 with Keras pad_sequences

Traceback (most recent call last):
File "C:/Users/Lenovo/PycharmProjects/ProjetFinal/venv/turkish.py", line 45, in <module>
sequences_matrix = sequence.pad_sequences(sequences,maxlen=max_len)
File "C:\Users\Lenovo\PycharmProjects\ProjetFinal\venv\lib\site-packages\keras_preprocessing\sequence.py", line 96, in pad_sequences
trunc = np.asarray(trunc, dtype=dtype)
File "C:\Users\Lenovo\PycharmProjects\ProjetFinal\venv\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: invalid literal for int() with base 10: windows loading yapıo sonra mavi ekran iste. simdi repair yapıyo bkalım olmazsa gelirim size.
X = dft.text
You are passing raw text straight to pad_sequences; that is not how pad_sequences works.
Generally, every text is first converted into numbers over a given vocabulary (of characters or words, depending on the task), and only then can it be padded.
Please work through a tutorial on text preprocessing first. The TensorFlow text tutorials are a good place to start:
https://www.tensorflow.org/guide/keras/masking_and_padding
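A minimal sketch of that pipeline with the Keras tokenizer, assuming dft.text holds the raw strings (as in the question) and hypothetical values for max_len and the vocabulary size:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_len = 150 # assumption: the padded length you want
texts = dft.text.astype(str).tolist() # the raw text column from the question

tokenizer = Tokenizer(num_words=1000) # assumption: cap the vocabulary at 1000 words
tokenizer.fit_on_texts(texts) # build the vocabulary
sequences = tokenizer.texts_to_sequences(texts) # text -> lists of integer indices
sequences_matrix = pad_sequences(sequences, maxlen=max_len) # padding now works on numbers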

ValueError: negative dimensions are not allowed when loading .pkl file

Although there are many question threads for the error ValueError: negative dimensions are not allowed, I couldn't find an answer to my problem.
I trained a machine learning model using SGDClassifier:
clf = linear_model.SGDClassifier(loss='log', random_state=20000, verbose=1, class_weight='balanced')
model = clf.fit(X, Y)
The dimension of X is (1651880, 246177).
The code below works, i.e. saving the model object and using it for prediction:
joblib.dump(model, 'trainedmodel.pkl',compress=3)
prediction_result=model.predict(x_test)
but I get an error when loading the saved model:
model = joblib.load('trainedmodel.pkl')
Below is the error message; please help me resolve it.
File "C:\Users\Taxonomy\AppData\Roaming\Python\Python36\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 598, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "C:\Users\Taxonomy\AppData\Roaming\Python\Python36\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 526, in _unpickle
obj = unpickler.load()
File "C:\Users\Taxonomy\Anaconda3\lib\pickle.py", line 1050, in load
dispatch[key[0]](self)
File "C:\Users\Taxonomy\AppData\Roaming\Python\Python36\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 352, in load_build
self.stack.append(array_wrapper.read(self))
File "C:\Users\Taxonomy\AppData\Roaming\Python\Python36\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 195, in read
array = self.read_array(unpickler)
File "C:\Users\Taxonomy\AppData\Roaming\Python\Python36\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 141, in read_array
array = unpickler.np.empty(count, dtype=self.dtype)
ValueError: negative dimensions are not allowed
Try dumping the model with pickle protocol 4.
From Python's pickle docs:
Protocol version 4 was added in Python 3.4. It adds support for very
large objects, pickling more kinds of objects, and some data format
optimizations. Refer to PEP 3154 for information about improvements
brought by protocol 4.
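A minimal sketch, assuming joblib is imported as in the question; joblib.dump accepts a protocol argument that is passed through to pickle:

import joblib

# Dump with pickle protocol 4, which supports very large objects
joblib.dump(model, 'trainedmodel.pkl', compress=3, protocol=4)

# Loading should now succeed
model = joblib.load('trainedmodel.pkl')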

ValueError: Negative dimension size caused by subtracting 35 from 15 for 'MaxPool_2'

I am trying to implement the CNN Keras model with pretrained word embeddings from the official example, but with my own custom dataset. Here is the URL:
https://github.com/fchollet/keras/blob/master/examples/pretrained_word_embeddings.py
I am using Keras 1.2.0 with Tensorflow 1.2.1.
I get an error on lines 132-134. After searching online, all the posts pointed to the dimension ordering. I tried the following suggestion for both tf and th orderings, but it still didn't work:
from keras import backend as K
K.set_image_dim_ordering('tf')
Any ideas?
File "/home/usr/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2508, in create_op
set_shapes_for_outputs(ret)
File "/home/usr/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1873, in set_shapes_for_outputs
shapes = shape_func(op)
File "/home/usr/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1823, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "/home/usr/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
debug_python_shape_fn, require_shape_fn)
File "/home/usr/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 676, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Negative dimension size caused by subtracting 35 from 15 for 'MaxPool_2' (op: 'MaxPool') with input shapes: [?,15,1,128].
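No answer is recorded here, but the traceback is suggestive: the final pooling window (35, from the linked example) is larger than the 15 timesteps remaining at that layer ([?, 15, 1, 128]). A sketch of the length bookkeeping, under the assumption that the model follows the example but MAX_SEQUENCE_LENGTH was reduced to 500 for the custom dataset (a value that reproduces the 15):

def conv_out(n, k): # 'valid' 1D convolution output length
    return n - k + 1

def pool_out(n, k): # pooling output length with stride equal to the pool size
    return n // k

n = 500 # hypothetical MAX_SEQUENCE_LENGTH; the example itself uses 1000
for layer, k in [('conv', 5), ('pool', 5), ('conv', 5), ('pool', 5), ('conv', 5)]:
    n = conv_out(n, k) if layer == 'conv' else pool_out(n, k)
print(n) # 15: too short for MaxPooling1D(35)

If that assumption holds, either pad the input sequences back to the example's length of 1000 or shrink the final pool size to the remaining length.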
