Using CMU's PocketSphinx with a small set of words - Linux

I want to use CMU PocketSphinx to recognize words from a small set. I created a corpus for them and generated the model files with the lmtool at http://www.speech.cs.cmu.edu/tools/lmtool.html.
Now when I run the pocketsphinx_continuous executable with this model on my 12-core Linux machine, it takes about 5 seconds to recognize each word.
Is this library usually this slow or am I doing something wrong?
The console output shows that it is still searching and evaluating a large number of words, whereas my model contains only 12 words.
Is there any other lightweight and easy-to-use library I can use for this simple task of distinguishing between about 12-15 words?
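If the goal is really just to pick one of a dozen known words, one alternative worth trying is pocketsphinx's keyword-spotting mode, which restricts the search to the listed phrases instead of a full language model. A rough sketch (the words and detection thresholds below are placeholders you would tune for your own vocabulary):

```shell
# keyphrase.list — one word per line, each with a detection threshold
# (lower thresholds = fewer false positives, more misses)
forward /1e-20/
backward /1e-20/
stop /1e-25/

# then decode from the microphone using only this list:
pocketsphinx_continuous -inmic yes -kws keyphrase.list
```

Because the decoder only ever considers these entries, it avoids evaluating the large general-purpose search space that makes per-word recognition slow.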

Related

Conversion of IOB to spaCy JSON taking a lot of time (IOB file has 1 million lines)

I just want a little guidance. There are 3 IOB files: dev, test & train.
Dev has 1 million lines.
Test has 4 million lines.
Train has 30 million.
I am currently converting only the dev file, because I wasn't sure whether it contained any errors.
(The IOB format is correct.) It's been over 3 hours now; any idea whether this file will work, or should I use something else?
I am fine-tuning a BERT model using spaCy in Google Colab with the runtime hardware set to GPU, and for reference I have followed this article:
https://towardsdatascience.com/how-to-fine-tune-bert-transformer-with-spacy-3-6a90bfe57647
I have followed the exact steps of the article.
I am not familiar with the NLP domain, nor do I have deep knowledge of pipelining. Can someone please help with this? It's really important.
Below I have attached an image showing the elapsed time and the command executed for the conversion.
[Image showing time elapsed and command executed]

CTC + BLSTM Architecture Stalls/Hangs before 1st epoch

I am working on code that performs online handwriting recognition.
It uses the CTC loss function and Word Beam Search (custom implementation by githubharald).
TF Version: 1.14.0
Following are the parameters used:
batch_size: 128
total_epoches: 300
hidden_unit_size: 128
num_layers: 2
input_dims: 10 (number of input Features)
num_classes: 80 (CTC output logits)
save_freq: 5
learning_rate: 0.001
decay_rate: 0.99
momentum: 0.9
max_length: 1940.0 (BLSTM with variable-length time steps)
label_pad: 63
The problem I'm facing is that, after changing the decoder from the CTC greedy decoder to Word Beam Search, my code stalls after a particular step. It does not show the output of the first epoch and has been stuck there for about 5-6 hours now.
The step it is stuck after: tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
I am using an NVIDIA DGX-2 for training (name: Tesla V100-SXM3-32GB).
Here is the paper describing word beam search, maybe it contains some useful information for you (I'm the author of the paper).
I would look at your task as two separate parts:
optical model, i.e. train a model that is as good as possible at reading text just by "looking" at it
language model, i.e. use a large enough text corpus, use a fast enough mode of the decoder
To select the best model for part (1), using best path (greedy) decoding for validation is good enough.
If the best path contains wrong characters, chances are high that beam search also has no chance to recover (even when using language models).
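For reference, best path (greedy) decoding itself is only a few lines. This sketch assumes an output matrix of shape [timesteps, num_classes] with the blank as the last class (the usual TF convention); it takes the argmax per timestep, collapses repeated labels, and drops blanks:

```python
import numpy as np

def best_path_decode(mat, blank_idx=None):
    """Greedy CTC decoding: argmax per timestep, merge repeats, remove blanks.

    mat: [timesteps, num_classes] matrix of per-timestep class scores.
    blank_idx: index of the CTC blank; defaults to the last class.
    """
    if blank_idx is None:
        blank_idx = mat.shape[1] - 1
    best = np.argmax(mat, axis=1)
    decoded, prev = [], None
    for idx in best:
        # emit a label only when it differs from the previous timestep
        # (repeats are merged) and is not the blank
        if idx != prev and idx != blank_idx:
            decoded.append(int(idx))
        prev = idx
    return decoded
```

Comparing this output against the ground truth labels is usually enough to validate the optical model in part (1), before any beam search enters the picture.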
Now to part (2). Regarding runtime of word beam search: you are using "NGramsForecast" mode, which is the slowest of all modes. It has running time O(W*log(W)) with W being the number of words in the dictionary. "NGrams" has O(log(W)).
If you look into the paper and go to Table 1, you see that the runtime gets much worse when using the forecast modes ("NGramsForecast" or "NGramsForecastAndSample"), while character error rate may or may not get better (e.g. "Words" mode has 90ms runtime, while "NGramsForecast" has over 16s for the IAM dataset).
For practical use cases, I suggest the following:
if you have a dictionary (that means, a list of unique words), then use "Words" mode
if you have a large text corpus containing enough sentences in the target language, then use "NGrams" mode
don't use the forecast modes, instead use "Words" or "NGrams" mode and increase the beam width if you need better character error rate

How to train YOLO-Tensor flow own dataset

I am trying to make an app that detects traffic signs in video frames. I am using darkflow (YOLO on TensorFlow), following the steps from https://github.com/thtrieu/darkflow.
I need to know how I can train this model with my own dataset of traffic sign images.
If you're using Darkflow on Windows, you need to make some small adjustments to how you use it. If you cloned the code and are using it straight from the repository, you need to put python in front of the given commands, since flow is a Python file.
e.g. python flow --imgdir sample_img/ --model cfg/yolo-tiny.cfg --load bin/yolo-tiny.weights --json
If you are installing using pip globally (not a bad idea) and you still want to use the flow utility from any directory just make sure you take the flow file with you.
To train, use the commands listed on the github page here: https://github.com/thtrieu/darkflow
If training on your own data you will need to take some extra steps as outlined here: https://github.com/thtrieu/darkflow#training-on-your-own-dataset
Your annotations need to be in the popular PASCAL VOC format which are a set of xml files including file information and the bounding box data.
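For reference, a minimal PASCAL VOC annotation file looks roughly like this (file name, image size, class name, and coordinates below are made-up placeholders):

```xml
<annotation>
    <folder>images</folder>
    <filename>stop_sign_001.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>stop_sign</name>
        <bndbox>
            <xmin>120</xmin>
            <ymin>80</ymin>
            <xmax>260</xmax>
            <ymax>220</ymax>
        </bndbox>
    </object>
</annotation>
```

Each image gets one such XML file with the same base name, and every labelled object in the image adds another `<object>` element.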
Point your flow command at your new dataset and annotations to train.
The best dataset to practice with is PASCAL VOC. You need to prepare two folders for training: one with images and one with XML files (the annotation folder). Each image needs one XML file (with the same name) containing the basic information (object name, object position, ...). After that, you only need to choose a predefined .cfg file in the cfg folder and run the following command:
flow --model cfg/yolo-new.cfg --train --dataset "path/to/images/folder" --annotation "path/to/annotation/folder"
Read more about the options supported by darkflow to further optimize the training process.
After spending a lot of time figuring out how to train a custom dataset for object detection, here is what worked for me.
Prerequisites:
1: training environment: a system with at least a 4 GB GPU, or an AWS / GCP pre-configured cloud machine with CUDA 9 installed
2: Ubuntu 16.04 OS
3: images of the object you want to detect. The images should not be too large, or they will cause out-of-memory issues during training
4: a labelling tool; many are available, such as LabelImg or BBox-Label-Tool; the one I used worked well
I also tried a Python dataset-generator project, but the labelling results were not good enough for real-time scenarios.
My suggestion for the training environment is to use an AWS machine rather than spending time on a local installation of CUDA and cuDNN; even if you manage to install CUDA locally, without a GPU with at least 4 GB of memory, training will often break with out-of-memory errors.
Solutions to train the dataset:
1: train an ssd_mobilenet_v2 model using the TensorFlow Object Detection API;
the training output can be used on both Android and iOS platforms
2: use darknet to train the dataset, which requires PASCAL VOC-format labelling; LabelImg does that job very well
3: retrain the weights output by darknet with darkflow

Pocketsphinx cannot decode mfc file while pocketsphinx_continuous decodes corresponding wav

I have been working with CMUSphinx on Turkish speech-to-text for a couple of months. I succeeded in running training on 100 hours of sound. My goal was to use the resulting acoustic model with the Sphinx3 decoder. However, the Sphinx3 decoder could not decode my test wav files. Then I noticed that sphinxtrain runs pocketsphinx_batch at the end of training to test the model.
So I started working on pocketsphinx. I am at a point where pocketsphinx_batch cannot decode a wav file (actually it only produces "ııı", nothing else), while pocketsphinx_continuous produces much more meaningful output on the same file (e.g. 10 correct words out of 15).
I guess I am missing some configuration steps. I have a compressed archive at this link,
which includes the acoustic and language models, the dictionary, and the wav files I am trying to decode.
I am asking for help to be able to use my model with Sphinx3 and pocketsphinx_batch.
Thank you.
Fortunately, I found the problem. It was the feature vectors produced by sphinx_fe: I was creating them with default values. After reading the make_feats.pl and sphinxtrain.cfg files, I created feature vectors compatible with the acoustic model. sphinxtrain.cfg sets the lifter parameter to 22, but sphinx_fe with default values uses lifter 0, which means no liftering. I created the mfc files with a lifter value of 22, and then it worked.
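For anyone hitting the same issue: the fix amounts to passing the training-time feature parameters to sphinx_fe instead of relying on defaults. A sketch of the invocation (paths and sample rate are placeholders; the feat.params file written by sphinxtrain carries the matching -lifter and related values, so copy the exact settings from your own model):

```shell
# extract MFCCs that match the acoustic model's feature configuration
sphinx_fe -argfile my_model/feat.params \
          -samprate 16000 \
          -c test_files.fileids \
          -di wav -do mfc \
          -ei wav -eo mfc \
          -mswav yes
```

The key point is that the feature extraction settings (lifter, filter count, sample rate, ...) must be identical between training and decoding, or the decoder sees features it was never trained on.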

applying word2vec on small text files

I'm totally new to word2vec, so please bear with me. I have a set of text files, each containing between 1000 and 3000 tweets. I have chosen a common keyword ("kw1") and I want to find semantically related terms for "kw1" using word2vec. For example, if the keyword is "apple", I would expect to see related terms such as "ipad", "os", "mac"... based on the input file. So this set of related terms for "kw1" would be different for each input file, as word2vec would be trained on individual files (e.g., with 5 input files, run word2vec 5 times, once on each file).
My goal is to find sets of related terms for each input file given the common keyword ("kw1"), which would be used for some other purposes.
My questions/doubts are:
Does it make sense to use word2vec for a task like this? Is it technically sound, given the small size of each input file?
I have downloaded the code from code.google.com: https://code.google.com/p/word2vec/ and have just given it a dry run as follows:
time ./word2vec -train $file -output vectors.bin -cbow 1 -size 200 -window 10 -negative 25 -hs 1 -sample 1e-3 -threads 12 -binary 1 -iter 50
./distance vectors.bin
From my results I saw that I'm getting many noisy terms (stopwords) when I use the 'distance' tool to get terms related to "kw1". So I removed stopwords and other noisy terms such as user mentions. But I haven't seen it mentioned anywhere that word2vec requires cleaned input data.
How do you choose the right parameters? I see that the results (from running the distance tool) vary greatly when I change parameters such as '-window' and '-iter'. Which technique should I use to find the correct parameter values? (Manual trial and error is not possible for me, as I'll be scaling up the dataset.)
First Question:
Yes, for almost any task I can imagine word2vec being applied to, you are going to have to clean the data, especially if you are interested in semantics (not syntax), which is the usual reason to run word2vec. And it is not just about removing stopwords, although that is a good first step. Typically you will also want a tokenizer and a sentence segmenter; I think if you look at the documentation for deeplearning4j (which has a word2vec implementation), it shows the use of these tools. This is important, since you probably don't care about the relationship between "apple" and the number "5", or between "apple" and "'s", etc.
For more discussion on preprocessing for word2vec see https://groups.google.com/forum/#!topic/word2vec-toolkit/TI-TQC-b53w
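As a concrete example, a minimal cleaning pass before training might look like the sketch below (the stopword list is a tiny placeholder; in practice you would use a fuller list, e.g. from NLTK):

```python
import re

STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "in"}  # placeholder list

def clean_tweet(text):
    """Lowercase, drop user mentions/URLs/punctuation, remove stopwords."""
    text = text.lower()
    text = re.sub(r"@\w+|https?://\S+", " ", text)   # user mentions and links
    tokens = re.findall(r"[a-z]+", text)             # keep alphabetic tokens only
    return [t for t in tokens if t not in STOPWORDS]
```

The cleaned tokens, joined with spaces, are what you would write to the training file that the word2vec binary consumes.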
Second Question:
There is no automatic tuning available for word2vec AFAIK, since that would imply the author of the implementation knows what you plan to do with it. Typically, the implementation's default values are the "best" values for whoever implemented it, on a task (or set of tasks). Sorry, word2vec isn't a turn-key solution. You will need to understand the parameters and adjust them to fit your task.
