I want to enable my Microsoft Custom Speech model to recognize designators containing numbers, chars and dashes, something like this: 12-34 A-56 B78.
The speech model recognizes numbers and characters correctly. Is there a way to train it so that it outputs the string 12-34 A-56 B78 when I say "twelve thirtyfour a fiftysix b seventyeight"? I need this for a German speech model.
I've already tried to train a model with 10000 randomly generated strings like the one above. I then trained the model using related text.
Thanks in advance
These are very specific format requirements. Unfortunately, it is currently not possible to get results exactly like this from the speech service. I suggest doing some post-processing on the results to format them this way.
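A minimal sketch of such post-processing in Python, assuming the recognizer returns the numbers and letters as separate tokens (for example "12 34 a 56 b 78") and that every designator follows the fixed pattern from the question. The function name `format_designator` and the token layout are assumptions for illustration and are not part of the Speech service.

```python
import re


def format_designator(recognized: str) -> str:
    """Reformat recognized text into the pattern NN-NN L-NN LNN.

    Assumes the input holds exactly four numbers and two single letters
    in the order number, number, letter, number, letter, number.
    """
    # Pull out runs of digits and single letters, ignoring spaces and punctuation.
    tokens = re.findall(r"\d+|[A-Za-z]", recognized)
    if len(tokens) != 6:
        raise ValueError(f"expected 6 tokens, got {len(tokens)}: {tokens}")
    n1, n2, l1, n3, l2, n4 = tokens
    # Check that numbers and letters are where the pattern expects them.
    if not all(t.isdigit() for t in (n1, n2, n3, n4)):
        raise ValueError(f"unexpected token order: {tokens}")
    if not (l1.isalpha() and l2.isalpha()):
        raise ValueError(f"unexpected token order: {tokens}")
    return f"{n1}-{n2} {l1.upper()}-{n3} {l2.upper()}{n4}"


print(format_designator("12 34 a 56 b 78"))
```

If the designators come in several layouts, the same idea extends to one pattern per layout, chosen by the sequence of token types.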
We are building a speech-to-text application. The conversations are always in Dutch, but in some cases English and Dutch words are the same. How can I train my model for that case?
There are different ways to do the task:
Train the model with audio samples in Dutch (Belgian or standard) together with the related transcripts.
Without any audio files, provide a text file in that language to train the model.
Default settings can be applied, such as the separation into training and test samples; check the sample count and divide the sets accordingly.
Create a training file with a few sentences (repeated content is also acceptable) and train the model with that file. Depending on the language priority, the file has to contain the relevant Dutch and English words.
The following can help you create a pronunciation file
I have a set of images like this
And I'm trying to train TensorFlow in Python to read the numbers in the images.
I'm new to machine learning, and in my research I found a solution to a similar problem that uses CTC to train on and predict variable-length data in an image.
I'm trying to figure out if I should use CTC or find a way to create a new image for every number of the image that I already have.
For example, if the number in my image is 213, I would create 3 new images to train the model, one for each of the digits 2, 1 and 3, and use those digits as labels. I'm looking for tutorials or even TensorFlow documentation that can help me with that.
In the case of text, CTC absolutely makes sense: you don't want to split a text (like "213") into "2", "1", "3" manually, because it is often difficult to segment the text into individual characters.
CTC, on the other hand, just needs images and the corresponding ground-truth texts as input for training. You don't have to manually take care of things like alignment of chars, width of chars, number of chars. CTC handles that for you.
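To illustrate why no manual segmentation is needed, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It assumes the network has already produced one best label per time step, that label 0 is the CTC blank, and that label i + 1 stands for `CHARSET[i]`; these conventions and the names are assumptions for illustration and are not taken from SimpleHTR.

```python
from itertools import groupby

BLANK = 0               # assumed index of the CTC blank label
CHARSET = "0123456789"  # assumed character set; label i + 1 maps to CHARSET[i]


def ctc_greedy_decode(frame_labels):
    """Turn per-time-step labels into text: merge repeats, then drop blanks."""
    # Step 1: collapse runs of identical labels into a single label.
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blanks and map the remaining labels to characters.
    return "".join(CHARSET[label - 1] for label in collapsed if label != BLANK)


# The digit "2" spans two frames and "1" spans three; blanks separate them.
frames = [0, 3, 3, 0, 2, 2, 2, 0, 4, 0]
print(ctc_greedy_decode(frames))
```

The blank is what lets a repeated character survive: the labels `[2, 0, 2]` decode to "11", while `[2, 2]` decode to "1".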
I don't want to repeat myself here, so I just point you to the tutorials I've written about text recognition and to the source code:
Build a Handwritten Text Recognition System using TensorFlow
SimpleHTR: a TensorFlow model for text-recognition
You can use the SimpleHTR model as a starting point. To get good results, you will have to generate training data (e.g. write a rendering tool that renders realistic-looking examples) and train the model from scratch with that data (more details on training can be found in the README).
I am stuck with a problem in which I want to ensure that specific tokens/words are produced while decoding and generating abstractive-style sentences.
I am working with deep learning models such as LSTM and transformer models for generating short sentences (100-200 characters). I want some words, such as places or nouns (like brand names), to be present in the generated texts.
I am not sure if there has been any research on this; I couldn't really find a paper after an extensive search.
TIA, any leads or suggestions are appreciated. :)
I am not sure, but you can try to condition your output on those specific words. Your model can be like a seq2seq decoder, but instead of attending to the encoder outputs it can attend to those specific words.
Do guide me along if I am not posting in the right section.
I have some text files for my training data, which are unformatted text in Word documents. They all contain ASCII characters only.
I would like to train a model on the text files using data mining methods.
The text files have about 300 words each on average.
Is there any software you would recommend for me to start with?
My initial idea is to use all the words in one of the files as training data and the remaining files as test data, in order to perform cross-validation.
However, I have tools such as Weka, but it does not seem to satisfy my needs: converting to CSV files does not seem feasible in my case because the text files are separate.
I am trying to perform cross-validation in such a way that all the words in the training data are treated as features.
You need to use Weka's StringToWordVector filter and convert your text files to ARFF files. After that you can use Weka's classification algorithms. Watch the following video to learn the basics.
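A minimal sketch of the conversion step in Python, assuming each document has already been read into a string and has a class label. It writes one string attribute and one nominal class attribute, which is the shape StringToWordVector can then turn into word features. The function name `to_arff`, the relation name, and the example labels are hypothetical, and the labels are assumed to be simple tokens without spaces or commas.

```python
def to_arff(documents, relation="documents"):
    """Build ARFF text from a list of (text, label) pairs."""
    labels = sorted({label for _, label in documents})
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute class {" + ",".join(labels) + "}",
        "",
        "@data",
    ]
    for text, label in documents:
        # Collapse newlines and repeated whitespace so each document is one row.
        cleaned = " ".join(text.split())
        # Escape backslashes and single quotes inside the quoted string value.
        escaped = cleaned.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{escaped}',{label}")
    return "\n".join(lines) + "\n"


# Hypothetical example documents and labels.
docs = [("first report\nabout sensors", "tech"), ("it's a recipe", "food")]
print(to_arff(docs))
```

With all documents in a single ARFF file, the separate text files no longer have to be converted to CSV, and cross-validation can be run from Weka itself.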
I am trying to come up with a way to convert speech to text, and I am trying to use Sphinx to achieve this. What I mean by unguided speech to text is that the speaker is not bound to speak from a definite set of sentences; rather, he might speak any sentence. So it's not possible for me to have a grammar file in which each word is one of the pre-written alternatives. I understand that I would have to train Sphinx somehow to do this.
But I am a beginner with Sphinx. How do I start training Sphinx to convert unguided speech? Is it possible to achieve unguided conversion with Sphinx?
The task you are attempting is, as of right now, not yet possible to complete, at least not with satisfying accuracy.
As for the Sphinx-based solution: you will have to create a dictionary with all the words to be recognized. There is no other way.
Once you have the dictionary, you can generate a simple n-gram model based on it, with only unigrams - each unigram will be one word. The probability of each may be the same, or you may attempt to do some statistical analysis of the words that will be used.
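The two options for the unigram probabilities can be sketched in Python as follows. This only computes the probabilities; it does not write a Sphinx language model file. The function name, the example words, and the use of add-one smoothing (so that dictionary words missing from the sample text still get a nonzero probability) are assumptions for illustration.

```python
from collections import Counter


def unigram_probabilities(vocabulary, corpus_tokens=None):
    """Return a probability for each dictionary word.

    Without corpus_tokens every word gets the same probability.
    With corpus_tokens, probabilities follow the word counts,
    using add-one smoothing over the dictionary.
    """
    vocab = sorted(set(vocabulary))
    if not corpus_tokens:
        return {word: 1 / len(vocab) for word in vocab}
    vocab_set = set(vocab)
    # Count only words that are in the dictionary.
    counts = Counter(tok for tok in corpus_tokens if tok in vocab_set)
    total = sum(counts.values()) + len(vocab)
    return {word: (counts[word] + 1) / total for word in vocab}


words = ["hello", "world"]
print(unigram_probabilities(words))
print(unigram_probabilities(words, ["hello", "hello", "world", "unknown"]))
```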