I am a researcher, and I am building a speech recognition system using CMU Sphinx. I have successfully trained on words of the Sindhi language, but with very low accuracy. Now I want to train on complete sentences. How do I train on complete sentences, and how do I make the dictionary, transcript, fileids and other files for sentence-level training? Any help would be highly appreciated.
I have trained words of the Sindhi language using CMU Sphinx
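For sentence training, the file layout is essentially the same as for word training: the transcription holds one whole sentence per utterance, while the dictionary still lists individual words (each distinct word once, with its phones), never whole sentences. A minimal sketch, assuming the usual SphinxTrain etc/ layout (<db>_train.fileids, <db>_train.transcription, <db>.dic); the database name "sindhi", the placeholder sentences and the one-letter-per-phone mapping in naive_phones() are hypothetical. SphinxTrain also expects a phone list and a filler dictionary, which this sketch does not generate.

```python
"""Sketch: write SphinxTrain-style fileids, transcription and dictionary
files for sentence-level training data.

Hypothetical: the database name "sindhi", the example utterances and the
naive one-letter-one-phone mapping in naive_phones().
"""
from pathlib import Path


def naive_phones(word):
    """Hypothetical placeholder: one phone per letter, upper-cased.
    A real Sindhi dictionary needs a proper letter-to-phone mapping."""
    return [ch.upper() for ch in word if ch.isalpha()]


def write_training_files(utterances, out_dir, db_name="sindhi"):
    """utterances: list of (fileid, sentence) pairs, where fileid is the
    audio path relative to the wav folder, without ".wav"."""
    etc = Path(out_dir) / "etc"
    etc.mkdir(parents=True, exist_ok=True)

    fileid_lines, transcript_lines, vocabulary = [], [], set()
    for fileid, sentence in utterances:
        words = sentence.lower().split()
        vocabulary.update(words)
        fileid_lines.append(fileid)
        # Each transcription line ends with the utterance basename in parentheses.
        basename = fileid.rsplit("/", 1)[-1]
        transcript_lines.append(f"<s> {' '.join(words)} </s> ({basename})")

    # The dictionary lists every distinct WORD once; sentences are not entries.
    dict_lines = [f"{w} {' '.join(naive_phones(w))}" for w in sorted(vocabulary)]

    files = {
        f"{db_name}_train.fileids": fileid_lines,
        f"{db_name}_train.transcription": transcript_lines,
        f"{db_name}.dic": dict_lines,
    }
    for name, lines in files.items():
        (etc / name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return fileid_lines, transcript_lines, dict_lines
```

For example, an utterance ("speaker1/utt_001", "placeholder sentence one") produces the transcription line "<s> placeholder sentence one </s> (utt_001)" and separate dictionary entries for one, placeholder and sentence.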
Related
We are building a speech-to-text application. The conversations are always in Dutch, but in some cases English and Dutch words are the same. How can I train my model to handle that?
There are different ways to approach this:
Train the model with Dutch audio samples (Belgian or standard Dutch) and their transcripts.
Without any audio files, provide a text file in the language to train the model.
Default settings can be applied, such as separating training and test samples: check the sample count and divide the sets accordingly.
Create a training file with a few sentences (repeated content is also acceptable) and train the model with that file. Depending on which language has priority, the file has to contain the relevant Dutch and English words.
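A minimal sketch of the last two points, under stated assumptions: it splits utterance ids into training and test sets by count, and merges a Dutch and an English pronunciation dictionary in the CMU Sphinx text format ("word PH1 PH2 ...", alternates written as "word(2) ..."), so that words shared by both languages keep both pronunciations. The 10% test fraction and the seed are illustrative, and the merge assumes both dictionaries already use one shared phone set; real Dutch and English phone sets differ and would have to be mapped first.

```python
"""Sketch for a mixed Dutch/English setup: a train/test split by count and
a merge of two CMU Sphinx-format pronunciation dictionaries.

Assumptions: illustrative 10% test fraction and seed; both dictionaries
use ONE shared phone set (Dutch and English sets must be mapped first).
"""
import random
import re

_ALT_SUFFIX = re.compile(r"\(\d+\)$")


def split_fileids(fileids, test_fraction=0.1, seed=0):
    """Shuffle reproducibly, then hold out a fraction for testing."""
    ids = list(fileids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction)) if len(ids) > 1 else 0
    return sorted(ids[n_test:]), sorted(ids[:n_test])


def parse_dic(text):
    """Map each word to its list of distinct pronunciations."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        word = _ALT_SUFFIX.sub("", parts[0])  # "word(2)" -> "word"
        prons = entries.setdefault(word, [])
        pron = " ".join(parts[1:])
        if pron not in prons:
            prons.append(pron)
    return entries


def merge_dics(primary, secondary):
    """Words in both dictionaries keep every distinct pronunciation,
    primary-language pronunciations first."""
    merged = {w: list(p) for w, p in primary.items()}
    for word, prons in secondary.items():
        bucket = merged.setdefault(word, [])
        for pron in prons:
            if pron not in bucket:
                bucket.append(pron)
    return merged


def format_dic(entries):
    """Write entries back out, numbering alternates word(2), word(3), ..."""
    lines = []
    for word in sorted(entries):
        for i, pron in enumerate(entries[word]):
            name = word if i == 0 else f"{word}({i + 1})"
            lines.append(f"{name} {pron}")
    return "\n".join(lines)
```

A word present in both languages then appears once per pronunciation (word, word(2)), so the decoder can match either variant.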
The following can help you create a pronunciation file:
I have used TextBlob to assign polarity scores to English tweets. Can TextBlob be used to assign polarity scores to Hinglish tweets?
If yes, how?
Thank you.
TextBlob is not a very good solution here. You can first try converting the Hinglish tweets into English using this notebook on GitHub.
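A toy sketch of the two-step idea (normalise Hinglish to English first, then score polarity). Everything in it is hypothetical: the four-word Hinglish-to-English map and the polarity values are made-up placeholders, not TextBlob's lexicon and not the method of the GitHub notebook.

```python
"""Toy two-step pipeline: map Hinglish tokens to English, then average
word polarities from a lexicon.

Hypothetical: both dictionaries below are tiny made-up placeholders.
"""
import re

HINGLISH_TO_EN = {"accha": "good", "bura": "bad", "bahut": "very", "nahi": "not"}
POLARITY = {"good": 0.7, "bad": -0.7, "great": 0.8, "terrible": -1.0}


def to_english(text):
    """Lower-case, split into letter runs, translate known Hinglish words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [HINGLISH_TO_EN.get(t, t) for t in tokens]


def polarity(tokens):
    """Mean polarity of the tokens found in the lexicon; 0.0 if none."""
    hits = [POLARITY[t] for t in tokens if t in POLARITY]
    return sum(hits) / len(hits) if hits else 0.0
```

It also shows why word-level mapping is fragile: anything missing from the map passes through untranslated and is ignored by the lexicon, and word order effects such as negation are lost, which is part of why a proper translation step followed by a stronger model is suggested here.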
Then run BERT-based sentiment analysis, as shown on these web pages:
survival8 Sep-2020
survival8 Sep-2022
Also, check out this link, which compares three sentiment analyzers:
TextBlob
VADER
BERT-based project
survival8: Sentiment Analysis Testing on Some Difficult Sentences
I would like to train an NER model using the stanford-ner.jar CRFClassifier for Nepali or Hindi. Can I simply use the Java command line mentioned here?
Yes, if you supply training data you can produce a new model. Note that when running the NER system, you will need to tokenize the text in the same way it was tokenized for training.
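A minimal sketch of the tokenization point: route both the training-file preparation and the text you later tag through one tokenizer function. The splitting rule (whitespace first, then every Unicode punctuation character such as the danda "।" as its own token) is an assumption chosen for illustration; it avoids a \w-based regex because Python's \w does not match combining marks such as Devanagari vowel signs. Whether the tagger re-tokenizes its input depends on how you run it.

```python
"""Sketch: one tokenizer shared by training-data preparation and tagging.

Assumed rule (illustrative): split on whitespace, then emit every Unicode
punctuation character (category P*) as a separate token.
"""
import unicodedata


def tokenize(text):
    tokens = []
    for chunk in text.split():
        current = []
        for ch in chunk:
            if unicodedata.category(ch).startswith("P"):
                if current:
                    tokens.append("".join(current))
                    current = []
                tokens.append(ch)  # punctuation, incl. the danda, stands alone
            else:
                current.append(ch)  # letters and combining vowel signs stay together
        if current:
            tokens.append("".join(current))
    return tokens


def training_lines(sentence, labels):
    """Training side: 'token<TAB>label' lines for one sentence."""
    tokens = tokenize(sentence)
    if len(tokens) != len(labels):
        raise ValueError(f"{len(tokens)} tokens but {len(labels)} labels")
    return [f"{tok}\t{lab}" for tok, lab in zip(tokens, labels)]


def tagging_input(text):
    """Tagging side: the same tokens, space-joined for the tagger."""
    return " ".join(tokenize(text))
```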
There is some more info about training NER models here: https://stanfordnlp.github.io/CoreNLP/ner.html
I am trying to train an NER model for an Indian language with a custom NE (named entity) dictionary for chunking. I am looking at NLTK and Stanford NER, respectively:
NLTK
I found that nltk.chunk.named_entity.NEChunkParser is able to train on a custom corpus. However, the format of the training corpus is not specified in the documentation or in the source code comments.
Where could I find some guide to the custom corpus for NER in NLTK?
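For the NLTK side, reading the source of nltk.chunk.named_entity suggests (this is an inference, not documented behaviour) that NEChunkParser is trained on a list of chunk trees: each sentence is a Tree whose leaves are (word, POS) pairs and whose entity subtrees carry the NE label. A hedged sketch that reads token/label rows in the tab-separated style described in this thread and converts them to (word, POS, IOB) triples, the input that nltk.chunk.conlltags2tree turns into such trees. The "UNK" POS tag is a placeholder assumption; a real POS tagger for the target language would likely help, since the parser's features appear to use POS tags.

```python
"""Sketch: token/label TSV rows -> (word, POS, IOB) triples.

Assumptions: blank lines separate sentences; labels use plain IO
encoding (PER, ORG, O); "UNK" is a placeholder POS tag.
"""


def read_tsv_sentences(text):
    """Each non-blank line is token<TAB>label; blank lines end sentences."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        token, label = line.split("\t")[:2]  # raises ValueError if no tab
        current.append((token, label.strip()))
    if current:
        sentences.append(current)
    return sentences


def io_to_conll_triples(rows, pos_tag="UNK"):
    """Consecutive tokens with the same non-O label form one entity.
    (IO encoding cannot separate two adjacent same-type entities.)"""
    triples, prev = [], "O"
    for token, label in rows:
        if label == "O":
            iob = "O"
        elif label == prev:
            iob = f"I-{label}"
        else:
            iob = f"B-{label}"
        triples.append((token, pos_tag, iob))
        prev = label
    return triples
```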
Stanford NER
According to this question, the Stanford NER FAQ explains how to train a custom NER model.
One major concern is that the default Stanford NER models do not support Indian languages. Is it viable to feed an Indian-language NER corpus to the model?
Your training corpus needs to be in a .tsv file.
The file should look something like this:
John PER
works O
at O
Intel ORG
This is just a representation of the data, as I do not know which Indian language you are targeting. But your data must always be tab-separated values: the first column is the token and the second is its associated label.
I have tried NER by building my own custom data (in English, though) and training a model on it.
So I guess it's quite possible for Indian languages as well.
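A minimal sketch of preparing such a training run: it writes token/label rows as a tab-separated file (a blank line between sentences), writes a properties file modelled on the austen.prop example in the Stanford NER FAQ, and returns the CRFClassifier command as an argument list for subprocess.run. The file names and the location of stanford-ner.jar are hypothetical; check the property names against the FAQ for your version.

```python
"""Sketch: write a TSV training file and a FAQ-style properties file, and
return the Stanford NER training command. File/jar names are hypothetical."""
from pathlib import Path

# Feature settings copied from the austen.prop example in the Stanford NER FAQ.
FAQ_STYLE_PROPS = {
    "map": "word=0,answer=1",
    "maxLeft": "1",
    "useClassFeature": "true",
    "useWord": "true",
    "useNGrams": "true",
    "noMidNGrams": "true",
    "maxNGramLeng": "6",
    "usePrev": "true",
    "useNext": "true",
    "useDisjunctive": "true",
    "useSequences": "true",
    "usePrevSequences": "true",
    "useTypeSeqs": "true",
    "useTypeSeqs2": "true",
    "useTypeySequences": "true",
    "wordShape": "chris2useLC",
}


def write_training_setup(sentences, out_dir, name="custom-ner",
                         jar="stanford-ner.jar"):
    """sentences: list of sentences, each a list of (token, label) pairs."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    blocks = []
    for sent in sentences:
        for tok, label in sent:
            # A tab or newline inside a token would corrupt the column format.
            if any(c in tok or c in label for c in "\t\n"):
                raise ValueError(f"tab/newline inside {tok!r}/{label!r}")
        blocks.append("\n".join(f"{tok}\t{label}" for tok, label in sent))
    tsv = out / f"{name}.tsv"
    tsv.write_text("\n\n".join(blocks) + "\n", encoding="utf-8")

    # Forward slashes avoid backslash escapes in Java properties files.
    props = {"trainFile": tsv.as_posix(),
             "serializeTo": (out / f"{name}.ser.gz").as_posix(),
             **FAQ_STYLE_PROPS}
    prop_file = out / f"{name}.prop"
    prop_file.write_text("".join(f"{k}={v}\n" for k, v in props.items()),
                         encoding="utf-8")
    return ["java", "-cp", jar, "edu.stanford.nlp.ie.crf.CRFClassifier",
            "-prop", str(prop_file)]
```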
I can create voice recognition for my limited set of words using the following tool:
http://www.speech.cs.cmu.edu/tools/lmtool-new.html
But how do I give feedback to the language model so that it trains better for my voice?
For example, the phonetic values in the .dic files are for an American accent (I want to train it for an Indian accent).
The language model has nothing to do with voice; it operates on words. Use SphinxTrain to tailor the acoustic model to the accent you need, and read how to adapt an existing model or create a new one.
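Adapting the acoustic model works from recordings of your own voice plus their transcriptions, in the same fileids/transcription format used for training, and every transcribed word needs an entry in the pronunciation dictionary. A minimal sketch of a pre-flight check for that, assuming the standard CMU Sphinx text formats (transcription lines "<s> words </s> (fileid)", dictionary lines "word PH1 PH2 ..." with alternates "word(2) ..."). It only reads text and does not call any SphinxTrain tool; the comparison is exact, so word case must match the dictionary.

```python
"""Sketch: list transcript words that are missing from a Sphinx .dic file,
with the utterances they occur in."""
import re

_FILEID = re.compile(r"\s*\(([^()]*)\)\s*$")  # trailing "(fileid)"
_ALT_SUFFIX = re.compile(r"\(\d+\)$")         # "word(2)" -> "word"
MARKERS = {"<s>", "</s>", "<sil>"}


def dictionary_words(dic_text):
    return {_ALT_SUFFIX.sub("", line.split()[0])
            for line in dic_text.splitlines() if line.strip()}


def missing_words(transcription_text, dic_text):
    """Return {word: [fileids]} for words absent from the dictionary."""
    known = dictionary_words(dic_text)
    missing = {}
    for line in transcription_text.splitlines():
        if not line.strip():
            continue
        match = _FILEID.search(line)
        fileid = match.group(1) if match else "?"
        text = line[: match.start()] if match else line
        for word in text.split():
            if word not in MARKERS and word not in known:
                missing.setdefault(word, []).append(fileid)
    return missing
```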