speech to text training for impaired voice - speech-to-text

I want to train and use an ML-based personal speech-to-text converter for a highly impaired voice, covering a small vocabulary of 300-400 words. It is meant for people with a voice impairment, and it cannot be generic because each person produces words differently depending on the type of impairment.
Are there any ML engines that allow this kind of training? If not, what is the best approach?
Thanks

Most speech recognition engines support training (wav2letter, DeepSpeech, ESPnet, Kaldi, etc.); you just need to feed in the data. The only issue is that you need a lot of data to train reliably (thousands of samples for each word). You can look at the Google Speech Commands dataset for an example of how to train from scratch.
Since your training dataset will be small, with just a few samples per word, you can probably start with an existing pretrained model and fine-tune it on your samples to get the best accuracy. Look into "few-shot learning" setups.
The wav2vec 2.0 pretrained model is worth a look; it should be effective for this kind of learning. You can find examples and commands for fine-tuning and inference here.
You can also try fine-tuning Jasper models on Google Speech Commands in NVIDIA NeMo. It might be a little less effective but could still work and should be easier to set up.
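Whichever toolkit you pick, "feeding in the data" usually starts with an index of your recordings. Here is a minimal sketch of that step, assuming a hypothetical folder layout of `data/<word>/<sample>.wav` for one speaker. The JSON-lines field names (`audio_filepath`, `duration`, `label`) are modeled on the manifest convention used by toolkits such as NeMo and are an assumption here, so check your toolkit's documentation for the exact keys it expects.

```python
import json
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_manifest(data_dir, manifest_path):
    """Write one JSON line per recording and return the number of recordings.

    Assumed (hypothetical) layout: data_dir/<word>/<sample>.wav,
    where the folder name is the word spoken in the recording.
    """
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav_path in sorted(Path(data_dir).glob("*/*.wav")):
            entry = {
                "audio_filepath": str(wav_path),
                "duration": round(wav_duration_seconds(wav_path), 3),
                "label": wav_path.parent.name,  # the target word
            }
            out.write(json.dumps(entry) + "\n")
            count += 1
    return count


# Example call (paths are placeholders):
# build_manifest("data", "train_manifest.jsonl")
```

With 300-400 words and only a few recordings of each, it is worth keeping at least one recording per word out of the training manifest so you can measure accuracy on samples the model has not seen.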

I highly recommend watching the YouTube original series "The Age of AI", season one, episode two.
Basically, Google has already done this for people with impaired voices who cannot form words normally. It is very interesting and talks a little about how they did it, and are still doing it, with ML technologies.

Related

Sentiment Analysis: Is there a way to extract positive and negative aspects in reviews?

Currently, I'm working on a project where I need to extract the relevant aspects used in positive and negative reviews in real time.
Deciding whether an aspect is positive or negative requires looking at each word in context, for example recognizing a word that sounds positive but is used in a negative context (consider irony).
Here is an example:
Very nice welcome!!! We ate very well with traditional dishes as at home, the quality but also the quantity are in appointment!!!
Positive aspects: welcome, traditional dishes, quality, quantity
Can anyone suggest to me some tutorials, papers or ideas about this topic?
Thank you in advance.
This task is called Aspect-Based Sentiment Analysis (ABSA). The most popular format and dataset are those specified in the 2014 Semantic Evaluation workshop (SemEval-2014 Task 4) and its updated versions in the following years.
Overview of model performance over the years:
https://paperswithcode.com/sota/aspect-based-sentiment-analysis-on-semeval
A good source of resources and repositories on the topic (some are very advanced, but there are some more beginner-friendly resources in there too):
https://github.com/ZhengZixiang/ABSAPapers
From my general experience with this topic, a very powerful starting point that doesn't require advanced knowledge of machine learning model design is to prepare a dataset (such as the one provided for the SemEval-2014 task) in a token classification format and use it to fine-tune a pretrained transformer model such as BERT, RoBERTa or similar. Check out any tutorial on fine-tuning a token classification model, like this one in Hugging Face. They usually use the popular task of Named Entity Recognition (NER) as the example, but for ABSA you basically do the same thing with other labels and a different dataset.
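To make "token classification format" concrete, here is a small sketch that turns the example review from the question into per-token BIO labels. The tokenizer, the `B-ASP`/`I-ASP`/`O` label names and the helper functions are illustrative assumptions, not the official SemEval format; a real setup would use the tokenizer of the pretrained model and could encode polarity in the labels as well (for example `B-POS`, `B-NEG`).

```python
import re

REVIEW = ("Very nice welcome!!! We ate very well with traditional dishes as at home, "
          "the quality but also the quantity are in appointment!!!")
ASPECTS = ["welcome", "traditional dishes", "quality", "quantity"]


def tokenize(text):
    """Split into words and single punctuation marks (a deliberately simple tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def bio_labels(tokens, aspect_terms):
    """Label each token B-ASP (start of an aspect), I-ASP (inside one) or O (outside)."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for term in aspect_terms:
        term_tokens = [t.lower() for t in tokenize(term)]
        n = len(term_tokens)
        for start in range(len(tokens) - n + 1):
            span_is_free = all(label == "O" for label in labels[start:start + n])
            if lowered[start:start + n] == term_tokens and span_is_free:
                labels[start] = "B-ASP"
                for i in range(start + 1, start + n):
                    labels[i] = "I-ASP"
    return labels


tokens = tokenize(REVIEW)
labels = bio_labels(tokens, ASPECTS)
tagged = [(tok, lab) for tok, lab in zip(tokens, labels) if lab != "O"]
print(tagged)
```

Each (token, label) pair then plays the same role as a token and its entity tag in an NER fine-tuning tutorial.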
Obviously, an even easier approach would be rule-based, or a combination of rules with a trained sentiment analysis model, negation detection, etc. In general, though, you can expect a rule-based approach to perform much worse than state-of-the-art models such as transformers.
If you want to go beyond fine-tuning pretrained transformer models, check out the second and third link I provided and look at some of the model designs built specifically for Aspect-Based Sentiment Analysis.

Agriculture commodity price predictions using machine learning

I want to create a web application that uses machine learning to predict the price of agricultural commodities 2-3 months in advance.
Is it really feasible or not?
If yes, then please provide some rough idea about which tools and technologies I can use to implement it.
First of all, study math, more precisely, statistics and differential algebra.
Then use any neural network library you can find, open source or not. Even MATLAB would help, as it has a good set of examples (I think it includes some similar prediction models; at least I remember creating a model for predicting election results in Poland).
Decide on your training and input data. Research how news and the global situation influence commodity prices. Research how existing bots predict prices for the next 1-2 minutes. Also consider using the history of predictions from certain individuals; I think Reuters has some API for this. This implies you'll have to integrate natural language processing, too.
Train your model, test it, improve it for quite a long time.
Finally, deploy a boring front-end and monetize it.
If you don't want to implement ML, you can also use Kalman filters.
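As a rough illustration of the Kalman filter route, here is a minimal one-dimensional sketch. It assumes the simplest possible model (the "true" price follows a random walk and each observed price is that value plus noise); the function names and the two variance parameters are made-up placeholders that you would have to tune or estimate from your own price history.

```python
def kalman_filter(prices, process_var=1.0, measurement_var=4.0):
    """Return the filtered price estimate after each observation.

    Assumed model (a simplification): true_price[t] = true_price[t-1] + process noise,
    observed_price[t] = true_price[t] + measurement noise.
    """
    estimate = float(prices[0])
    error_var = measurement_var  # initial uncertainty of the estimate
    estimates = [estimate]
    for observed in prices[1:]:
        # Predict: a random walk keeps the estimate but grows its uncertainty.
        error_var += process_var
        # Update: blend prediction and observation according to the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (observed - estimate)
        error_var *= 1.0 - gain
        estimates.append(estimate)
    return estimates


def forecast(prices, **kwargs):
    """Under the random-walk assumption, the forecast is the last filtered estimate."""
    return kalman_filter(prices, **kwargs)[-1]


# Made-up weekly prices, purely for illustration.
weekly_prices = [102.0, 101.5, 103.2, 104.0, 103.1, 105.4]
print(forecast(weekly_prices))
```

A random-walk model cannot capture seasonality or trends, which matter a lot for agricultural prices, so a realistic state model would need extra state variables for those.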

Detect multiple voices without speech recognition

Is there a way to detect in real time whether multiple people are speaking? Do I need a voice recognition API for that?
I don't want to separate the audio and I don't want to transcribe it either. My approach would be to record frequently using one mic (mono) and then analyse those recordings. But how would I then detect and distinguish voices? I'd narrow it down by looking only at relevant frequencies, but then...
I do understand that this is no trivial undertaking. That's why I hope there's an API out there capable of doing this out of the box, preferably a mobile/web-friendly API.
Now this might sound like a Christmas shopping list, but as mentioned I do not need to know anything about the content. So my guess is that full-fledged speech recognition would take a heavy toll on performance.
Most similar problems (adult/child classifier, speech/music classifier, single voice/voice mixture classifier) are standard machine learning problems. You can solve them with a classifier such as a GMM. You only need to construct training data for your task:
Take some clean recordings; for example, you can download an audiobook.
Prepare mixed data by mixing clean recordings.
Train one GMM on the clean data and one on the mixed data.
Compare the probabilities from the clean-speech GMM and the mixed-speech GMM, and decide whether a mixture is present from the ratio of the two.
You can find some code samples here:
https://github.com/littleowen/Conceptor
For example you can try
https://github.com/littleowen/Conceptor/blob/master/Gender.ipynb
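The train-two-GMMs-and-compare idea can be sketched in a few lines. This is a toy version under loud assumptions: each "frame" is a single synthetic number instead of a real acoustic feature vector (such as MFCCs), the two data generators are invented stand-ins for clean and mixed recordings, and the GMM is a small hand-written one-dimensional EM fit rather than a library implementation.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm(xs, k=2, iterations=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # quantile init
    overall_mean = sum(xs) / n
    variances = [sum((x - overall_mean) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in xs:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            mass = sum(r[j] for r in resp) or 1e-300
            weights[j] = mass / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / mass
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / mass
            variances[j] = max(var, 1e-6)
    return weights, means, variances


def avg_log_likelihood(xs, gmm):
    weights, means, variances = gmm
    total = 0.0
    for x in xs:
        p = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))
    return total / len(xs)


def is_mixture(frames, clean_gmm, mixed_gmm, threshold=0.0):
    """Decide by log-likelihood ratio: mixed-speech model versus clean-speech model."""
    ratio = avg_log_likelihood(frames, mixed_gmm) - avg_log_likelihood(frames, clean_gmm)
    return ratio > threshold


rng = random.Random(0)


def clean_frames(n):
    # Synthetic stand-in for features of single-speaker audio.
    return [rng.gauss(rng.choice((-2.0, 2.0)), 0.5) for _ in range(n)]


def mixed_frames(n):
    # Synthetic stand-in for features of overlapping speakers.
    return [rng.gauss(0.0, 1.5) for _ in range(n)]


clean_gmm = fit_gmm(clean_frames(500))
mixed_gmm = fit_gmm(mixed_frames(500))
print(is_mixture(mixed_frames(200), clean_gmm, mixed_gmm))
```

For real-time use you would extract features from short windows of the microphone signal and apply the same ratio test to each window; the threshold trades false alarms against missed overlaps.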

News Article Categorization (Subject / Entity Analysis via NLP?); Preferably in Node.js

Objective: a node.js function that can be passed a news article (title, text, tags, etc.) and will return a category for that article ("Technology", "Fashion", "Food", etc.)
I'm not picky about exactly what categories are returned, as long as the list of possible results is finite and reasonable (10-50).
There are Web APIs that do this (eg, alchemy), but I'd prefer not to incur the extra cost (both in terms of external HTTP requests and also $$) if possible.
I've had a look at the node module "natural". I'm a bit new to NLP, but it seems like maybe I could achieve this by training a BayesClassifier on a reasonable word list. Does this seem like a good/logical approach? Can you think of anything better?
I don't know if you are still looking for an answer, but let me put in my two cents for anyone who happens to come back to this question.
Having worked in NLP, I would suggest the following approach to the problem.
Don't look for a single-package solution. There are no doubt great packages out there for lots of things, but when it comes to active research areas like NLP, ML and optimization, the tools tend to be at least 3 or 4 iterations behind what's there in academia.
Now to the core problem: what you want to achieve is text classification.
The simplest way to achieve this would be an SVM multiclass classifier.
Simplest, yes, but also with very, very (note the double stress) reasonable classification accuracy, runtime performance and ease of use.
What you would need to work on is the feature set used to represent your news article (text, tags, etc.). You could use a bag-of-words model and add named entities as additional features. You could also use article location/time as features (though for simple category classification this might not give you much improvement).
The bottom line is that SVMs work great, there are multiple implementations, and at runtime you don't really need much ML machinery.
Feature engineering, on the other hand, is very task-specific. But given a basic set of features and good labelled data, you can train a very decent classifier.
Here are some resources for you.
http://svmlight.joachims.org/
SVM multiclass is what you would be interested in.
And here is a tutorial by the SVM zen master himself!
http://www.cs.cornell.edu/People/tj/publications/joachims_98a.pdf
I don't know about the stability of this, but from the code it's a binary SVM classifier. That means if you have a known set of N tags you want to classify the text into, you will have to train N binary SVM classifiers, one for each of the N category tags.
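The "N binary classifiers" idea (one-vs-rest) with bag-of-words features can be sketched as follows. The sketch is in Python rather than Node.js to keep it short, and it makes several assumptions: the tiny training set is invented, the features are plain word counts, and the linear SVM is trained with a simple Pegasos-style stochastic subgradient loop, which is a stand-in chosen here and not the algorithm used by SVMlight.

```python
import random
from collections import Counter


def bag_of_words(text):
    """Sparse word-count features plus a constant bias feature."""
    features = Counter(text.lower().split())
    features["__bias__"] = 1
    return features


def score(weights, features):
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def train_binary_svm(samples, labels, epochs=200, lam=0.01, seed=0):
    """Linear SVM (hinge loss + L2) via Pegasos-style SGD; labels are +1 or -1."""
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(samples)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            violated = labels[i] * score(weights, samples[i]) < 1
            for name in weights:  # regularisation shrinks every weight
                weights[name] *= 1.0 - eta * lam
            if violated:  # margin violation pulls weights towards the sample
                for name, value in samples[i].items():
                    weights[name] = weights.get(name, 0.0) + eta * labels[i] * value
    return weights


def train_one_vs_rest(texts, categories):
    """Train one binary SVM per category: that category versus all the others."""
    samples = [bag_of_words(t) for t in texts]
    return {
        cat: train_binary_svm(samples, [1 if c == cat else -1 for c in categories])
        for cat in sorted(set(categories))
    }


def classify(models, text):
    features = bag_of_words(text)
    return max(models, key=lambda cat: score(models[cat], features))


# Invented toy data; a real classifier needs many labelled articles per category.
TRAIN = [
    ("new smartphone chip released", "Technology"),
    ("software update fixes security bug", "Technology"),
    ("startup launches cloud software", "Technology"),
    ("team wins championship final", "Sports"),
    ("striker scores late goal", "Sports"),
    ("coach praises team after match", "Sports"),
    ("chef shares pasta recipe", "Food"),
    ("restaurant adds vegan menu", "Food"),
    ("easy recipe for chocolate cake", "Food"),
]
models = train_one_vs_rest([t for t, _ in TRAIN], [c for _, c in TRAIN])
print(classify(models, "smartphone software update"))
```

At prediction time each article only needs a handful of dot products, which is the "not much ML machinery at runtime" point: the trained weights could be exported as JSON and scored from any language, including Node.js.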
Hope this helps.

Dataset for emotion classification on social media

I would like to do emotion classification on text (posts from social media, e.g. tweets, Facebook wall posts, YouTube comments, etc.), but I can't find a good dataset with annotated data. I'm looking for more than just data annotated as positive or negative: I want a dataset with several emotions. These could be either discrete values (Ekman's 6 basic emotions) or continuous values (arousal-valence model). Does anyone know where I can get such a dataset? It can be from Twitter, Facebook, Myspace, ... as long as it is from a social network.
Well, I think a better (or at least more commonly used) name would be sentiment analysis (sentiment classification), correct? I'm not sure whether social media sites offer their private data (maybe some part of it). Anyway, I found this paper:
http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf
They are dealing with data: http://www.cs.cornell.edu/people/pabo/movie-review-data/ from https://groups.google.com/forum/?fromgroups#!aboutgroup/rec.arts.movies.reviews.
Does it suit you? Basically, finding appropriate data is usually a big problem in ML. Often you need to build your own dataset (I mean labelling part of it manually and applying some clustering or semi-supervised learning afterwards).
If you don't find anything appropriate on the web, I'd try contacting authors who write articles similar to your research. Maybe they have already created datasets that would suit you.

Resources