I'm doing sentiment analysis on user transcripts from UX website testing. I take the transcript from a testing session and analyze it for sentiment: what is the user's opinion of the website, what problems did they encounter, did they get stuck or lost? Since this is quite domain-specific, I'm testing both TextBlob and VADER to see which gives better results.
My issue is at the beginning of the process: the speech-to-text API's transcript isn't perfect, and sentence boundaries (periods) are missing or minimal. I'm not sure at what level the analysis should run, since I was hoping to do it at sentence level. I tried making n-grams and analyzing those short chunks of text, but it isn't ideal and the results are somewhat hard to read, because some parts are repeated across chunks. Apart from this, I do classical text cleaning, tokenization, POS tagging, and lemmatization, and feed the result to TextBlob and VADER.
Transcript example: okay so if I go just back over here it has all the information I need it seems like which is great so I'm pretty impressed with it similar to how a lot of government websites are set up over here it looks like I have found all the information I need it's a great website it has everything overall though it had more than enough information...
I did:
from textblob import TextBlob

ngram_object = TextBlob(lines)      # lines holds the raw transcript string
ngrams = ngram_object.ngrams(n=4)
which gives me something like this (each item is actually a WordList): [okay so if I, so if I go, if I go just...]
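For reference, here is roughly how each chunk then gets its score, continuing from the snippet above; joining the WordList back into a string and the Negative/Positive labelling are my own reconstruction of the missing step:

for i, gram in enumerate(ngrams):
    chunk = " ".join(gram)                          # WordList -> plain string
    polarity = TextBlob(chunk).sentiment.polarity   # TextBlob polarity in [-1, 1]
    label = "Negative" if polarity < 0 else "Positive" if polarity > 0 else "Neutral"
    print(i, chunk, polarity, label)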
Then the results look like this (index, chunk, polarity, label):

62  little bit small                -0.21875      Negative
61  like little bit                 -0.18750      Negative
0   information hard find not see   -0.291666667  Negative
1   hard find not see information   -0.291666667  Negative
Is there a better way to analyze unstructured text in chunks rather than a full transcript?
This makes it difficult to pin down what the actual issue with the website was. Changing the API isn't really an option, since I'm working with something that was given to me as the data-collection tool for this particular sentiment analysis problem.
Any tips or suggestions would be highly appreciated; I couldn't find anyone doing something similar to this.
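For context, one crude way to avoid the repetition would be to score fixed, non-overlapping windows instead of sliding n-grams. A rough sketch with NLTK's VADER, where the eight-word window is an arbitrary choice:

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')                 # one-time lexicon download
sia = SentimentIntensityAnalyzer()

words = transcript.split()                     # transcript: the raw STT string
window = 8                                     # arbitrary chunk size
for i in range(0, len(words), window):         # stride == window, so no overlap
    chunk = " ".join(words[i:i + window])
    print(f"{sia.polarity_scores(chunk)['compound']:+.3f}  {chunk}")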
I am not sure exactly what you want, but maybe you could take a look at speech sentiment analysis, working from the audio rather than the transcript? I have read about RAVDESS, a database useful for emotion and sentiment classification. Take a look: https://smartlaboratory.org/ravdess/
I recently finished work on my text-based emotion detection engine, and I am looking for other existing, working systems to compare with mine, both to know what should be improved and to report comparisons in an upcoming paper.
I have come across many companies claiming to do emotion detection from text but only this one offers a demo that I can use to compare with my system: http://www.o2mc.io/portfolio-posts/text-analysis-restful-api-language-polarity-and-emotion/ (scroll all the way down to see the "try it yourself" section).
Please note that I am not looking for polarity classification, which is the simpler task of saying whether a text is positive or negative. What I am looking for is emotions (sadness, anger, joy, etc.). Does anyone here know of any company/university/person offering a demo of such a system?
As a reference, here is the link to my own system's demo:
http://demo.soulhackerslabs.com/emotion/
Your help is very much appreciated.
What I want to do is create an API that translates human speech into the IPA (International Phonetic Alphabet) format. My question is: where can I find resources on decoding speech at the level of the original audio waveform? I looked for an API, but most of what I found just transcribes straight to the Roman alphabet. I'm looking to create something a little more accurate in its ability to distinguish vocal phonetics.
I would just like to start out by saying that this project is much more difficult and complicated than you might think. Speech-to-text processing is a very large and complicated field with a huge amount of research behind it. The reason most recognizers go straight to Roman characters is that their processing is a probabilistic matching of vague sounds, in the context of other vague sounds, to guess which words make sense together. You are much more likely to find something that will give you Soundex rather than IPA. That said, this is a problem that has been approached on several fronts. Your best bet is probably the Sphinx project from CMU.
http://cmusphinx.sourceforge.net/wiki/start
That will give you a good start, but you are assuming that speech-to-text processing is a lot more developed than it actually is; there is no simple way of translating speech to IPA from the waveform with any kind of accuracy. Sphinx is very modular and completely open source, so it gives you a huge amount of power at your fingertips, and at that point whether or not you can figure out how to make this work is up to you. But again: this is not a solved problem in any way.
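One concrete pointer: Sphinx has an "allphone" mode that decodes to a phoneme string instead of words. A rough sketch with the old pocketsphinx 0.1.x Python bindings; the file name and tuning values are assumptions, and the output is ARPAbet, which you would still need to map to IPA yourself:

import os
from pocketsphinx import AudioFile, get_model_path

model_path = get_model_path()
frames = AudioFile(
    audio_file='speech.wav',                                  # 16 kHz mono WAV
    hmm=os.path.join(model_path, 'en-us'),                    # acoustic model
    allphone=os.path.join(model_path, 'en-us-phone.lm.bin'),  # phoneme LM
    lw=2.0, beam=1e-20, pbeam=1e-20,                          # decoder tuning
)
for phrase in frames:
    print(phrase)   # e.g. "SIL HH AH L OW SIL": ARPAbet phones, not IPA

Each ARPAbet phone maps onto one or two IPA symbols, so a small lookup table covers the final conversion.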
I am having some trouble getting pointers to how to perform what appears to be a deceptively easy task:
Given an audio stream, how do you count the number of words that have been spoken, in real-time?
I don't need to recognize what the words are, but rather just have an accurate counter on words that have been uttered. The counter doesn't have to be too accurate and could even consider utterances and other "grunts" like coughs.
It appears that all speech recognition systems depend on a pre-defined grammar being provided before they can analyze the spoken phonemes and convert them to known words with some degree of accuracy. But I don't care about accuracy at all, only about the rate of words being spoken.
What is important is that this runs in real time and allows the system to provide alerts after a certain number of words have been spoken. The system will then show a visual cue encouraging the speaker to pause, after which the speaker can continue.
I've looked at the CMU Sphinx FAQ and found that "word spotting" is not yet supported. I don't really need a real-time search for particular words, but it comes closer to what I am looking for. Looking for very small silences in the waveform seems like a very crude way of doing this, and probably not very accurate, but that's all I have right now.
Any pointers on algorithms, research papers or any other insights would be appreciated!
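For what it's worth, the silence-gap idea from the question can be sketched in a few lines. This is a naive offline version; soundfile for I/O, the 20 ms frames, and both thresholds are arbitrary choices:

import numpy as np
import soundfile as sf   # assumed I/O library; any WAV reader would do

def count_bursts(path, frame_ms=20, silence_ratio=0.1, min_gap_frames=5):
    # Count speech bursts separated by short silences: a rough word proxy.
    audio, sr = sf.read(path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)                    # mix down to mono
    frame_len = int(sr * frame_ms / 1000)
    n = len(audio) // frame_len
    rms = np.sqrt((audio[:n * frame_len].reshape(n, frame_len) ** 2).mean(axis=1))
    voiced = rms > silence_ratio * rms.max()          # naive energy threshold
    count, gap = 0, min_gap_frames
    for v in voiced:
        if v and gap >= min_gap_frames:
            count += 1                                # new burst after enough silence
        gap = 0 if v else gap + 1
    return count

print(count_bursts('speech.wav'))

A streaming version would apply the same per-frame test to each incoming buffer and raise the visual cue once the count crosses the limit.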
I have an alphabet that existing OCR tools have not tackled before, so when a document is scanned there's no way to detect its letters for recognition. I'm trying to program OCR for it, but I don't have much experience with this. I'd appreciate some hints on where to get started and how such a system is normally implemented.
Take a look at this page; it describes the training process for an open-source OCR engine.
The free Stanford Online Machine Learning class has a great set of lessons on Photo OCR in Part XVIII.
This blog post has a brief description of the example taught in the class.
There are some excellent resources at Google Books. Likewise, if you search for Optical Character Recognition on Amazon, there are some pretty up-to-date books that look to be fairly thick and intellectually challenging :D heh
By the way, I'm well aware this post has some age, but you never know when someone else might stumble across it and find just what they need. And if this even has a chance of helping out, then so be it. OCR is such a strange subject that there isn't much out there that can really answer the deep machine-learning questions, especially if you're going to attempt to write your own library. :P
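To make the usual pipeline concrete (segment the glyphs, size-normalise them, extract features, classify), here is a minimal baseline sketch with scikit-learn; load_glyphs() is a hypothetical stub for your own labelled data, and k-NN on raw pixels is only a starting point:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical helper: returns flattened, size-normalised glyph images
# X of shape (n_samples, 32*32) and their letter labels y.
X, y = load_glyphs()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = KNeighborsClassifier(n_neighbors=3)     # raw-pixel k-NN: crude baseline
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))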
I am looking for a way to compare a user-submitted audio recording against a reference recording, in order to give someone a grade or percentage for language learning.
I realize that this is a very unscientific way of doing things and is more of a gimmick than anything.
My first thoughts are some sort of audio fingerprinting, or waveform comparison.
Any ideas where I should be looking?
This is by no means a trivial problem to solve, though there is an abundance of research on the topic. Presently, the most successful machine learning approaches in the speech recognition domain apply Hidden Markov Model (HMM) techniques.
You may also want to take a look at existing implementations of HMM algorithms. One such library in its early stages is ghmm.
Perhaps even better and more readily applicable to your problem is HTK.
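To give a flavour of the HMM route in Python (using hmmlearn here rather than ghmm or HTK, and a one-model-per-reference-recording design that is just one possible setup; librosa and the parameter values are likewise my own choices):

import librosa                  # assumed available for MFCC extraction
from hmmlearn import hmm        # pip install hmmlearn

def features(path):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # frames x coefficients

# Train one small HMM on the reference pronunciation, then score the attempt:
model = hmm.GaussianHMM(n_components=5, covariance_type='diag', n_iter=50)
model.fit(features('reference.wav'))
print(model.score(features('attempt.wav')))   # higher log-likelihood = closer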
In addition to chomp's great answer, one important keyword you probably need to look up is Dynamic Time Warping (DTW). Here is the Wikipedia article: http://en.wikipedia.org/wiki/Dynamic_time_warping
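To make that concrete, here is a bare-bones DTW sketch over two MFCC sequences; the librosa feature extraction and the length normalisation are illustrative choices, not a standard recipe:

import numpy as np
import librosa   # assumed available for MFCC extraction

def features(path):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T    # frames x coefficients

def dtw_distance(a, b):
    # Plain O(n*m) DTW; each row of a and b is one frame's feature vector.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])      # frame distance
            D[i, j] = cost + min(D[i - 1, j],               # insertion
                                 D[i, j - 1],               # deletion
                                 D[i - 1, j - 1])           # match
    return D[n, m] / (n + m)                                # rough length normalisation

print(dtw_distance(features('reference.wav'), features('attempt.wav')))

A lower score means the attempt tracks the reference more closely; turning that into a percentage grade is a calibration question of its own.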