How does OCR work, and how do I add OCR support for a new alphabet?

I have an alphabet which has not been tackled before, so when scanned, there's no way to detect the letters for recognition with OCR. I'm trying to program OCR for it, but don't have much experience in this. I'd appreciate some hints as to where to get started, and how such a system is normally implemented.

Take a look at this page; it describes the training process for an open source OCR engine.

The free Stanford Online Machine Learning class has a great set of lessons on Photo OCR in Part XVIII.
This blog post has a brief description of the example taught in the class.
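The Photo OCR lessons break the problem into text detection, character segmentation, and character classification. For a new alphabet, the classification step is the part you have to train yourself. Below is a minimal sketch of that step only, assuming glyphs have already been cut out and binarized to a fixed size. The 3x3 bitmaps, the labels, and the function names are made up for illustration; a real system would use larger images and a trained model rather than plain nearest-neighbour matching.

```python
# Toy character classifier: nearest neighbour over labelled glyph bitmaps.
# The glyphs below are hypothetical 3x3 examples, not real training data.
TRAINING = [
    ("I", ["010", "010", "010"]),
    ("L", ["100", "100", "111"]),
    ("T", ["111", "010", "010"]),
]


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def classify(glyph, training_set=TRAINING):
    """Return the label of the training glyph closest to `glyph`."""
    label, _ = min(training_set, key=lambda item: hamming(glyph, item[1]))
    return label


if __name__ == "__main__":
    noisy_l = ["100", "100", "110"]  # an "L" with one pixel missing
    print(classify(noisy_l))
```

The same structure (labelled samples in, label out) carries over when the distance function is swapped for a learned classifier, which is what the training guides for existing OCR engines walk you through.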

There are some excellent resources on Google Books. Likewise, if you search for Optical Character Recognition on Amazon, there are some fairly up-to-date books that look thick and intellectually challenging.
By the way, I'm well aware this post is old, but you never know when someone else might stumble across it and find just what they need. If this has even a chance of helping, so be it. OCR is such a niche subject that there isn't much out there that really answers the deeper, machine-level questions, especially if you're going to attempt to write your own library.

Related

Transcript transformation for sentiment analysis

I'm doing sentiment analysis on users' transcripts from UX website testing. I get the transcript from the testing session and analyze its sentiment: what the user's opinion of the website is, what problems the user encountered, and whether they got stuck or lost. Since this is quite domain-specific, I'm testing both TextBlob and VADER to see which gives better results.
My issue is at the beginning of the process: the speech-to-text API's transcript isn't perfect. Sentence boundaries (periods) are missing or rare, so I'm not sure at what level to run the analysis; I was hoping to do it at sentence level. I tried making n-grams and analyzing those short chunks of text, but it isn't ideal and the results are hard to read because some parts are repeated. Apart from this, I do classical text cleaning, tokenization, POS tagging and lemmatization, then feed the result to TextBlob and VADER.
Transcript example: okay so if I go just back over here it has all the information I need it seems like which is great so I'm pretty impressed with it similar to how a lot of government websites are set up over here it looks like I have found all the information I need it's a great website it has everything overall though it had more than enough information...
I did:
from textblob import TextBlob
# `lines` holds the transcript text as a single string
ngram_object = TextBlob(lines)
ngrams = ngram_object.ngrams(n=4)
which gives me something like (actually a WordList): [okay so if I, so if I go, if I go just...]
Then the results look like:
62  little bit small                   -0.21875      Negative
61  like little bit                    -0.18750      Negative
0   information hard find not see      -0.291666667  Negative
1   hard find not see information      -0.291666667  Negative
Is there a better way to analyze unstructured text in chunks rather than a full transcript?
This makes it difficult to capture what was the issue with the website. Changing the API isn't really an option since I'm working with something that was given to me to use as data collection for this particular sentiment analysis problem.
Any tips or suggestions would be highly appreciated; I couldn't find anyone doing something similar to this.
I'm not sure exactly what you want, but maybe you could take a look at speech sentiment analysis? I have read about RAVDESS, a database useful for sentiment classification. Take a look: https://smartlaboratory.org/ravdess/
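On the chunking question itself: the repetition comes from n-grams being overlapping sliding windows, so every word is scored several times. One option is to cut the transcript into non-overlapping chunks, breaking at spoken discourse markers or after a maximum length. The sketch below is only an illustration under stated assumptions: the marker list, the length limits, and the tiny lexicon are hypothetical, and `toy_score` is a stand-in for a real scorer such as `TextBlob(chunk).sentiment.polarity` or VADER's `polarity_scores(chunk)["compound"]`.

```python
# Hypothetical single-word markers that often open a new thought in speech.
MARKERS = {"okay", "so", "but", "though"}

# Toy lexicon standing in for a real sentiment scorer (assumption).
TOY_LEXICON = {"great": 1.0, "impressed": 0.8, "hard": -0.6, "lost": -0.8}


def split_transcript(text, markers=MARKERS, min_words=4, max_words=15):
    """Split unpunctuated text into non-overlapping chunks of words."""
    chunks, current = [], []
    for word in text.split():
        at_marker = word.lower() in markers and len(current) >= min_words
        if current and (at_marker or len(current) >= max_words):
            chunks.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_score(chunk):
    """Average the lexicon values of known words; 0.0 if none are known."""
    hits = [TOY_LEXICON[w] for w in chunk.lower().split() if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


if __name__ == "__main__":
    transcript = (
        "okay so if I go just back over here it has all the information "
        "I need it seems like which is great so I'm pretty impressed with it"
    )
    for chunk in split_transcript(transcript):
        print(f"{toy_score(chunk):+.2f}  {chunk}")
```

Because each word lands in exactly one chunk, a negative score points to one place in the session instead of several overlapping rows.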

Detecting the sound of wood being cut

I'm really new to machine learning. I have a project to identify a given sound (e.g. cutting wood). The audio clip will contain several sounds, and I need to recognise that particular sound in it. I've read some articles about machine learning, but I still don't know where to start this project, and I'm running out of time.
Any help would be really appreciated. Can anyone please tell me how to do this?
Can I directly perform template matching (algorithms) for a sound?
It's a long journey ahead of you, and Stack Overflow isn't a good place for asking such a generic question. Consult the help section for more.
To get you started, here are some web sites:
Awesome Bioacoustic
Comparative Audio Analysis With Wavenet, MFCCs, UMAP, t-SNE and PCA
Here are two small repos of mine related to audio classification:
Gender classification from audio
Kiwi / not-a-kiwi bird calls detector
They might give you an idea where to start your project. Check the libraries I am using - likely they will be of help to you.
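On the template-matching question: it can be done directly on the waveform with normalized cross-correlation, sketched below in plain Python. Treat this as an illustration of the idea rather than a recommendation. Matching raw samples tends to be fragile when the target sound varies or overlaps with other sounds, which is why the resources above work with features such as MFCCs and a trained classifier. The function name and the toy signal are made up for the example.

```python
import math


def normalized_cross_correlation(signal, template):
    """Slide `template` over `signal`; return (best_offset, best_score).

    The score is the Pearson correlation between the template and each
    window, so it lies in [-1, 1] and ignores volume and DC offset.
    """
    n = len(template)
    t_mean = sum(template) / n
    t_centered = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(t * t for t in t_centered))

    best_offset, best_score = -1, -1.0
    for offset in range(len(signal) - n + 1):
        window = signal[offset:offset + n]
        w_mean = sum(window) / n
        w_centered = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(w * w for w in w_centered))
        if w_norm == 0 or t_norm == 0:
            continue  # flat window or template: correlation is undefined
        dot = sum(w * t for w, t in zip(w_centered, t_centered))
        score = dot / (w_norm * t_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


if __name__ == "__main__":
    template = [0.0, 1.0, 0.0, -1.0]
    # Silence, then a louder copy of the template, then silence again.
    signal = [0.0] * 5 + [0.0, 2.0, 0.0, -2.0] + [0.0] * 3
    print(normalized_cross_correlation(signal, template))
```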

Does anyone know of text-based emotion detection systems that offer a demo?

I recently finished work on my text-based emotion detection engine and I am looking for other existing working systems to compare with mine in order to know what should be improved and also report comparisons in an upcoming paper.
I have come across many companies claiming to do emotion detection from text but only this one offers a demo that I can use to compare with my system: http://www.o2mc.io/portfolio-posts/text-analysis-restful-api-language-polarity-and-emotion/ (scroll all the way down to see the "try it yourself" section).
Please note that I am not looking for polarity classification, which is the simpler task of saying whether a text is positive or negative. What I am looking for is emotions (sadness, anger, joy, etc.). Does anyone here know of any company, university, or person offering a demo of such a system?
As a reference, here is the link to my own system's demo:
http://demo.soulhackerslabs.com/emotion/
Your help is very much appreciated.

define pronunciation starting time for each word in script

I have a text script that is used to create podcasts. So the words in podcast audio are exactly the same as in my text. Now what I want to have is the following:
Word in text | Pronunciation started at
Hello        | 0:0:0.000
my           | 0:0:1.125
friends      | 0:0:2.750
Is that possible to do at all?
Thanks in advance!
One of the key terms you could start with to approach the complexity of the problem is "forced alignment". This site also covers questions on this topic, e.g. here, which leads you to questions and answers concerning HTK (the Hidden Markov Model Toolkit) via the related threads.
You can find a more hands-on style description of how to use forced alignment in automated audio segmentation here.
So the answer is: yes, it is possible, but it is algorithmically very complex and even in its best implementations it is not error-free.
PS.: I found you a really simple tool
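Whichever aligner you end up with, the last step is turning its output into the table from the question. The sketch below assumes the aligner gives you (word, start time in seconds) pairs; that input shape and the function names are assumptions for illustration, not the output format of any particular tool.

```python
def format_timestamp(seconds):
    """Format seconds as h:m:s.mmm, matching the style in the question."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours}:{minutes}:{secs}.{ms:03d}"


def alignment_table(aligned_words):
    """Render (word, start_seconds) pairs as one 'word timestamp' row each."""
    return "\n".join(
        f"{word} {format_timestamp(start)}" for word, start in aligned_words
    )


if __name__ == "__main__":
    # Hypothetical aligner output for the example in the question.
    aligned = [("Hello", 0.0), ("my", 1.125), ("friends", 2.75)]
    print(alignment_table(aligned))
```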

read text document from scanned image

Is there any way to get the text from a scanned document in JPG, JPEG, or any other format? I am using Ruby as my programming language, but I guess if I can get the text with some help from other programming languages, it will not be much of a problem to integrate.
Thanks.
Yes, you can use an OCR library. There are additional details at https://stackoverflow.com/questions/1085/free-ocr-library.
In brief, you may wish to consider using tessnet (http://www.pixel-technology.com/freeware/tessnet2/).
This technology is called optical character recognition (OCR).
For programming, check out this question, which recommends tesseract-ocr.
OCR for Ruby? Check out this question.
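If you go the tesseract-ocr route, the simplest integration from any language is to shell out to the command-line tool. Here is a minimal sketch in Python, assuming the `tesseract` executable is installed and on the PATH; the file name `scan.jpg` and the helper names are placeholders. Passing `stdout` as the output base makes tesseract print the recognized text instead of writing a file, and `-l` selects the language.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build a command that prints the recognized text to standard output."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on an image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


if __name__ == "__main__":
    # Only attempt OCR when tesseract is available; scan.jpg is a placeholder.
    if shutil.which("tesseract"):
        print(ocr_image("scan.jpg"))
```

The same command can be run from Ruby with backticks or `Open3`, so nothing here is specific to Python.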
If it's just a couple images, here's a site that supposedly does it for free.
OCR Terminal http://www.ocrterminal.com has been the best (most accurate) free tool out of at least a dozen that I have used. It works especially well with formatted (table) data.

Resources