define pronunciation starting time for each word in script - audio

I have a text script that is used to create podcasts. So the words in podcast audio are exactly the same as in my text. Now what I want to have is the following:
Word in text | Pronounciation started at
Hello 0:0:0.000
my 0:0:1.125
friends 0:0:2.750
Is that possible to do at all?
Thanks in advance!

One of the key words you could start with to approach the complexity of the problem is "forced alignment". This site also covers questions regarding this topic e.g. here which leads you to questions and answers concerning HTK (the Hidden Markov Model Toolkit) via the releated threads.
You can find a more hands-on style description of how to use forced alignment in automated audio segmentation here.
So the answer is: yes, it is possible, but it is algorithmically very complex and even in its best implementations it is not error-free.
PS.: I found you a really simple tool

Related

Where can I find list of words like ['Dr.', 'Mrs', 'D.C.', 'Inc.','.com'] that are to be ignored while splitting text by Period(punctuation)?

I have text from huge Text/PDF file. I am working on the text to do sentence tokenization using the Period (punctuation). But, I am facing issues with cases like ['Dr.', 'Mrs', 'D.C.', 'Inc.','.com']. To deal with this, I am looking for complete list of such words. Where can I find corpus of all these prefixes/abbreviations/suffixes?
Thanks.
It would probably be best to use a segmentation library instead of trying to write something yourself. Segmentation involves more than just splitting at a period.
To answer your question though, here is a list of English abbreviations.
This README has some additional info about segmentation and links to various research papers as well as various segmentation libraries.

How to create a custom list of words for predictive text

I am thinking about making an app to do with electrical equipment and I would like the app to recognise the name of tools as you write to save you time (predictive text). How would I go about creating a custom list of words the app recognises. Can anyone please point me in the right direction. I have looked around and not found much information on the subject. I am currently using objective c although learning swift so any of those languages will do. Many thanks.
What you want is not predictive text but rather autocomplete.
See this post, there's an answer by Jano with a link to code that seems to do what you need.

How does OCR work? and how to add OCR to an alphabet

I have an alphabet which has not been tackled before, so when scanned, there's no way to detect the letters for recognition with OCR. I'm trying to program OCR for it, but don't have much experience in this. I'd appreciate some hints as to where to get started, and how such a system is normally implemented.
Take a look at this page--it describes the training process for an open source OCR engine.
The free Stanford Online Machine Learning class has a great set of lessons on Photo OCR in Part XVIII.
This blog post has a brief description of the example taught in the class.
There are some excellent resources at google books. Likewise, if you search for Optical Character Recognition on Amazon, there are some pretty up-to-date books that look to be fairly thick and intellectually challenging :D heh
btw - I'm well aware this post has some age, but you never know when some other person might stumble across this and find just what they need. And if this even has the chance of helping out, then so be it. OCR is such a strange subject, that there's not too much out there that can really really answer the deep-machine ended questions. Especially if you're going to attempt to write your own library. :P

How to compare spoken audio against reference recording - language learning

I am looking for a way to compare a user submitted audio recording against a reference recording for comparison in order to give someone a grade or percentage for language learning.
I realize that this is a very un-scientific way of doing things and is more than a gimmick than anything.
My first thoughts are some sort of audio fingerprinting, or waveform comparison.
Any ideas where I should be looking?
This is by no means a trivial problem to solve, though there is an abundance of research on the topic. Presently the most successful forms of machine learning in the speech recognition domain apply Hidden Markov Model techniques.
You may also want to take a look at existing implementations of HMM algorithms. One such library in its early stages is ghmm.
Perhaps even better and more readily applicable to your problem is HTK.
In addition to chomp's great answer, one important keyword you probably need to look up is Dynamic Time Warping (DTW). This is the wikipedia article: http://en.wikipedia.org/wiki/Dynamic_time_warping

read text document from scanned image

Is there any way we can get the text from a scanned document in jpg jpeg or any other format ? I am using ruby as my programming language . But I guess if I can get the texts with some help from other programming languages , it will not be much of a problem to integrate.
Thanks.
Yes, you can use an OCR library. There are additional details at https://stackoverflow.com/questions/1085/free-ocr-library.
In brief, you may wish to consider using tessnet (http://www.pixel-technology.com/freeware/tessnet2/).
This technology is called optical character recognition (OCR).
For programming, check out this question, which recommends tesseract-ocr.
OCR for ruby? check out this question.
If it's just a couple images, here's a site that supposedly does it for free.
OCR Terminal http://www.ocrterminal.com has been the best (most accurate) free tool out of at least a dozen that I have used. It works especially well with formatted (table) data.

Resources