What is the difference between OCR and Recognize Text in Azure Pricing Page - azure

As I found at https://azure.microsoft.com/en-us/pricing/details/cognitive-services/computer-vision/, OCR is priced differently from Recognize Text. It is quite confusing. What is the difference? I can't find any clue in the documentation.

The difference is described here in the docs: https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-recognizing-text#ocr-optical-character-recognition-api
In short:
OCR is synchronous and uses an earlier recognition model, but it works with more languages.
Recognize Text (and the Read API, its successor) uses updated recognition models, but it is asynchronous.
If you want to process handwritten text, for example, you should use the second one.
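To make the synchronous/asynchronous distinction concrete, here is a rough Python sketch of both call patterns using only the standard library. It is not taken from the docs page linked above: the v3.2 REST paths, the ENDPOINT and KEY placeholders, and the helper names are assumptions for illustration, so check them against your own resource and API version.

```python
import json
import time
import urllib.request

# Assumed placeholders: replace with your own resource endpoint and key.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-key>"


def _post(path, image_url):
    # Both APIs accept a JSON body that points at the image.
    req = urllib.request.Request(
        ENDPOINT + path,
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": KEY,
                 "Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)


def ocr_sync(image_url):
    # OCR (synchronous): one request, the result is in the response body.
    with _post("/vision/v3.2/ocr", image_url) as resp:
        return json.load(resp)


def read_submit(image_url):
    # Read (asynchronous): the POST only accepts the job and returns the
    # URL to poll in the Operation-Location header.
    with _post("/vision/v3.2/read/analyze", image_url) as resp:
        return resp.headers["Operation-Location"]


def read_fetch(operation_url):
    # One polling request against the URL returned by read_submit().
    req = urllib.request.Request(
        operation_url, headers={"Ocp-Apim-Subscription-Key": KEY})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(fetch, interval=1.0, max_tries=30):
    # Call fetch() until the job reports a final status or we give up.
    for _ in range(max_tries):
        result = fetch()
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError("Read operation did not finish in time")
```

With a real endpoint and key, the asynchronous path would be used roughly as op = read_submit(image_url) followed by wait_for_result(lambda: read_fetch(op)), whereas ocr_sync(image_url) returns its result directly.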

Related

How to detect handwriting using Google Cloud Vision API

TL;DR: how can I detect the presence of handwriting in an image?
I'm using Google's Python Vision API to scan for text in images, with generally good results. Most of the time the images contain printed text, but sometimes there is handwriting.
As noted in the documentation, you sometimes get better results for handwritten text using document_text_detection rather than the standard text_detection API call. My own tests back this up, but also show that the standard text_detection call generally works best for printed text in JPEG images.
So I'd like to use the standard text_detection by default, and only run images through document_text_detection if there is handwriting. However, I can't find a reliable way to detect the presence of handwritten text in an image using the Vision APIs.
I tried label detection, but there does not appear to be a specific label for handwriting. Occasionally it will spit out "Calligraphy" but not reliably.
Does anyone know of a way to accomplish this?
I haven't used the Google Cloud Vision API, but you could try an object detection model. I would suggest creating a labeled dataset from the document images of your use case with a tool like LabelImg, and then training an object detection model such as YOLOv3 [paper] [code]. I have worked on similar problems, and it should work.

Microsoft Speech Recognition defaults vs API

So I've been using Microsoft Speech Recognition in Windows 10, doing the training exercises, dictating text into Wordpad and correcting it, adding words to the dictionary and so on. I would like to use the software to transcribe .wav files. It appears one can do this using the Windows Speech Recognition API, but this seems to involve creating and loading one's own grammar files, which suggests to me that this would basically create a new speech recognizer, that uses the same building blocks but is a different program from the one that runs when I click "Start Speech Recognition" in the start menu. In particular, it would perform differently because of differences in training or configuration.
Am I wrong about this? And if I'm not, is there still a way of retrieving all the data the default speech recognizer uses so I can reproduce its behavior exactly? If I need to create a separate speech recognizer with its own grammar files, separate training history, and so on in order to transcribe .wav files, then so be it, but I'd like to better understand what's going on here.
The Woundify open source project contains examples of how to convert WAV files to text (STT).

Search in a book with speech

I am trying to build a program that finds which page/sentence of a book is being read into a microphone. I have the book's text and its audio content. The user will start reading from a random page, and the program is supposed to sync to the user and show the section of the book that is being read. It might seem like a useless program, but please bear with me.
Would an approach similar to Shazam-like programs work? I am not sure how effective those algorithms are for speech. Also, the speaker will be different and might have an accent and read at a different speed.
Another approach would be converting the speech to text and searching for that text in the book. The problem is that the language of the book is a rare one for which there is no language model available. In addition, the script does not use Latin characters, which makes programming difficult (for me at least).
Are there any solutions that anyone can recommend? Would extracting features from the audio file and comparing them with features extracted in real time from the microphone work? Which features?
Is there any implementation/code that I can start with? Any language is OK, but I prefer C.
You need to use a speech recognizer.
Create a language model directly from the book text. That will make recognition of the book reading very accurate, both for the original reading and for the reading by the user.
Use this language model to recognize the book's audio and assign timestamps to the words, or use a more advanced algorithm to perform text-to-audio alignment.
Recognize the user's speech with the book-specific language model and use the recognized text to display a position in the book.
You can use CMUSphinx for the mentioned tasks.
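The last step, turning recognized text into a position in the book, can be sketched independently of the recognizer. The Python below is only an illustration under some assumptions: the recognizer output is taken to be a plain string, the helper names tokenize and locate are made up here, and the brute-force sliding window is a stand-in for whatever search you would use on a full-length book. It does not use CMUSphinx itself.

```python
import difflib
import re


def tokenize(text):
    # \w is Unicode-aware in Python 3, so non-Latin scripts are handled too.
    return re.findall(r"\w+", text.lower())


def locate(book_words, heard_words):
    """Return (index, score) of the book window that best matches heard_words."""
    n = len(heard_words)
    best_pos, best_score = 0, -1.0
    # Slide a window of the same length over the book and keep the best match.
    for start in range(0, max(1, len(book_words) - n + 1)):
        window = book_words[start:start + n]
        score = difflib.SequenceMatcher(None, window, heard_words).ratio()
        if score > best_score:
            best_pos, best_score = start, score
    return best_pos, best_score


book = tokenize(
    "Call me Ishmael. Some years ago, never mind how long precisely, "
    "having little or no money in my purse, and nothing particular to "
    "interest me on shore, I thought I would sail about a little and "
    "see the watery part of the world."
)
# Simulated recognizer output with one misrecognized word ("monkey").
heard = tokenize("having little or no monkey in my purse")

pos, score = locate(book, heard)
print(pos, book[pos], round(score, 3))
```

Because the comparison is fuzzy, a few recognition errors do not prevent a match. For a whole book you would probably index word n-grams first and only score candidate windows, since scanning every position gets slow.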

read text document from scanned image

Is there any way to get the text from a scanned document in JPG, JPEG, or any other format? I am using Ruby as my programming language, but I guess that if I can get the text with some help from another programming language, it will not be much of a problem to integrate.
Thanks.
Yes, you can use an OCR library. There are additional details at https://stackoverflow.com/questions/1085/free-ocr-library.
In brief, you may wish to consider using tessnet (http://www.pixel-technology.com/freeware/tessnet2/).
This technology is called optical character recognition (OCR).
For programming, check out this question, which recommends tesseract-ocr.
OCR for Ruby? Check out this question.
If it's just a couple images, here's a site that supposedly does it for free.
OCR Terminal http://www.ocrterminal.com has been the best (most accurate) free tool out of at least a dozen that I have used. It works especially well with formatted (table) data.
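Since tesseract-ocr comes up in the answers above, here is a small Python sketch of calling its command-line tool, which is also easy to shell out to from Ruby in the same way. It assumes the tesseract binary is installed and on PATH and that the English language data is present; the function names and the file name scan.jpg are made up for illustration.

```python
import shutil
import subprocess


def ocr_command(image_path, lang="eng"):
    # "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def extract_text(image_path, lang="eng"):
    # Fail early with a clear message if the binary is missing.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    done = subprocess.run(ocr_command(image_path, lang),
                          capture_output=True, text=True, check=True)
    return done.stdout
```

Calling extract_text("scan.jpg") would then return the recognized text as a string. Accuracy depends heavily on scan quality, so treat the output as a draft.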

Any interesting OCR/NLP related projects for CS final year project?

I am a final-year CS student and very interested in OCR and NLP.
The problem is that I don't know anything about OCR yet, and my project lasts only 5 months. What kind of OCR and NLP work would be viable for my project?
Is writing a (simple) OCR engine for a single language too hard for my project? What about adding support for a new language to existing FOSS OCR software?
My background is in the commercial side of OCR, and in my experience writing anything but a simple OCR engine would take a fair amount of time. To get even reasonable results, your input files would have to contain very clean text characters, or you would need lots of marked-up training data to train the engine. This would limit your usable input to high-quality printed documents and computer-generated documents, such as a Word document exported to a TIFF image. Commercial OCR engines do a much better job of reading standard scanned invoices and letters than even Tesseract OCR, and they still make mistakes.
You could write a simple OCR engine and use NLP and language analysis to show how it can improve the OCR results. Most of the OCR engines are doing this anyway but it could be an interesting project. The commercial engines have had years of fine tuning to improve their recognition accuracy and they use every trick they can think of.
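As one very small illustration of language analysis improving OCR output, the Python sketch below snaps each recognized token to the closest word in a lexicon. The tiny lexicon, the 0.8 similarity cutoff, and the function name are all assumptions for the example; a real project would use a proper dictionary or a language model and would also look at context.

```python
import difflib

# Hypothetical mini-lexicon; a real one would hold a full vocabulary.
LEXICON = ["optical", "character", "recognition", "engine"]


def correct_tokens(tokens, lexicon=LEXICON, cutoff=0.8):
    corrected = []
    for token in tokens:
        if token in lexicon:
            corrected.append(token)  # already a known word
            continue
        # Pick the most similar lexicon word, if any is similar enough.
        matches = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected


# Simulated raw OCR output with typical confusions (0/o, o/c).
raw = "0ptical charaoter recogniti0n engine".split()
fixed = correct_tokens(raw)
print(" ".join(fixed))
```

Measuring the error rate before and after a step like this, on the same scanned pages, would be one way to show the improvement in a project report.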
This article may give you some ideas about one way to write an OCR engine:
http://www.codeproject.com/KB/dotnet/simple_ocr.aspx
You may be able to contribute to the Tesseract project, but you would first need to research what has already been included, what has not, and whether anyone else is working on the same problem.
