How to detect handwriting using Google Cloud Vision API - python-3.x

TL;DR: how can I detect the presence of handwriting in an image?
I'm using Google's Python Vision API to scan for text in images, with generally good results. Most of the time the images contain printed text, but sometimes there is handwriting.
As noted in the documentation, you sometimes get better results for handwritten text using document_text_detection rather than the standard text_detection API call. My own tests back this up, but also show that the standard text_detection call generally works best for printed text in JPEG images.
So I'd like to use the standard text_detection by default, and only run images through document_text_detection if there is handwriting. However, I can't find a reliable way to detect the presence of handwritten text in an image using the Vision API.
I tried label detection, but there does not appear to be a specific label for handwriting. Occasionally it will spit out "Calligraphy" but not reliably.
Does anyone know of a way to accomplish this?
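For reference, here is roughly the fallback logic I have in mind. This is a minimal sketch assuming a recent google-cloud-vision client (where vision.Image is available) and configured credentials; is_handwritten() is the hypothetical check I'm missing.

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

def extract_text(path: str) -> str:
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())

    if is_handwritten(path):  # hypothetical -- the missing piece
        # document_text_detection tends to do better on handwriting.
        response = client.document_text_detection(image=image)
        return response.full_text_annotation.text

    # Standard OCR, which has worked best for printed JPEGs in my tests.
    response = client.text_detection(image=image)
    return response.text_annotations[0].description if response.text_annotations else ""
```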

I haven't used the Google Cloud Vision API, but you could try an object detection model. I would suggest creating a labeled dataset over the document images of your use case using a tool like LabelImg and training an object detection model such as YOLOv3 [paper] [code]. I have worked on similar problems; it should work.

Related

API to retrieve images from within an image or pdf

I am looking for a way to extract images from within another image. For example:
Here is a picture taken of a paper. It includes text, an image of a camera, and an image of a QR code. Is there an API that could extract those two (camera and QR code) from this larger image and separate them into their own individual images? I know this is doable with the text (OCR), but I need some way to do image recognition, if that even exists. So far I can't find any reference to doing this besides extracting images from PDFs, and none of those tools can extract them from a non-perfect PDF.
Price for the API (Node.js preferred, but I can adapt to any language) is not a big concern; I'm just not sure this is even possible without programming a legitimate artificial intelligence using machine learning, which I would no doubt break everything and cause a global internet shutdown if I attempted it.
Anyway, any suggestions would be great and much appreciated. Thanks!
EDIT: the images aren't always those; it can be an image of anything, from potatoes to flags.
For the QR code, you can simply use a QR code scanner library and convert the output back into a QR code. As for the camera, you are going to need an image recognition service like Google Cloud Vision or train your own neural network with something like TensorFlow to recognize pictures of cameras.
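For the QR-code half, a rough sketch in Python with OpenCV's built-in detector (the question prefers Node.js, but any language with an OpenCV binding should look similar; the input file name is a placeholder):

```python
import cv2

img = cv2.imread("paper_scan.jpg")  # placeholder input image
detector = cv2.QRCodeDetector()
data, points, _ = detector.detectAndDecode(img)

if points is not None:
    # points holds the four corners of the detected code; crop it into its own image.
    xs, ys = points[0][:, 0], points[0][:, 1]
    x1, y1, x2, y2 = int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
    cv2.imwrite("qr_only.png", img[y1:y2, x1:x2])
    print("decoded payload:", data or "<detected but not decoded>")
```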
QR detectors abound around the web and some are on GitHub, but for single objects you could try the hotpot API: https://hotpot.ai/docs/api
Your example maps onto background removal: https://hotpot.ai/remove-background
For stripping the object back out you may need a secondary autocrop step.

What is the difference between OCR and Recognize Text in Azure Pricing Page

As I found at https://azure.microsoft.com/en-us/pricing/details/cognitive-services/computer-vision/, OCR is priced differently from Recognize Text, which is quite confusing. What is the difference? I can't find any clue in the documentation.
The difference is described here in the docs: https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-recognizing-text#ocr-optical-character-recognition-api
In a few words:
OCR is synchronous and uses an earlier recognition model, but works with more languages.
Recognize Text (and the Read API, its successor) uses updated recognition models, but is asynchronous.
If you want to process handwritten text, for example, you should use the latter.
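As an illustration of that asynchronous flow, here is a hedged sketch using the Python SDK (azure-cognitiveservices-vision-computervision); the endpoint, key, and image URL are placeholders.

```python
import time

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials

client = ComputerVisionClient(
    "https://<your-resource>.cognitiveservices.azure.com/",
    CognitiveServicesCredentials("<key>"),
)

# Submit the job; the service responds immediately with an Operation-Location header.
poll = client.read("https://example.com/handwritten-note.jpg", raw=True)
operation_id = poll.headers["Operation-Location"].split("/")[-1]

# Because the call is asynchronous, poll until the job leaves the running state.
while True:
    result = client.get_read_result(operation_id)
    if result.status not in (OperationStatusCodes.running, OperationStatusCodes.not_started):
        break
    time.sleep(1)

if result.status == OperationStatusCodes.succeeded:
    for page in result.analyze_result.read_results:
        for line in page.lines:
            print(line.text)
```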

ARCore with additional object recognition

I know, the object recognition feature is currently not supported by Google's ARCore.
My simple goal: detect cups and show some coffee inside. (Best would be display it live on the phone)
Is there really no way to detect objects?
Do you know of any additional computational approaches that can recognize objects alongside ARCore?
Could I train a CNN, but instead of training on image + annotation pairs, use point cloud + annotation pairs? Is this approach viable?
Is there an approach to record a video + point cloud and process them on a backend?
Is Snapchat using ARCore?
Are they detecting the face and pose to put the virtual makeup on the mesh?
How is the mesh computed?
I don't expect answers to every question, just ideas.
Maybe someone knows of similar projects, interesting links, or something to think about.
Thanks in advance.

Microsoft Speech Recognition defaults vs API

So I've been using Microsoft Speech Recognition in Windows 10, doing the training exercises, dictating text into Wordpad and correcting it, adding words to the dictionary and so on. I would like to use the software to transcribe .wav files. It appears one can do this using the Windows Speech Recognition API, but this seems to involve creating and loading one's own grammar files, which suggests to me that this would basically create a new speech recognizer, that uses the same building blocks but is a different program from the one that runs when I click "Start Speech Recognition" in the start menu. In particular, it would perform differently because of differences in training or configuration.
Am I wrong in this? And if I'm not, is there still a way of retrieving all the data the default speech recognizer uses so I can reproduce its behavior exactly? If I need to create a separate speech recognizer with its own grammar files and separate training history and so on in order to transcribe .wav files, then so be it, but I'd like to better understand what's going on here.
The Woundify open source project contains examples of how to convert wav files to text (STT).
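Not an answer about the Windows desktop recognizer itself, but if the end goal is simply wav file in, text out, the cross-platform SpeechRecognition Python package is a quick baseline to compare against. A sketch with a hypothetical file name; the Google web engine shown here is one of several pluggable back ends:

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("dictation.wav") as source:  # hypothetical file name
    audio = recognizer.record(source)          # read the whole file into memory

try:
    print(recognizer.recognize_google(audio))  # free web API; other engines are pluggable
except sr.UnknownValueError:
    print("could not understand audio")
```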

Download Google Earth "Gray Buildings" models

I need to work with the 3D model of some places. Google Earth has the 3D building layer with "Gray Buildings" in it. This would be exactly what I require. Is there any way to get the 3D models that are used? Is there a Google Earth API (other than the JavaScript stuff) that would help? (I'm working in .NET.)
Or is there at least a manual solution how I can get these models, say, into Sketchup?
Thanks a lot!
While there still isn't support for getting building geometry from Google's APIs, OpenStreetMap does expose some data you can use. Check out this guide here:
http://wiki.flightgear.org/OpenStreetMap_buildings
Making a request like
http://overpass-api.de/api/xapi?way[bbox=-74.02037,40.69704,-73.96922,40.73971][building=*][#meta]
will return XML with the buildings' base outlines and (in some cases) heights. You can use this info to extrude some very simple buildings: http://i.imgur.com/ayNPB.png
To fill in the missing height values (and they're missing on most buildings), I try to use the area of the building's footprint to determine how tall it might be compared to nearby buildings. Unfortunately, until Google is able to make their models public, this will have to do.
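A small sketch of consuming that response in Python (assuming the requests package; the bounding box and building filter mirror the query above, with the meta predicate omitted):

```python
import requests
import xml.etree.ElementTree as ET

# Same bounding box and building filter as the XAPI request above.
url = ("http://overpass-api.de/api/xapi?way"
       "[bbox=-74.02037,40.69704,-73.96922,40.73971][building=*]")
root = ET.fromstring(requests.get(url, timeout=60).content)

for way in root.findall("way"):
    tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
    outline = [nd.get("ref") for nd in way.findall("nd")]  # footprint as node id references
    print(way.get("id"), "outline nodes:", len(outline), "height:", tags.get("height", "missing"))
```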
There is currently no way to download models from within Google Earth. Also, even if there were, extracting data is against the TOS. Many of the models come from government or private sources, so there are issues with licensing the data as a whole. It is worth noting, however, that a lot of the models in Google Earth are located on the SketchUp 3D Warehouse, so maybe you could get the data you want from there?
Also, to work with the JavaScript API from managed code you might want to check out this control library I have put together: http://code.google.com/p/winforms-geplugin-control-library/ Whilst the controls themselves may not be applicable, the ideas behind them should get you under way; essentially it is a series of wrappers and helpers that let you seamlessly integrate the plugin into a WinForms application.
You can also read more about Cities in 3D (the name of the project that developed the low-res building layer) here: http://sketchup.google.com/3dwh/citiesin3d/
