API to retrieve images from within an image or pdf - node.js

I am looking for a way to extract images from within another image. For example:
Here is a picture taken of a paper. It includes text, an image of a camera, and an image of a QR code. Is there an API that can extract those two (the camera and the QR code) from this larger image and separate them into their own individual images? I know this is doable with the text (OCR), but I need to find some way to do image recognition, if that even exists. So far I can't find any reference to doing this besides extracting images from PDFs, and none of those tools can extract images from an imperfect PDF.
Price for the API (Node.js preferred, but I can adapt to any language) is not a big concern; I'm just not sure this is even possible without programming a legitimate artificial intelligence using machine learning, and I would no doubt cause a global internet shutdown from breaking everything if I attempted that.
Anyway, any suggestions would be great and much appreciated. Thanks!
EDIT: The images aren't always those two; they can be images of anything, from potatoes to flags.

For the QR code, you can simply use a QR code scanner library and convert the output back into a QR code. As for the camera, you are going to need an image recognition service like Google Cloud Vision or train your own neural network with something like TensorFlow to recognize pictures of cameras.
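If you go the Cloud Vision route, its object localization feature returns bounding boxes you can crop with. A minimal sketch (Python client shown for brevity; there is also a Node.js client; the google-cloud-vision and Pillow packages, credentials, and file names are assumptions on my part, not anything from your setup):

from google.cloud import vision
from PIL import Image

client = vision.ImageAnnotatorClient()
with open("scan.jpg", "rb") as f:
    content = f.read()
response = client.object_localization(image=vision.Image(content=content))

page = Image.open("scan.jpg")
width, height = page.size
for i, obj in enumerate(response.localized_object_annotations):
    # Vertices are normalized to [0, 1]; scale back to pixels before cropping
    xs = [v.x * width for v in obj.bounding_poly.normalized_vertices]
    ys = [v.y * height for v in obj.bounding_poly.normalized_vertices]
    box = (int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys)))
    page.crop(box).save(f"{i}_{obj.name}.png")

Whether it recognizes a printed camera or a potato well enough is something you would have to test against your own scans.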

QR detectors abound on the web and some are on GitHub (a minimal OpenCV example is sketched below), but for single objects you could try the Hotpot API: https://hotpot.ai/docs/api
Your use case maps onto their background-removal service: https://hotpot.ai/remove-background
To strip the result back down to just the object, you may need a secondary autocrop step.
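If you only need the QR code region pulled out, a minimal sketch with OpenCV's built-in detector (assuming the opencv-python package; file names are placeholders):

import cv2

# Detect, decode, and crop a QR code from a larger scanned image
img = cv2.imread("scan.jpg")
detector = cv2.QRCodeDetector()
data, points, _ = detector.detectAndDecode(img)

if points is not None:
    # The corner array layout varies slightly across OpenCV versions, so flatten it
    corners = points.reshape(-1, 2)
    x_min, y_min = corners.min(axis=0)
    x_max, y_max = corners.max(axis=0)
    qr_crop = img[int(y_min):int(y_max), int(x_min):int(x_max)]
    cv2.imwrite("qr.png", qr_crop)
    print("decoded payload:", data)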

Related

How to detect handwriting using Google Cloud Vision API

TL;DR: how can I detect the presence of handwriting in an image?
I'm using Google's Python Vision API to scan for text in images, with generally good results. Most of the time the images contain printed text, but sometimes there is handwriting.
As noted in the documentation, you sometimes get better results for handwritten text using document_text_detection rather than the standard text_detection API call. My own tests back this up, but also show that the standard text_detection call generally works best for printed text in JPEG images.
So I'd like to use the standard text_detection by default, and only run images through document_text_detection if there is handwriting. However, I can't find a reliable way to detect the presence of handwritten text in an image using the Vision APIs.
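For reference, the two calls I'm comparing look roughly like this (a minimal sketch assuming the google-cloud-vision Python client; the file name is a placeholder):

from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("page.jpg", "rb") as f:
    # older client versions use vision.types.Image instead of vision.Image
    image = vision.Image(content=f.read())

# Standard OCR: tends to do best on printed text in my JPEG tests
printed = client.text_detection(image=image)
# Document OCR: sometimes does better on handwriting
handwritten = client.document_text_detection(image=image)

print(printed.full_text_annotation.text)
print(handwritten.full_text_annotation.text)

What I'm missing is a cheap way to decide, per image, which of the two to run.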
I tried label detection, but there does not appear to be a specific label for handwriting. Occasionally it will spit out "Calligraphy" but not reliably.
Does anyone know of a way to accomplish this?
I haven't used the Google Cloud Vision API, but you could try object detection models. I would suggest creating a labeled dataset over the document images of your use case using a tool like LabelImg and training an object detection model like YOLOv3 [paper] [code]; a rough inference sketch follows below. I have worked on similar problems; it should work.
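Once a model is trained, inference does not have to be heavyweight either. A rough sketch using OpenCV's dnn module (the config/weights file names are placeholders for whatever model you train):

import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3-docs.cfg", "yolov3-docs.weights")
img = cv2.imread("page.jpg")
h, w = img.shape[:2]

blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

for out in outputs:
    for det in out:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        if scores[class_id] > 0.5:
            # det[:4] is centre x/y and width/height, relative to the input image
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            print(class_id, float(scores[class_id]), cx, cy, bw, bh)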

Find an altered image from the original image data set

Here is my problem:
I must match two images: one image from the project folder, which contains over 20,000 images, and one from a camera.
What I have done:
I can compare images with the basic OpenCV example code I found in the documentation (OpenCV Doc). I can also compare and find an image by using a hash of my image data set. That is very fast, but it is only suitable for two exact copies of the same image: one as the query, the other as the target.
So, I need something as reliable as feature matching and as fast as hash methods. But I can't use machine learning or anything on that level; it should be basic. Plus, I'm new to this stuff, so my term project is at risk.
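For concreteness, the feature-matching baseline I'm comparing against looks roughly like this (a sketch assuming opencv-python; file names are placeholders):

import cv2

query = cv2.imread("camera_photo.jpg", cv2.IMREAD_GRAYSCALE)
candidate = cv2.imread("dataset/img_00001.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(query, None)
kp2, des2 = orb.detectAndCompute(candidate, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Lower total distance over the best matches = more similar; repeating this
# against all 20,000 candidates is what makes it too slow for me
score = sum(m.distance for m in matches[:50])
print(len(matches), score)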
Example scenario:
Suppose I take a picture of an image from my data set off my computer's screen. This changes many features of the original image. A human would have little trouble recognizing what's in that image, but a comparison algorithm will struggle. Such a case rules out a lot of basic comparison algorithms. A machine-learning algorithm could solve the problem, but it is forbidden in my project.
Needs:
It must be fast.
It must be accurate.
It must be easy to understand.
Any help is welcome: a piece of code, an article, or a tutorial. Even a piece of advice or a topic name could be really helpful to me.
I once saw this camera model identification challenge on Kaggle. This notebook discusses how noise patterns change across devices. Maybe you should look into this and the other notebooks in that challenge. Thanks!

Cardboard QR Decode

tl;dr: Is there a way to get Cardboard's calibration data without parsing through Google's protocol buffers?
I need to access the Cardboard viewer's lens data, coefficients, etc. to do a proper undistortion calculation.
I contacted two Cardboard viewer manufacturers; both had no idea what the values are and pointed me to Google, since they used Google's calibration.
As described here, you can decode the QR code in C++ by parsing it through Google's protocol buffers, but I am currently not in a C++ dev environment, and crunching through the docs to get the manufacturer's calibration is very time-consuming for just a bunch of coefficients. Is there a better way?
Someone built a web page (https://lisa-wolfgang.github.io/vrEmbed/tools/google_profile_decode.html) that decodes Google Cardboard links into JSON using Google's JavaScript protocol buffer library. (If you use the short URL, leave out the http://.)
I used it to get the data out for a project.
The accepted answer didn't work for me, so I scanned the QR code with a reader and got a short goo.gl URL.
I ran this through https://www.expandurl.net/ and got a longer link that just points to the app store to download the Cardboard app, like this:
http://google.com/cardboard/app?p=ALONGSTRINGOFCHARACTERS
I just needed this to set my Unity project's Cardboard profile automatically so that the QR code doesn't have to be scanned on every device:
GvrCardboardHelpers.SetViewerProfile("http://google.com/cardboard/cfg?p=ALONGSTRINGOFCHARACTERS");
You can Base64 decode that long string of characters (the 'p' parameter) to get the coefficients, but it's encoded in a binary format.
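A rough sketch of that decode step in Python ("ALONGSTRINGOFCHARACTERS" is just the placeholder from the URL above, not a real payload):

import base64

p = "ALONGSTRINGOFCHARACTERS"
# Cardboard profile URIs use URL-safe base64; re-add padding before decoding
raw = base64.urlsafe_b64decode(p + "=" * (-len(p) % 4))
print(raw.hex())  # raw protobuf bytes; you still need the .proto definition to name the fields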
Reverse engineering the Cardboard profile generator (https://vr.google.com/cardboard/viewerprofilegenerator/) may help work out exactly which variable is which, but unfortunately it seems to be broken at the moment.

Identify Photos vs. Graphics

Is there a way to write a program that can identify photos vs. graphic images in a folder of jpg files?
sample photo
http://dansdemos.info/clips/samples/photo.jpg
sample graphic image
http://dansdemos.info/clips/samples/graphic.jpg
I was thinking ImageMagick's compare could do it if it were given a set of samples to calculate differences against, but coming up with that set might be tricky, so I was hoping there might be a simpler approach, maybe something as simple as a Google search I have not thought of. Any help or comment would be much appreciated. Thank you.
You could have the program check for Exif data of various kinds, like the camera manufacturer.
It wouldn't be foolproof, but it would probably work in most cases, as long as the Exif data hasn't been removed from photos by some postprocessing step.
It worked just fine on your two sample images. E.g. your photo of guinea fowl has Exif data for camera maker, camera model, f-stop, exposure, etc. The graphic.jpg appears to have none of those.
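A rough sketch of that heuristic with Pillow (the file names are the two samples above):

from PIL import Image, ExifTags

def looks_like_photo(path):
    # Map numeric Exif tag IDs to names and look for camera-related entries
    exif = Image.open(path).getexif()
    tags = {ExifTags.TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
    return "Make" in tags or "Model" in tags

print(looks_like_photo("photo.jpg"), looks_like_photo("graphic.jpg"))

As noted, it will misclassify photos whose Exif data has been stripped, so treat it as a first pass rather than a guarantee.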

Download Google Earth "Gray Buildings" models

I need to work with 3D models of some places. Google Earth has the 3D building layer with "Gray Buildings" in it, which is exactly what I require. Is there any way to get the 3D models that are used? Is there a Google Earth API (other than the JavaScript stuff) that would help? I'm working in .NET.
Or is there at least a manual way to get these models, say, into SketchUp?
Thanks a lot!
While there still isn't support for getting building geometry from Google's APIs, OpenStreetMap does expose some data you can use. Check out this guide:
http://wiki.flightgear.org/OpenStreetMap_buildings
Making a request like
http://overpass-api.de/api/xapi?way[bbox=-74.02037,40.69704,-73.96922,40.73971][building=*][#meta]
will return XML with the buildings' base outlines and (in some cases) heights. You can use this info to extrude some very simple buildings: http://i.imgur.com/ayNPB.png
To fill in the missing height values (and they're missing on most buildings), I try to use the area of the building's footprint to determine how tall it might be compared to nearby buildings. Unfortunately, until Google is able to make their models public, this will have to do.
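A rough sketch of fetching and walking that response (assuming the requests package; the meta modifier from the query above is omitted here for simplicity, and the endpoint's exact behaviour may have changed since this was written):

import requests
import xml.etree.ElementTree as ET

url = ("http://overpass-api.de/api/xapi?"
       "way[bbox=-74.02037,40.69704,-73.96922,40.73971][building=*]")
resp = requests.get(url, timeout=60)
root = ET.fromstring(resp.content)

for way in root.iter("way"):
    tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
    # Most ways are tagged building=*, but only some carry an explicit height
    print(way.get("id"), tags.get("building"), tags.get("height", "no height tag"))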
There is currently no way to download models from within Google Earth. Even if there were, extracting the data is against the TOS, and many of the models come from government or private sources, so there are issues with licensing the data as a whole. It is worth noting, however, that a lot of the models in Google Earth are located in the SketchUp 3D Warehouse, so maybe you could get the data you want from there?
Also, to work with the JavaScript API from managed code, you might want to check out this control library I have put together: http://code.google.com/p/winforms-geplugin-control-library/. While the controls themselves may not be applicable, the ideas behind them should get you under way; essentially, there is a series of wrappers and helpers that let you seamlessly integrate the plugin into a WinForms application.
You can also read more about Cities in 3D (the name of the project that developed the low-res building layer) here: http://sketchup.google.com/3dwh/citiesin3d/
