ARCore with additional object recognition - conv-neural-network

I know that object recognition is currently not supported by Google's ARCore.
My simple goal: detect cups and show some coffee inside them (ideally displayed live on the phone).
Is there really no way to detect objects?
Do you know of any additional computational approaches that can recognize objects alongside ARCore?
What about training a CNN on point cloud + annotation instead of image + annotation? Is this approach viable?
Is there any approach to record video + point cloud and process them on a backend?
Is Snapchat using ARCore?
Are they detecting the face and pose to put the virtual makeup on the mesh?
How is the mesh computed?
I don't expect answers to every question, just ideas.
Maybe someone knows similar projects, interesting links, or something to think about.
Thanks in advance.

Related

Tutorial tensorflow audio pitch analysis

I'm a beginner with TensorFlow and Python, and I'm trying to build an app that automatically detects key moments in a football (soccer) match (yellow/red cards, goals, etc.).
I'm starting to understand how to do video analysis by training the program on a dataset I built myself, by downloading images from the web and tagging them. To get better results, I was wondering if anyone could suggest tutorials on training my app on audio files as well, so that the program can recognize pitch variations in the video's audio and combine the video and audio analysis.
Thank you in advance.
Since you are new to Python and to TensorFlow, I recommend you focus on just audio for now, especially since it's a strong indicator of important events in a football match (red/yellow cards, nasty fouls, goals, strong chances, good plays, etc.).
Very simply, without using much ML at all, you can use the average volume of a time period to infer significance. If you want to get a little more sophisticated, you can consider speech-to-text libraries to look for keywords in commentator speech.
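A minimal sketch of the volume idea, assuming the match audio has already been extracted to a mono array of samples (for example, read from a WAV file). It computes the root-mean-square (RMS) volume of each fixed-length window and flags windows that are much louder than the clip's typical (median) window. The function name `loud_windows`, the 1-second window, and the 2x threshold are illustrative assumptions to tune on real matches.

```python
import numpy as np

def loud_windows(samples, sample_rate, window_s=1.0, factor=2.0):
    """Return start times (in seconds) of windows whose RMS volume exceeds
    `factor` times the median window RMS of the whole clip."""
    samples = np.asarray(samples, dtype=np.float64)
    win = int(window_s * sample_rate)          # samples per window
    n_windows = len(samples) // win
    if n_windows == 0:
        return []
    # Reshape into (n_windows, win), dropping the incomplete tail window.
    frames = samples[: n_windows * win].reshape(n_windows, win)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))  # average loudness per window
    threshold = factor * np.median(rms)          # "typical" loudness baseline
    return [float(i * window_s) for i in np.flatnonzero(rms > threshold)]
```

The median is used as the baseline so that a few very loud moments (the ones you want to find) don't drag the threshold up with them.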
Using video to try to determine when something important is happening is much, much more challenging.
This page can help you get started with audio signal processing in Python.
https://bastibe.de/2012-11-02-real-time-signal-processing-in-python.html

Tracking the top of heads with Kinect

I was wondering if there is an existing API for tracking the tops of people's heads with the Kinect, e.g., with the Kinect facing downwards from a ceiling.
If not, how might I implement such a thing with its depth data?
No. The Kinect expects to be facing a standing (or seated, given the appropriate flag) human. All APIs (official or 3rd party) that have a notion of skeleton tracking expect this.
If you wish to track someone from above, you will need to use a library such as OpenCV (or EmguCV, for C# development). Well, you don't have to, but they offer utilities to help with computer vision and image processing. These libraries don't care whether you are using a Kinect or just a regular RGB camera.
Using the Kinect from above, you could use the depth data to help locate and track blobs. With the Kinect at a known distance from the floor, have a few people walk under it and see what z-coordinates you get; you can then assume that anything within a certain z-coordinate range is a person walking across the screen (vs. a cat, or something else).
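The calibration step described above can be sketched as follows, assuming the depth frame is available as a 2-D NumPy array of distances in millimetres, with 0 meaning "no reading". `calibrate_range` and `person_mask` are hypothetical helper names, and the 100 mm margin is an arbitrary starting value.

```python
import numpy as np

def calibrate_range(head_depths_mm, margin_mm=100):
    """Derive a (min, max) depth band from sample head-top readings taken
    while a few people walk under the ceiling-mounted sensor."""
    return (min(head_depths_mm) - margin_mm, max(head_depths_mm) + margin_mm)

def person_mask(depth_mm, min_mm, max_mm):
    """Boolean mask of pixels whose depth lies within [min_mm, max_mm].

    Pixels with depth 0 (no reading) are always excluded.
    """
    depth = np.asarray(depth_mm)
    valid = depth > 0
    return valid & (depth >= min_mm) & (depth <= max_mm)
```

For example, `person_mask(frame, *calibrate_range(samples))` would keep head-height pixels while rejecting the floor (far away) and invalid readings.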
You will need to use standard image processing techniques (see OpenCV reference above) to initially find the blobs within the image. Once found, the depth data from the Kinect might be useful but I think you'll find it isn't ultimately necessary if you're just watching people walk across the floor.
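To turn such a mask into blobs, OpenCV offers `cv2.connectedComponentsWithStats`. The dependency-free sketch below does the same basic job with a breadth-first flood fill, returning each 4-connected blob's area and centroid. `find_blobs` and its `min_area` filter (to drop specks of noise) are illustrative, not part of any library.

```python
from collections import deque
import numpy as np

def find_blobs(mask, min_area=1):
    """Label 4-connected blobs in a boolean mask.

    Returns a list of (area, centroid_row, centroid_col), one per blob
    with at least `min_area` pixels, in row-major discovery order.
    """
    mask = np.asarray(mask, dtype=bool)
    rows, cols = mask.shape
    seen = np.zeros_like(mask)
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                ys, xs = zip(*pixels)
                blobs.append((len(pixels), sum(ys) / len(ys), sum(xs) / len(xs)))
    return blobs
```

Tracking across frames could then be as simple as matching each blob's centroid to the nearest centroid from the previous frame.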
We built a Kinect-driven experience where the sensors had to point downward to detect users walking along a wall. We used openTSPS to do all the work of taking the camera input and doing blob detection and handing off tracked "persons" to (in our case) a Processing app. It works really well for us.
http://opentsps.com/

Is it possible for a system to identify hand signs using just the Haar training in OpenCV?

I am doing a project on hand sign recognition on a static image. Can I use just Haar training to accomplish this?
As I understand it, it is somewhat similar to the concept of neural networks.
Haar training may help you detect the hand, but not recognize the sign.
People use many approaches, so I cannot give you a single answer. You could do some research on Google Scholar using the keywords "hand sign", "recognition" and "detection".
Some tips: you need to segment the hand and then use template matching or some other method to recognize its shape. There is also a project for hand gestures here.
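A rough sketch of the "segment, then template-match" idea, assuming the hand has already been segmented into a binary mask and resized to the same shape as a set of labelled template masks (the labels and templates are hypothetical). It scores each template with zero-mean normalized cross-correlation, the same measure OpenCV's `cv2.matchTemplate` computes with `TM_CCOEFF_NORMED`, and picks the best-scoring label.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized arrays (-1..1)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def classify_sign(hand_mask, templates):
    """Return the label of the template that best matches hand_mask.

    hand_mask: segmented hand, already cropped and resized to template size.
    templates: dict mapping a sign label to a mask of the same shape.
    """
    scores = {label: ncc(hand_mask, tpl) for label, tpl in templates.items()}
    return max(scores, key=scores.get)
```

Plain template matching is sensitive to hand rotation and scale, which is one reason the literature mentioned above explores many other features and classifiers.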

Download Google Earth "Gray Buildings" models

I need to work with the 3D models of some places. Google Earth has a 3D buildings layer with "Gray Buildings" in it, which is exactly what I need. Is there any way to get the 3D models that are used? Is there a Google Earth API (other than the JavaScript one) that would help? (I'm working in .NET.)
Or is there at least a manual solution how I can get these models, say, into Sketchup?
Thanks a lot!
While there still isn't support for getting building geometry from Google's APIs, OpenStreetMap does expose some data you can use. Check out this guide:
http://wiki.flightgear.org/OpenStreetMap_buildings
Making a request like
http://overpass-api.de/api/xapi?way[bbox=-74.02037,40.69704,-73.96922,40.73971][building=*][#meta]
will return XML with the buildings' base outlines and (in some cases) heights. You can use this info to extrude some very simple buildings: http://i.imgur.com/ayNPB.png
To fill in the missing height values (and they're missing on most buildings), I try to use the area of the building's footprint to determine how tall it might be compared to nearby buildings. Unfortunately, until Google is able to make their models public, this will have to do.
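The footprint-area heuristic could look roughly like this. The area uses a local equirectangular approximation plus the shoelace formula, which is reasonable for building-sized polygons. The rule in `guess_height` (borrow the height of the nearby building whose footprint area is closest) is a hypothetical stand-in, since the answer doesn't spell out the exact comparison used.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius

def footprint_area_m2(outline):
    """Approximate area (m^2) of a small polygon given as (lat, lon) pairs."""
    lat0 = math.radians(sum(lat for lat, _ in outline) / len(outline))
    # Project to local metres: x scaled by cos(latitude), y straight from latitude.
    pts = [(EARTH_RADIUS_M * math.radians(lon) * math.cos(lat0),
            EARTH_RADIUS_M * math.radians(lat)) for lat, lon in outline]
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1  # shoelace formula
    return abs(s) / 2.0

def guess_height(area_m2, known):
    """Hypothetical heuristic: borrow the height of the nearby building whose
    footprint area is closest. known: list of (area_m2, height_m) pairs."""
    if not known:
        return None
    return min(known, key=lambda k: abs(k[0] - area_m2))[1]
```

Closed OSM ways repeat their first node at the end; that extra point contributes zero to the shoelace sum, so such outlines can be passed in unchanged.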
There is currently no way to download models from within Google Earth. Also, even if there were, extracting the data would be against the TOS. Many of the models come from government or private sources, so there are issues with licencing the data as a whole. It is worth noting, however, that a lot of the models in Google Earth are available on the SketchUp 3D Warehouse, so maybe you could get the data you want from there.
Also, to work with the JavaScript API from managed code, you might want to check out this control library I have put together. Whilst the controls themselves may not be applicable, the ideas behind them should get you under way: http://code.google.com/p/winforms-geplugin-control-library/ (essentially, a series of wrappers and helpers that let you seamlessly integrate the plugin into a WinForms application).
You can also read more about Cities in 3d (the name of the project that developed the low-res building layer) here: http://sketchup.google.com/3dwh/citiesin3d/

Visual Similarity Algorithms (for CBIR)

I am trying to build a collection of visual similarity algorithms for images that are size-, angle-, color- and rotation-invariant, for Content Based Image Retrieval. I'm fairly agnostic about the platform, but .NET, Java or Python are preferred. If others are available, please suggest away.
I am quite familiar with OpenCV Match template and Match shapes. I have also looked at AForge.NET.
P.S. Something similar to http://www.imgseek.net/home would be ideal.
Content Based Image Retrieval is a field of heavy research. Unfortunately it is not my field of research, therefore I am unable to give you an authoritative suggestion on a viable algorithm. But I can give you the website where I would start looking for an answer:
CVPapers - Computer Vision Resource (check out the Computer Vision Paper Indexes)
