Are Project Tango sensors suitable for identifying body parts (like legs) in indoor environments?

I'm interested in developing an AR app using a Project Tango phone that can identify body parts, like legs, and project 3D objects onto the identified parts. My concern is whether the sensors in Project Tango phones are suitable for this, and which features I should start with. I'd appreciate any guidance.
Thanks in advance.

There are research papers on human body segmentation / skeleton tracking that use an RGB image plus a depth map; some use only the RGB image.
I recommend this tutorial on Human Body Recognition and Tracking, which includes the following interesting reference:
J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, "Real-Time Human Pose Recognition in Parts from a Single Depth Image," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011.
That work was done with Kinect-like sensors, which have a higher-resolution depth map than the Google Tango devices, so unfortunately I cannot say for certain that the results will be as good with a Google Tango device.

Related

Tracking the top of heads with Kinect

I was wondering if there is an existing API for tracking the tops of people's heads with the Kinect, e.g. with the Kinect facing downwards from a ceiling.
If not, how might I implement such a thing with its depth data?
No. The Kinect expects to be facing a standing (or seated, given the appropriate flag) human. All APIs (official or 3rd party) that have a notion of skeleton tracking expect this.
If you wish to track someone from above, you will need to use a library such as OpenCV (or EmguCV, for C# development). Well, you don't have to, but they offer utilities that help with computer vision and image processing. These libraries don't care whether you are using a Kinect or just a regular RGB camera.
Using the Kinect from above, you could use the depth data to help locate and track blobs. With the Kinect at a known distance from the floor, have a few people walk under it and see what z-coordinates you get out of it -- you can then assume that anything within a certain z-coordinate range is a person walking across the screen (vs. a cat, or something else).
You will need to use standard image processing techniques (see OpenCV reference above) to initially find the blobs within the image. Once found, the depth data from the Kinect might be useful but I think you'll find it isn't ultimately necessary if you're just watching people walk across the floor.
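A minimal Python/OpenCV sketch of that idea, for illustration only: threshold a depth frame to a person-height band below the ceiling, then find blobs. The frame here is synthetic, and the 1.0-2.0 m band and blob-size cutoff are placeholders you would calibrate for your ceiling height:

import cv2
import numpy as np

# Placeholder: a 16-bit depth frame in millimeters, as a Kinect driver would
# deliver it (filled with random values here, just for the sketch).
depth = np.random.randint(0, 4000, (480, 640), dtype=np.uint16)

# Keep only pixels whose distance from the ceiling-mounted sensor falls in
# the "top of a head" band, e.g. 1.0 m to 2.0 m below the camera.
mask = (((depth > 1000) & (depth < 2000)) * 255).astype(np.uint8)

# Clean up speckle noise, then extract connected blobs.
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for c in contours:
    if cv2.contourArea(c) > 2000:  # ignore blobs too small to be a person
        x, y, w, h = cv2.boundingRect(c)
        print("person-sized blob centered at", (x + w // 2, y + h // 2))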
We built a Kinect-driven experience where the sensors had to point downward to detect users walking along a wall. We used openTSPS to do all the work of taking the camera input, doing blob detection, and handing off tracked "persons" to (in our case) a Processing app. It worked really well for us.
http://opentsps.com/

Audio content analysis for online audiovisual data

I want to work on a project where I have to segment and classify online audiovisual data based on its audio content, i.e. different parts of the audiovisual data will be segmented and classified as silence, music, speech, speech + background music, etc., based on their audio content.
I am aware that I have to obtain the audio part of the audiovisual data, extract features like zero-crossing rate, spectral peaks, etc., and find segment boundaries in order to segment the audio data.
But I'm lost at the very beginning.
I do not know how to start the project. The output of the software should be segments of audiovisual data under different categories like silence, speech, music, etc.
It would be really helpful if someone could let me know:
Which programming language is convenient for this purpose?
What steps should I follow in order to develop this software?
I have no background in digital signal processing, so any guidance would be really appreciated.
I'd suggest looking into a multimedia framework such as GStreamer. It is cross-platform, but easiest to get started with on Linux, where it originates. It already comes with all kinds of plugins to receive, demux, and decode audio and video. It also has a couple of analyzers (such as level and spectrum analyzers for audio, as well as voice activity detection). Those could be a good starting point for your experiments. GStreamer itself is written in C, but applications can use the language bindings for Python, Perl, C#, C++, Java, ...
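As a rough illustration of that starting point, here is a minimal sketch using the GStreamer 1.x Python bindings: it decodes a file's audio track and prints the messages posted by the level analyzer, which you could threshold to mark silent vs. non-silent segments. The input URI and the 100 ms reporting interval are placeholders:

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

# Decode only the audio stream of the container and run it through the
# `level` element, which posts an ELEMENT message with per-channel RMS and
# peak values (in dB) every `interval` nanoseconds.
pipeline = Gst.parse_launch(
    'uridecodebin uri=file:///tmp/input.mp4 '
    'expose-all-streams=false caps=audio/x-raw ! '
    'audioconvert ! level interval=100000000 ! fakesink'
)
pipeline.set_state(Gst.State.PLAYING)

bus = pipeline.get_bus()
while True:
    msg = bus.timed_pop_filtered(
        Gst.SECOND, Gst.MessageType.ELEMENT | Gst.MessageType.EOS)
    if msg is None:
        continue
    if msg.type == Gst.MessageType.EOS:
        break
    s = msg.get_structure()
    if s is not None and s.get_name() == 'level':
        # The structure holds timestamp, rms, peak, and decay fields.
        print(s.to_string())

pipeline.set_state(Gst.State.NULL)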

Basics of Digital Audio

I have recently started going through sound card drivers in Linux (ALSA).
Can anyone suggest a link or reference where I can pick up good basics of audio, like sampling rate, bit size, etc.?
I want to know exactly how samples are stored in audio files on a computer, and the reverse: how samples (numbers) are played back.
The Audacity tutorial is a good place to start, as is another introduction that covers similar ground. The PureData tutorial at flossmanuals is also a good starting point. Wikipedia is a good source once you have the basics down.
Audio is input into a computer via an analog-to-digital converter (ADC). Digital audio is output via a digital-to-analog converter (DAC).
Sample rate is the number of times per second at which the analog signal is measured and stored digitally. You can think of the sample rate as the time resolution of an audio signal. Bit size is the number of bits used to store each sample. You can think of it as analogous to the color depth of an image pixel.
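To make those two quantities concrete, here is a small sketch (standard-library Python only) that stores one second of a 440 Hz sine wave as 16-bit samples at a 44.1 kHz sample rate; the output file name is arbitrary:

import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (time resolution)
BITS = 16            # bits per sample (amplitude resolution)

with wave.open('tone.wav', 'wb') as w:
    w.setnchannels(1)              # mono
    w.setsampwidth(BITS // 8)      # bytes per sample
    w.setframerate(SAMPLE_RATE)
    for n in range(SAMPLE_RATE):   # one second of audio
        # Each sample is the sine function measured at time n / SAMPLE_RATE,
        # scaled to the signed 16-bit range and stored little-endian.
        value = int(32767 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
        w.writeframes(struct.pack('<h', value))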
David Cottle's SuperCollider book also has a great introduction to digital audio.
I was in the same situation; this kind of information is certainly out there, but you need to do some research first. This is what I have found:
Digital Audio processing is a branch of DSP (Digital Signal Processing). DSP is one of the most powerful technologies that will shape science and engineering in the twenty-first century. Revolutionary changes have already been made in a broad range of fields: communications, medical imaging, radar & sonar, high fidelity music reproduction, and oil prospecting, to name just a few. Each of these areas has developed a deep DSP technology, with its own algorithms, mathematics, and specialized techniques…
This quote was taken from a very helpful guide that covers every topic in depth, called "The Scientist and Engineer's Guide to Digital Signal Processing". And though you are not asking about DSP specifically, there's a chapter that covers all the digital-audio-related topics with a very good explanation. You can find it in Chapter 22, Audio Processing, which covers all of these topics:
Human Hearing: how sound is perceived by our ears; this is the basis for how sound is then generated artificially.
Timbre: explains the properties of sound, like loudness, pitch, and timbre.
Sound Quality vs. Data Rate: once you know the previous concepts, this starts translating them to the electronic side.
High Fidelity Audio: gives you a picture of how sound is processed digitally.
Companding: here you can find how sound is processed and compressed for telecommunications.
Speech Synthesis and Recognition: more processes applied to the sound, like filters, synthesis, etc.
Nonlinear Audio Processing: more advanced, but understandable; covers sound treatment and other topics.
It explains the basics of sound in the real world, in case you want to take a look, and then it explains how sound is processed in the computer, including what you are asking about.
There are other, more specific topics on Wikipedia; the "Digital audio" page explains this topic in detail and can serve as a reference for further research. Right at the beginning you will find links on sample rate, sound waves, digital formats, standards, bit depth, telecommunications, etc. There are a few things you might need to study more, like the Nyquist-Shannon sampling theorem, Fourier transforms, and complex numbers, but these only come up in very specific and advanced topics that you may never need; I mention them just in case you are interested. You can find information in both the DSP guide book and Wikipedia, although you will need to study some math.
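As a quick taste of why the Nyquist-Shannon theorem matters, here is a tiny numeric sketch (numpy, with made-up frequencies): a tone above half the sample rate produces exactly the same samples as a lower-frequency alias, which is why audio must be band-limited before sampling.

import numpy as np

rate = 12000                          # sample rate; Nyquist limit = 6000 Hz
t = np.arange(32) / rate
high = np.sin(2 * np.pi * 9000 * t)   # 9 kHz tone, above the Nyquist limit
alias = np.sin(2 * np.pi * 3000 * t)  # 12000 - 9000 = 3000 Hz alias
print(np.allclose(high, -alias))      # True: the sampled 9 kHz tone is just
                                      # an inverted 3 kHz tone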
I've been using Python to develop and study these subjects with code, since it has a lot of useful libraries, like numpy, sounddevice, scipy, etc., and then you can start playing with sound. On YouTube you can find lots of videos that guide you through this; I've found synthesis, filters, voice recognition, and you can create WAV files with just code, which is great. I've also seen projects in C/C++, JavaScript, and other languages, so it might help you to keep learning and coding fun things.
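For instance, here is a minimal numpy + sounddevice sketch of the playback direction the question asks about: generate the sample numbers in memory, then hand them to the DAC (this assumes the third-party sounddevice package is installed):

import numpy as np
import sounddevice as sd

rate = 44100
t = np.arange(rate) / rate                 # one second of sample times
tone = 0.3 * np.sin(2 * np.pi * 440 * t)   # float samples in [-1, 1]

sd.play(tone, rate)  # the sound card's DAC turns the numbers back into voltage
sd.wait()            # block until playback finishes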
There are a few other references across the internet, but you need to know what you are looking for; this book and the Wikipedia page would be my starting points, since they give you the basics and explain every topic in depth. Then, depending on the goal you want to achieve, you can start looking for more information.

Visual Similarity Algorithms (for CBIR)

I am trying to build a collection of visual similarity measures between images that are size-, angle-, color-, and rotation-invariant, for Content Based Image Retrieval. I'm quite agnostic about the platform, but .NET, Java, or Python are preferred. If others are available, please suggest away.
I am quite familiar with OpenCV Match template and Match shapes. I have also looked at AForge.NET.
P.S. Something similar to http://www.imgseek.net/home would be ideal.
Content Based Image Retrieval is a field of heavy research. Unfortunately it is not my field, so I am unable to give you an authoritative suggestion for a viable algorithm. But I can give you the website where I would start looking for an answer:
CVPapers - Computer Vision Resource (check out the Computer Vision Paper Indexes)
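In the meantime, one common scale- and rotation-invariant baseline you can try with the OpenCV you already know is keypoint-descriptor matching. Below is a minimal, non-authoritative sketch using ORB features and a Hamming-distance matcher; the file names and the distance threshold of 50 are placeholders to tune:

import cv2

img1 = cv2.imread('query.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('candidate.png', cv2.IMREAD_GRAYSCALE)

# ORB keypoints/descriptors are invariant to rotation and (some) scale change.
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Binary descriptors are compared with Hamming distance; crossCheck keeps
# only mutual best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

# A crude similarity score: the fraction of matches with low distance.
good = [m for m in matches if m.distance < 50]
print('similarity score:', len(good) / max(len(matches), 1))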

How do scanline-based 2D rendering engines work?

Can you please point me to a reference that will help me understand how scanline-based rendering engines work?
I want to implement a 2D rendering engine that supports region-based clipping, basic shape drawing and filling with anti-aliasing, and basic transformations (perspective, rotation, scaling). I need algorithms that give priority to performance rather than quality, because I want to implement it for embedded systems with no FPU.
I'm probably showing my age, but I still love my copy of Foley, Feiner, van Dam, and Hughes (The White Book).
Jim Blinn had a great column that's available as a book called Jim Blinn's Corner: A Trip Down the Graphics Pipeline.
Both of these are quite dated now, and aside from the principles of 3D geometry, they're not very useful for programming today's powerful pixel pushers.
OTOH, they're probably just perfect for an embedded environment with no GPU or FPU!
Here is a good series of articles by Chris Hecker that covers software rasterization:
http://chrishecker.com/Miscellaneous_Technical_Articles
And here is a site that talks about, and includes code for, a software rasterizer. It was written for a system without an FPU (the GP2X) and includes source for a fixed-point math library.
http://www.trenki.net
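If fixed-point math is new to you, the core trick such a library relies on fits in a few lines: store each real number as an integer scaled by 2^16 (so-called 16.16 format), and all arithmetic becomes integer-only, so no FPU is needed. A small sketch, shown in Python for brevity (a real engine would do the same with 32-bit integers in C):

ONE = 1 << 16  # the value 1.0 in 16.16 fixed point

def to_fixed(x):
    # float -> fixed; only needed offline or at asset-load time
    return int(x * ONE)

def fx_mul(a, b):
    # product of two 16.16 numbers; the shift removes the doubled scale factor
    return (a * b) >> 16

def fx_div(a, b):
    # quotient of two 16.16 numbers; the pre-shift preserves the scale factor
    return (a << 16) // b

half = fx_div(ONE, 2 * ONE)              # 1.0 / 2.0 = 0.5
three = fx_mul(to_fixed(1.5), 2 * ONE)   # 1.5 * 2.0 = 3.0
print(half / ONE, three / ONE)           # -> 0.5 3.0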
I'm not sure about the rest, but I can help you with fast scaling and 2D rotation for ARM (written in assembly language). Check out a demo:
http://www.modaco.com/content/smartphone-software-games/291993/bbgfx-2d-graphics-library-beta/
L.B.
