Draw an image as a waveform? - audio

I want to export a series of images (a movie) and draw them as a waveform, the way this video does:
https://www.youtube.com/watch?v=M9xMuPWAZW8&feature=youtu.be&t=328
http://oscilloscopemusic.com/ offers a program for loading 3D .obj files but I am currently working in 2D.
I tried opening a .WAV downloaded from http://www.wavtones.com/functiongenerator.php in vim and my terminal program crashed.
I tried cat Downloads/wavTones.com.unregistred.sin_1000Hz_-6dBFS_3s.wav | pbcopy and pasting into a text editor, which showed RIFFæ.
What is an algorithm for converting a series of images into a .wav? Ideally I'd like to make many images and string them together to make a movie like Oscilloscope Music does.
The Aphex Twin video uses more than 2 colors and I'm not sure what's going on there.

Oscilloscope music and the Aphex Twin trick are very different things.
basically:
Oscilloscope Music:
This is an XY plot, also called a parametric plot or 2D plot.
Visual Side: The idea here is that you have one point (x/y) and you move that dot around (x and y change with t) so quickly that it appears as a line.
Acoustic Side: You separate the dot into the two axes. The movement of the x coordinate becomes the audio signal of the left channel, the movement of the y coordinate becomes the audio signal of the right channel.
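A minimal sketch of that idea in Python (using NumPy and SciPy, which may or may not match your toolchain): trace a closed 2D path and write it out as a stereo WAV, left channel = x, right channel = y. The circle is just a stand-in for the outline you would extract from each of your images, and the frame rate and file name are arbitrary choices:

```python
# Minimal sketch: trace a 2D shape as an XY (oscilloscope) signal and save it
# as a stereo WAV, left channel = x, right channel = y.
# The shape is assumed to be an ordered list of (x, y) points in [-1, 1];
# extracting such a path from your own images (e.g. via edge detection) is up to you.
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 44100      # samples per second
FRAME_RATE = 25          # how many times per second the whole shape is retraced

# Example path: a circle (stand-in for one frame of your movie)
t = np.linspace(0, 2 * np.pi, SAMPLE_RATE // FRAME_RATE, endpoint=False)
x = np.cos(t)
y = np.sin(t)

# Repeat the trace for a few seconds so the scope has time to draw it
seconds = 3
left = np.tile(x, FRAME_RATE * seconds)
right = np.tile(y, FRAME_RATE * seconds)

# Interleave into an (N, 2) int16 array and write the WAV
stereo = np.stack([left, right], axis=1)
stereo = (stereo * 32767).astype(np.int16)
wavfile.write("xy_trace.wav", SAMPLE_RATE, stereo)
```

To make a movie, you would generate one such trace per image and concatenate them before writing the file.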
Aphex Twin Spectrogram Trick:
A spectrogram displays how strongly each frequency appears. I.e. the audio signal is decomposed into a weighted sum of sine waves of different frequencies. The weight (how much of each frequency) is the color, the y coordinate is the pitch (how high that sine wave is), and the x coordinate is time.
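Going the other way (painting a grayscale image into the spectrogram) can be sketched roughly like this; the random placeholder image, the frequency band and the duration are all assumptions, not anything Aphex Twin actually used:

```python
# Minimal sketch: "draw" a grayscale image into a spectrogram by summing sine
# waves, one per image row, whose amplitude over time follows the pixel
# brightness along that row. Assumes `img` is a 2D numpy array in [0, 1] with
# row 0 at the top; frequency range and duration are arbitrary choices.
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 44100
DURATION = 5.0                      # seconds of audio for the whole image
F_MIN, F_MAX = 200.0, 8000.0        # frequency band the image is painted into

img = np.random.rand(64, 64)        # placeholder; load your own image here

n_rows, n_cols = img.shape
t = np.linspace(0, DURATION, int(SAMPLE_RATE * DURATION), endpoint=False)
signal = np.zeros_like(t)

for row in range(n_rows):
    # Top image row -> highest frequency, so the picture appears upright
    freq = F_MAX - (F_MAX - F_MIN) * row / (n_rows - 1)
    # Stretch this row's pixel values across the full duration as an envelope
    envelope = np.interp(t, np.linspace(0, DURATION, n_cols), img[row])
    signal += envelope * np.sin(2 * np.pi * freq * t)

signal /= np.abs(signal).max()      # normalise to [-1, 1]
wavfile.write("spectrogram_image.wav", SAMPLE_RATE, (signal * 32767).astype(np.int16))
```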
I hope this helps clear things up.

Related

Triangulate camera position and orientation in regards to known objects

I made an object tracker that calculates the position of an object recorded in a live camera feed using stereoscopic cameras. The math was simple once you know the camera distance and orientation. However, I now think it would be nice to be able to quickly extract all these parameters, so that when I change my setup or cameras I can quickly calibrate the system again.
To calculate the object position I made some simplifications/assumptions, which made the math easier: the cameras are in the same YZ plane, so there is only a distance in x between them. Their tilt is also just in the XY plane.
To reverse the triangulation, I thought a test pattern (a square) of 4 points whose distances to each other I know would suffice. Ideally I would like to get the cameras' positions (distances to the test pattern and to each other), their rotation about X (and maybe Y and Z if applicable/possible), as well as their view angle (to translate pixel positions to real-world distances - that should be a camera constant, but in case I change cameras it is quite a bit of work to determine accurately).
I started with the same trigonometric calculations, but I always end up missing parameters. I am wondering if there is an existing solution or a solid approach. If I need to add parameters (like distances; they are easy enough to measure), that's no problem (my calculations didn't give me any simple equations with that possibility, though).
I also read about homography in OpenCV, but it seems to apply to 2D space only, or does it?
Any help is appreciated!

Detecting a specific pattern from a FFT in Arduino

I have an FFT output from a microphone and I want to detect a specific animal's howl from that (it howls in a characteristic frequency spectrum). Is there any way to implement a pattern recognition algorithm in Arduino to do that?
I already have the FFT part of it working with 128 samples at a 2 kHz sampling rate.
Look up audio fingerprinting ... essentially you probe the frequency-domain output of the FFT call and take a snapshot of the range of frequencies together with the magnitude of each frequency, then compare this between the known animal signal and the unknown signal and output a measurement of those differences.
Naturally this difference will approach zero when the unknown signal is your actual known signal.
Here is another layer: for better fidelity, instead of performing a single FFT of the entire audio available, do many FFT calls, each with a subset of the samples ... for each call, slide this window of samples further into the audio clip ... let's say your audio clip is 2 seconds yet you only ever send 200 milliseconds' worth of samples into each FFT call; this gives you at least 10 such FFT result sets instead of just one had you gulped the entire audio clip ... this gives you the notion of time specificity, which is an additional dimension with which to derive a richer data difference between the known and unknown signals ... experiment to see if it helps to slide the window just a tad instead of lining up the windows end to end.
To be explicit: you have a range of frequencies, say spread across the X axis, and along the Y axis you have magnitude values for each frequency at different points in time, as plucked from your audio clips as you vary your sample window per the above paragraph ... so now you have a two-dimensional grid of data points.
Again, to beef up the confidence intervals, you will want to perform all of the above across several different audio clips of your known source animal's howl against each of your unknown signals, so now you have a three-dimensional parameter landscape ... as you can see, each additional dimension you can muster will give you more traction and hence more accurate results.
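A rough sketch of that windowed-fingerprint comparison, prototyped in Python/NumPy rather than on the Arduino itself; the window length, hop size and the simple mean-absolute-difference score are all assumptions you would tune:

```python
# Sketch of the fingerprinting idea: slide a short window over each clip,
# take the FFT magnitude of every window, and score how different the two
# resulting time/frequency grids are.
import numpy as np

def fingerprint(samples, window=256, hop=128):
    """Return a 2D grid: one FFT magnitude spectrum per window position."""
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        chunk = samples[start:start + window]
        frames.append(np.abs(np.fft.rfft(chunk)))
    return np.array(frames)

def difference(known, unknown):
    """Mean absolute difference between two fingerprints, over the overlapping part."""
    n = min(len(known), len(unknown))
    return np.mean(np.abs(known[:n] - unknown[:n]))

# Toy test: a 50 Hz tone vs. an 8000 Hz tone, 2 s at 44.1 kHz sampling
rate = 44100
t = np.arange(2 * rate) / rate
tone_50 = np.sin(2 * np.pi * 50 * t)
tone_8k = np.sin(2 * np.pi * 8000 * t)

print(difference(fingerprint(tone_50), fingerprint(tone_50)))   # 0 (same signal)
print(difference(fingerprint(tone_50), fingerprint(tone_8k)))   # clearly larger
```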
Start with easily distinguished known audio against a very different unknown audio ... say a 50 Hz sine tone for the known audio signal against an 8000 Hz sine wave for the unknown ... then try a single strum of a guitar as your known and, say, a trumpet as your unknown ... then progress to using actual audio clips.
Audacity is an excellent free audio workhorse of the industry - it easily plots a WAV file to show its time-domain signal or FFT spectrogram ... Sonic Visualiser is also a top-shelf tool to use.
This is not a simple silver bullet; however, each layer you add to your solution can give you better results ... it is a process you are crafting, not a single-dimensional trigger to squeeze.

Active frame extraction from motion capture

I am new to this area - I have a background in gait and posture.
I have a series of motion files of timestamped coordinates (X, Y, and Z in mm) for a number of joints (30).
What would be the simplest way to extract the following from the motion observations? 1) The number of active features (i.e. active joints). 2) The average speed of motion.
Each file is in the format N x P, where P is the number of joints and N is the number of frame observations.
What I am looking for is some pointers into possible areas to explore.
Regards,
Dan
A couple of possibilities you might like to explore - both using completely free (and open source) software:
Python + NumPy/SciPy can easily read in your coordinate values and calculate the data you require (see the sketch after this list); it is also possible to plot in 3D using matplotlib.
You could use your positional data to animate a stick figure in Blender - some of the test blends would provide a good starting point for this.
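A minimal sketch along the lines of the first option, assuming the recording has already been loaded into a NumPy array of shape (frames, joints, 3) in millimetres together with its timestamps; the movement threshold is an arbitrary placeholder:

```python
# Count "active" joints and estimate average speed from a motion-capture array.
# positions: (n_frames, n_joints, 3) in mm; timestamps: (n_frames,) in seconds.
import numpy as np

def analyse(positions, timestamps, movement_threshold_mm=5.0):
    # Displacement of every joint between consecutive frames: (N-1, P, 3) -> (N-1, P)
    step = np.linalg.norm(np.diff(positions, axis=0), axis=2)

    # A joint counts as "active" if its total travelled distance exceeds the threshold
    total_distance = step.sum(axis=0)                      # (P,) mm per joint
    n_active = int(np.sum(total_distance > movement_threshold_mm))

    # Average speed: mean over joints of distance travelled / elapsed time
    elapsed = timestamps[-1] - timestamps[0]               # seconds
    avg_speed = float(total_distance.mean() / elapsed)     # mm per second

    return n_active, avg_speed

# Toy example: 100 frames, 30 joints, random jitter
rng = np.random.default_rng(0)
positions = rng.normal(scale=2.0, size=(100, 30, 3))
timestamps = np.linspace(0.0, 1.0, 100)
print(analyse(positions, timestamps))
```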

Distance between the camera and a recognized "object"

I would like to calculate the distance between my camera and a recognized "object".
The recognized "object" is a black rectangle sticker on a white board, for example. I know the dimensions of the rectangle (x, y).
Is there a method that I can use to calculate the distance with the values of my original rectangle, and the values of the picture of the rectangle I took with the camera?
I searched the forum for answers, but none of them were specific to calculating the distance with these attributes.
I am working on a robot called Nao from Aldebaran Robotics, and I am planning to use OpenCV to recognize the black rectangle.
If you could compute the angle taken up by the image of the target, then the distance to the target should be proportional to cot (i.e. 1/tan) of that angle. You should find that the number of pixels in the image corresponds roughly to the angle, but I doubt it is completely linear, especially up close.
The behaviour of your camera lens is likely to affect this measurement, so it will depend on your exact setup.
Why not measure the size of the target at several distances, and plot a scatter graph? You could then fit a curve to the data to get a size->distance function for your particular system. If your camera is close to an "ideal" camera, then you should find this graph looks like cot, and you should be able to find the values of a and b that match dist = a * cot(b * width).
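For example, the curve fit could be done with SciPy along these lines; the (pixel width, distance) pairs below are made-up placeholders to be replaced with your own measurements from the Nao's camera:

```python
# Sketch of the suggested calibration experiment: measure the target's width
# in pixels at a handful of known distances, then fit dist = a * cot(b * width).
import numpy as np
from scipy.optimize import curve_fit

def model(width_px, a, b):
    return a / np.tan(b * width_px)       # a * cot(b * width)

# Hypothetical measurements: pixel width of the sticker vs. distance in cm
widths = np.array([320.0, 160.0, 110.0, 80.0, 65.0])
distances = np.array([20.0, 40.0, 60.0, 80.0, 100.0])

params, _ = curve_fit(model, widths, distances, p0=(6.5, 0.001))  # rough initial guess
a, b = params
print("a = %.2f, b = %.5f" % (a, b))
print("estimated distance for 200 px:", model(200.0, a, b))
```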
If you try this experiment, why not post the answers here, for others to benefit from?
[Edit: a note about 'ideal' cameras]
For a camera image to look 'realistic' to us, the image should approximate projection onto a plane held in front of the eye (because camera images are viewed by us by holding a planar image in front of our eyes). Imagine holding a sheet of tracing paper up in front of your eye, and sketching the object's silhouette on that paper. The second diagram on this page shows sort of what I mean. You might describe a camera which achieves this as an "ideal" camera.
Of course, in real life, cameras don't work via tracing paper, but with lenses. Very complicated lenses. Have a look at the lens diagram on this page. For various reasons which you could spend a lifetime studying, it is very tricky to create a lens which works exactly like the tracing paper example would work under all conditions. Start with this wiki page and read on if you want to know more.
So you are unlikely to be able to compute an exact relationship between pixel length and distance: you should measure it and fit a curve.
It is a big topic. If you want to proceed from a single image, take a look at this old paper by A. Criminisi. For an in-depth view, read his Ph.D. thesis. Then start playing with the OpenCV routines in the "projective geometry" section.
I have been working on image/object recognition as well. I just released a Python-programmed Android app (ported to Android) that recognizes objects, people, cars, books, logos, trees, flowers... anything :) It also shows its thought process as it "thinks" :)
I've put it out as a test for 99 cents on Google Play.
Here's the link if you're interested, there's also a video of it in action:
https://play.google.com/store/apps/details?id=com.davecote.androideyes
Enjoy!
:)

Do I need to rectify if camera planes are aligned?

If I am taking images from a pair of cameras whose principal axis (in both cameras) is perpendicular to the baseline, do I need to rectify the images? A typical example would be Bumblebee stereo cameras.
If you can also guarantee that:
the camera axes are parallel (maybe so if bought as a single package like the Bumblebee)
you have no lens distortion (probably not)
all the other internal camera parameters are identical
your measurement axis is parallel to your baseline
then you might be able to skip image rectification. Personally I wouldn't.
Just think about lens distortion. Even assuming everything else is equal and aligned, this might mess things up. Suppose a feature appears at the edge of one image and at the centre of the other. At the edge it might be distorted a few pixels away, while at the centre it appears where it should. Without rectification, your stereoscopic calculation (which assumes straight lines from object to sensor) is going to give you bad results.
Depends what you mean by "rectify". In stereo vision, it is common to ensure that the epipolar lines are aligned too. That means the i-th row in image 1 corresponds to the i-th row in image 2. An optional step is to reduce distortion caused by the rectification process.
If you are taking images from a pair of cameras whose principal axis is perpendicular to the baseline, then you have epipoles mapped to infinity (parallel epipolar lines in the same image). You need another transform to align the epipolar lines in both images. You will find this transform in Loop & Zhang's paper, along with the transform to reduce distortion.
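For reference, the calibrated rectification path in OpenCV usually looks something like the sketch below (cv2.stereoRectify plus initUndistortRectifyMap and remap); if you only have point correspondences rather than a full calibration, cv2.stereoRectifyUncalibrated is the uncalibrated alternative. All values here are placeholders, not the asker's actual setup:

```python
# Hedged sketch of calibrated stereo rectification in OpenCV, assuming you
# already know each camera's intrinsic matrix K, distortion coefficients D,
# and the rotation R / translation T between the cameras (e.g. from
# cv2.stereoCalibrate). Every value below is a placeholder.
import cv2
import numpy as np

K1 = K2 = np.array([[700.0, 0.0, 320.0],
                    [0.0, 700.0, 240.0],
                    [0.0,   0.0,   1.0]])
D1 = D2 = np.zeros(5)                      # pretend there is no lens distortion
R = np.eye(3)                              # rotation between the two cameras
T = np.array([[0.1], [0.0], [0.0]])        # 10 cm baseline along x
image_size = (640, 480)

# Compute rectification transforms so that epipolar lines become horizontal
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)

# Build remap tables and warp each incoming image with them
map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, image_size, cv2.CV_32FC1)

# Stand-ins for real captured frames; replace with your camera images
left = np.zeros((480, 640, 3), dtype=np.uint8)
right = np.zeros((480, 640, 3), dtype=np.uint8)
left_rect = cv2.remap(left, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)
```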
And be careful about lens distortion (see wxffles' answer).
