Web Audio API - how to use AudioPannerNode for regular LR panning

AudioPannerNode is a processing node which positions / spatializes an incoming audio stream in three-dimensional space. Is there a way to use it for regular LR panning, keeping in mind that it uses a 3D Cartesian coordinate system in conjunction with a listener, whose position and orientation are used together with the panner's position and orientation to determine how the audio will be spatialized? Or should I just skip the AudioPannerNode and try to achieve this with LR gain nodes?

LR panning is pretty straightforward, and you can use either of the approaches you describe. I would err on the side of using the spatialized model (grounded in a physics engine), since otherwise you need to tweak your own balance.
Even though you're not doing 3D, you can project down to 2D, and play with front/rear balance as well, rather than going straight to 1D.
More information about spatialized audio in the Web Audio API can be found in the Web Audio for Games article.
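For concreteness, here is a rough sketch of both approaches (untested; `ctx` and `source` are placeholders for your own AudioContext and source node, and the gain version assumes a mono source):

    // LR panning with a PannerNode using the equal-power model.
    function makePanner(ctx: AudioContext, source: AudioNode): (pan: number) => void {
      const panner = ctx.createPanner();
      panner.panningModel = 'equalpower';      // plain LR panning, no HRTF
      source.connect(panner);
      panner.connect(ctx.destination);
      return (pan: number) => {
        // pan in [-1, 1]: -1 = hard left, 0 = centre, +1 = hard right.
        // Move the source on a unit circle in front of the default listener
        // (who faces -z), so the distance to the listener stays constant.
        const angle = pan * Math.PI / 2;
        panner.setPosition(Math.sin(angle), 0, -Math.cos(angle));
      };
    }

    // The manual alternative with two gain nodes (assumes a mono source).
    function makeGainPanner(ctx: AudioContext, source: AudioNode): (pan: number) => void {
      const left = ctx.createGain();
      const right = ctx.createGain();
      const merger = ctx.createChannelMerger(2);
      source.connect(left);
      source.connect(right);
      left.connect(merger, 0, 0);              // left gain -> output channel 0
      right.connect(merger, 0, 1);             // right gain -> output channel 1
      merger.connect(ctx.destination);
      return (pan: number) => {
        const theta = (pan + 1) * Math.PI / 4; // equal-power curve
        left.gain.value = Math.cos(theta);
        right.gain.value = Math.sin(theta);
      };
    }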

Related

Scaling an image according to audio (threshold, frequencies)

I am looking to scale a PNG file according to a provided audio file, a frequency range (20 Hz-1000 Hz, for example) and a threshold, for a smooth effect.
For example, when there is a kick, the scale goes to 120% smoothly. I would like to make the kind of audio visualizers used for dubstep and similar genres, where the image "pumps" when a kick comes in.
First, is it doable with ffmpeg?
Where to start?
I found showcqt, which takes frequencies as input, but its output is a video, so I don't think I can use it in my case. Any help appreciated.
If you are able to read the PCM values as they are being output, then you might consider using a rolling RMS average in order to get a continuous stream of amplitudes. I don't know the best length for the window. Perhaps it should correspond to the number of audio frames that would give you an update for each visual frame? The folks at the DSP site would have the best insights.
If you do a rolling average, computations are not terribly expensive. You'd square the incoming sample, add it to a ring buffer (circular queue), and drop the outgoing one. Only those two data points need to be applied when updating the rolling average, since the denominator is fixed and known. I found a video that describes the basic RMS math here using Matlab.
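A minimal sketch of that ring-buffer update (the window length is a guess, as noted above):

    // Rolling RMS: a fixed-length ring buffer of squared samples plus a
    // running sum, so each update is O(1).
    class RollingRms {
      private buf: Float32Array;
      private sum = 0;
      private pos = 0;

      constructor(windowSize: number) {        // e.g. one visual frame's worth of audio frames
        this.buf = new Float32Array(windowSize);
      }

      push(sample: number): number {
        const sq = sample * sample;
        this.sum += sq - this.buf[this.pos];   // add the incoming square, drop the outgoing one
        this.buf[this.pos] = sq;
        this.pos = (this.pos + 1) % this.buf.length;
        return Math.sqrt(this.sum / this.buf.length);
      }
    }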
It might be necessary to add some smoothing to the visualizer that is receiving the volume updates. Also, handing off data from the audio thread should likely employ some form of loose coupling. It would not be good if the thread that is processing the audio was also handling graphics.
I'm a little over my head, but I think this is what is generally done for visualizers.

I need to analyse many audio WAV files for characteristic noise, ideas?

I need to be able to analyze (search through) hundreds of WAV files and detect, but not remove, static noise. As it is done now, I must listen to each conversation and find the characteristic noise/static manually, which takes too much time. Ideally, I would need a program that can read each new WAV file and detect characteristic signatures of the static noise, such as periods of bursts of white noise or full-audio-band, high-amplitude noise (like AM radio noise over a phone conversation, such as a wall of white noise) or bursts of peak high-frequency, high-amplitude noise (as in crackling on the phone line) in a background of normal voice. I do not need to remove the noise but simply detect it and flag the recording for further troubleshooting. Ideas?
I can listen to the recordings and find the static or crackling, but this takes time. I need an automated or batch process that can run on its own and flag the troubled call recordings (WAV files for a phone PBX). These are SIP and analog conversations, depending on the leg of the conversation, so RTSP/SIP packet analysis might be an option, but the raw WAV file is the simplest. I can use Audacity, but this still requires opening each file and looking at the visual representation of the audio spectrogram, which is only a little faster than listening to each call and still cumbersome.
I currently have no code or methods for this task. I simply listen to each call wav file to find the noise.
I need a batch WAV-file search that can flag the recordings that contain the characteristic noise or static or crackling over the recorded phone conversation.
Unless you can tell the program what the noise looks like, it's going to be challenging to run any sort of batch processing. I was facing a similar challenge, and that prompted me to develop (free and open-source) software to help users with audio exploration, analysis and signal separation:
App: https://audioexplorer.online/
Docs: https://tracek.github.io/audio-explorer/
Source code: https://github.com/tracek/audio-explorer
Essentially, it visualises audio as a 2D scatter plot rather than only "linearly", as in a waveform or spectrogram. When you upload audio, the following happens:
Onsets are detected (based on the high-frequency-content algorithm from aubio) according to the threshold you set. Set it to None if you want all of them.
For each audio fragment, audio features are calculated based on your selection. There's no universally best set of features; it all depends on the application. You might try, for starters, e.g. pitch statistics. Consider setting proper values for the bandpass filter and the sample length (that's the length of the audio fragment we're going to use). Sample length could in the future be established dynamically. Check the docs for more info.
The result is that for each fragment you have a number of features, e.g. 6 or 60. That means we then have a k-dimensional structure (where k is the number of features), which we project to 2D space with a dimensionality-reduction algorithm of your choice. Uniform Manifold Approximation and Projection is a sound choice.
In theory, the resulting embedding should be such that similar sounds (according to the features we have selected) are close together, while different ones are further apart. Your noise should now be separated from your "not noise" and form a cluster.
When you hover over the graph, a set of icons appears in the upper-right corner. One is lasso selection. Use it to mark points, inspect the spectrogram and, e.g., download a table with the features that describe that signal. At that point you can also reduce the noise (an extra button appears) in a similar way to Audacity: it analyses the spectrum and attenuates those frequencies with some smoothing.
It does not completely solve your problem right now, but it could severely cut the effort. Going through hundreds of WAVs could take the better part of a day, but you will be done. Want it automated? There's a CLI (command-line interface) that I am developing at the same time. In the not-too-distant future it should take what you have labelled as noise and signal and then use supervised machine learning to go through everything in batch mode.
Suggestions / feedback? Drop an issue on GitHub.
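For a crude first pass at the batch flagging the question asks for, a simple time-domain check can already catch loud bursts before any feature or embedding work. This is only a sketch, assuming the WAV has been decoded elsewhere into a mono Float32Array; the window length and threshold factor are arbitrary and would need tuning against known-bad recordings:

    // Flag a recording if any short window is much louder than the recording's
    // typical level -- a rough proxy for bursts of static or crackle.
    function hasLoudBursts(samples: Float32Array, sampleRate: number,
                           windowMs = 50, thresholdFactor = 6): boolean {
      const win = Math.max(1, Math.floor(sampleRate * windowMs / 1000));
      const rms: number[] = [];
      for (let start = 0; start + win <= samples.length; start += win) {
        let sum = 0;
        for (let i = start; i < start + win; i++) sum += samples[i] * samples[i];
        rms.push(Math.sqrt(sum / win));
      }
      if (rms.length === 0) return false;
      const sorted = [...rms].sort((a, b) => a - b);
      const median = sorted[Math.floor(sorted.length / 2)];
      // A window far above the median level counts as a suspect burst;
      // the absolute floor ignores very quiet recordings where the ratio is meaningless.
      return rms.some(r => r > median * thresholdFactor && r > 0.05);
    }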

Using microphone input to create a music visualization in real time on a 3D globe

I am involved in a side project that has a loop of LEDs around 1.5 m in diameter, with a rotor on the bottom which spins the loop. A Raspberry Pi controls the LEDs so that they create what appears to be a 3D globe of light. I am interested in a project that takes a microphone input and turns it into a column of pixels which is rendered on the loop in real time. The goal of this is to see if we can have it react to music in real time. So far I've come up with this idea:
Use an FFT to quickly turn the input sound into a function that maps certain pixels to certain colors based on the amplitude of the resulting spectrum at each frequency, so the equator of the globe would respond to the strength of the lower-frequency sound, progressing upwards towards the poles, which would respond to high-frequency sound.
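Roughly, I picture that mapping working something like the sketch below (the number of rows, the linear band split and the simple averaging are placeholders; log-spaced bands would probably match hearing better, and the FFT itself would come from whatever library I end up using):

    // Map FFT magnitudes to LED rows: low frequencies near the equator,
    // high frequencies toward the poles. `magnitudes` is one FFT frame.
    function binsToRows(magnitudes: Float32Array, numRows: number): number[] {
      const rows = new Array<number>(numRows).fill(0);
      const binsPerRow = Math.max(1, Math.floor(magnitudes.length / numRows));
      for (let row = 0; row < numRows; row++) {
        let sum = 0, count = 0;
        for (let b = row * binsPerRow; b < (row + 1) * binsPerRow && b < magnitudes.length; b++) {
          sum += magnitudes[b];
          count++;
        }
        rows[row] = count > 0 ? sum / count : 0;  // average magnitude for this row's band
      }
      return rows;  // row 0 = lowest band (equator), last row = highest band (pole)
    }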
I can think of a few potential problems, including:
Performance on a raspberry pi. If the response lags too far behind the music it wouldn't seem to the observer to be responding to the specific song he/she is also hearing.
Without detecting the beat or some overall characteristic of the music that people intuitively follow, it might be difficult for observers to understand that the output is correlated with the music.
The rotor has different speeds, so the image is only stationary if the rate of spin is matched perfectly to the refresh rate of the LEDs. This is a problem, but also possibly helpful, because I might be able to turn down both the refresh rate and the rotor speed to reduce the computational load on the Raspberry Pi.
With that backstory, I should probably now ask a question. In general, how would you go about doing this? I have some experience with parallel computing and numerical methods, but I am totally ignorant of music and tone and whatnot. Part of my problem is that I know the Raspberry Pi is the newest model, but I am not sure what its parallel capabilities are. I need to find a few Linux-friendly tools or libraries that can do an FFT on an ARM processor and handle the post-processing in real time. I think a delay of ~0.25 s or so would be acceptable. I feel like I'm in over my head, so I thought I'd ask you for input.
Thanks!

Calculation of sound source position in 3d space

I have a 3D vector for a listener position and a 3D vector for a sound source. I also have a 3D vector for the orientation of the listener. I am trying to find the NED (north, east, down) position of the source relative to the listener so I can play the sounds in the right speakers... I've done some research but I can't seem to find the necessary equations...
Any idea?
Thanks!
The Ambisonics B-Format codec does exactly what you're describing. However, although the specification of this codec is open, finding it is rather challenging due to its unfortunate unpopularity.
The good news is, I've written a BSD open-source project called "Ambisonix" that details all the equations required to achieve up to 3rd-order Ambisonics encoding and decoding. I've also added some features such as distance encoding and the Doppler effect, which are not part of the original spec.
Check it out at: http://sourceforge.net/projects/ambisonix
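For a sense of what the encoding involves, here is a sketch of the standard first-order (B-format) equations for a mono sample; the azimuth and elevation would be derived from the listener-relative source vector, and the 1/sqrt(2) weighting on W follows the traditional convention (the Ambisonix project above covers higher orders and decoding as well):

    // First-order Ambisonic (B-format) encoding of a mono sample s for a source
    // at azimuth az (radians, counter-clockwise from straight ahead) and
    // elevation el (radians, up positive).
    function encodeBFormat(s: number, az: number, el: number) {
      return {
        W: s * Math.SQRT1_2,                    // omnidirectional component
        X: s * Math.cos(az) * Math.cos(el),     // front-back
        Y: s * Math.sin(az) * Math.cos(el),     // left-right
        Z: s * Math.sin(el),                    // up-down
      };
    }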
I don't think you're going to find exactly what you're looking for. Spatial location of sound sources in a 3D field is a very complex subject and depends on many factors (listener location, loudspeaker locations, source material). The closest to what you're describing is probably Ambisonics, but this needs the listening setup to be Ambisonics too, which is not very common. If you're using something like Dolby Digital, I don't think they give out the equations, you need to license the algorithms or mix the source material with equipment which has the algorithms licensed and built in. However, systems such as Dolby are not really designed for precise sound source location in a 3D field - they're really just a spatial effect which gives the listener the feeling of a 3D sound field.
You need to subtract the listener vector from the sound vector; then you can rotate the result by the listener's orientation. Now you can simply check whether the new vector is positive or negative on each axis. For example, the vector [0, 10, -2] can be read as [0, +, -], which means [centred, up, behind]. To use N-S, W-E, U-D (world) directions, just don't rotate the vector after subtracting.
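A sketch of that, with the simplifying assumption that the listener's orientation reduces to a heading (yaw) angle; the axis meanings in the comments are just one possible convention, and a full solution would build a rotation matrix or quaternion from the actual orientation representation:

    type Vec3 = { x: number; y: number; z: number };

    // Position of a sound source expressed in the listener's frame.
    function relativeToListener(source: Vec3, listener: Vec3, yaw: number): Vec3 {
      // 1. Translate so the listener is at the origin.
      const dx = source.x - listener.x;
      const dy = source.y - listener.y;
      const dz = source.z - listener.z;
      // 2. Rotate by -yaw around the vertical axis so +x points "ahead of" the listener.
      const cos = Math.cos(-yaw), sin = Math.sin(-yaw);
      return {
        x: dx * cos - dy * sin,   // front (+) / back (-)
        y: dx * sin + dy * cos,   // one side (+) / the other (-)
        z: dz,                    // up (+) / down (-)
      };
    }
    // Skipping step 2 leaves the offset in world axes, i.e. fixed N/E/U-style directions.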

Obstacle avoidance using 2 fixed cameras on a robot

I will be starting work on a robotics project which involves a mobile robot that has 2 cameras (1.3 MP) mounted at a fixed distance of 0.5 m between them. I also have a few ultrasonic sensors, but they have only a 10 meter range and my environment is rather large (as an example, take a large warehouse with many pillars, boxes, walls, etc.). My main task is to identify obstacles and also find a roughly "best" route that the robot must take in order to navigate in a "rough" environment (the ground floor is not smooth at all). All the image processing is done not on the robot, but on a computer with an NVIDIA GT425 and 2 GB of RAM.
My questions are :
Should I mount the cameras on a rotating support, so that they can cover a wider angle?
Is it possible to create a reasonable 3D reconstruction based on only 2 views at such a small distance apart? If so, to what degree can I use this for obstacle avoidance and constructing a best route?
If a roughly accurate 3D representation of the environment can be made, how can it be used to create a map of the environment? (Consider the following example: the robot must sweep a fairly large area, and it would be energy efficient if it did not go through the same place (or course) twice; however, when a 3D reconstruction is made from one direction, how can it tell it has already been there if it comes from the opposite direction?)
I have found this response to a similar question, but I am still concerned with the accuracy of the 3D reconstruction (for example, a couple of boxes situated at 100 m, considering the small resolution and distance between the cameras).
I am just starting to gather information for this project, so if you have worked on something similar, please give me some guidelines (and some links :D) on how I should approach this specific task.
Thanks in advance,
Tamash
If you want to do obstacle avoidance, it is probably easiest to use the ultrasonic sensors. If the robot is moving at speeds suitable for a human environment then their range of 10m gives you ample time to stop the robot. Keep in mind that no system will guarantee that you don't accidentally hit something.
(2) Is it possible to create a reasonable 3D reconstruction based on only 2 views at such a small distance apart? If so, to what degree can I use this for obstacle avoidance and constructing a best route?
Yes, this is possible. Have a look at ROS and their vSLAM. http://www.ros.org/wiki/vslam and http://www.ros.org/wiki/slam_gmapping would be two of many possible resources.
however when a 3D reconstruction is made from one direction, how can it tell if it has already been there if it comes from the opposite direction
Well, you are trying to find your position given a measurement and a map. That should be possible, and it wouldn't matter from which direction the map was created. However, there is the loop closure problem. Because you are creating a 3D map at the same time as you are trying to find your way around, you don't know whether you are at a new place or at a place you have seen before.
CONCLUSION
This is a difficult task!
Actually, it's more than one. First you have simple obstacle avoidance (i.e. don't drive into things). Then you want to do simultaneous localisation and mapping (SLAM, read Wikipedia on that), and finally you want to do path planning (i.e. sweeping the floor without covering an area twice).
I hope that helps?
I'd say no if you mean each eye rotating independently. You won't get the accuracy you need for stereo correspondence, and it will make calibration a nightmare. But if you want the whole "head" of the robot to pivot, then that may be doable. But you should have some good encoders on the joints.
If you use ROS, there are some tools which help you turn the two stereo images into a 3D point cloud: http://www.ros.org/wiki/stereo_image_proc. There is a tradeoff between your baseline (the distance between the cameras) and your resolution at different ranges: a large baseline gives greater resolution at large distances, but it also has a large minimum distance. I don't think I would expect more than a few centimeters of accuracy from a static stereo rig, and this accuracy only gets worse when you compound it with the robot's location uncertainty.
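To put rough numbers on that tradeoff: depth is approximately focalLength * baseline / disparity, so the depth error for a fixed disparity error grows with the square of the range. A sketch, with made-up focal length and matching precision rather than values from the question's cameras:

    // Approximate depth uncertainty of a stereo rig at a given range.
    // z = f * B / d  =>  dz ~ z^2 / (f * B) * dd
    function depthError(rangeM: number, baselineM: number,
                        focalLengthPx: number, disparityErrPx: number): number {
      return (rangeM * rangeM * disparityErrPx) / (focalLengthPx * baselineM);
    }

    // Hypothetical numbers: 0.5 m baseline, ~1000 px focal length, 0.25 px matching error.
    for (const range of [2, 10, 50, 100]) {
      console.log(`${range} m -> +/-${depthError(range, 0.5, 1000, 0.25).toFixed(2)} m`);
    }

With those assumed numbers the error is millimetres at 2 m but several metres at 100 m, which is the concern raised in the question about distant boxes.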
2.5. For mapping and obstacle avoidance, the first thing I would try to do is segment out the ground plane. The ground plane goes to mapping, and everything above it is an obstacle. Check out PCL for some point-cloud operations: http://pointclouds.org/
If you can't simply put a planar laser on the robot, like a SICK or Hokuyo, then I might try to convert the 3D point cloud into a pseudo laser scan and then use some off-the-shelf SLAM instead of trying to do visual SLAM. I think you'll have better results.
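A sketch of that point-cloud-to-pseudo-scan step (the angular resolution, height band and axis convention here are arbitrary choices):

    // Collapse a robot-centred 3D point cloud (x forward, y left, z up -- an
    // assumed convention) into a 2D "laser scan": per angular bin, the nearest
    // obstacle point within a height band.
    function pointsToPseudoScan(points: { x: number; y: number; z: number }[],
                                numBins = 360, minZ = 0.1, maxZ = 1.5): number[] {
      const scan = new Array<number>(numBins).fill(Infinity);
      for (const p of points) {
        if (p.z < minZ || p.z > maxZ) continue;       // ignore floor and overhead points
        const range = Math.hypot(p.x, p.y);
        const angle = Math.atan2(p.y, p.x);           // -pi..pi
        const bin = Math.floor(((angle + Math.PI) / (2 * Math.PI)) * numBins) % numBins;
        scan[bin] = Math.min(scan[bin], range);       // keep the closest return per bin
      }
      return scan;                                    // Infinity = nothing seen in that bin
    }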
Other thoughts:
Now that the Microsoft Kinect has been released, it is usually easier (and cheaper) to simply use that to get a 3D point cloud instead of doing actual stereo.
This project sounds a lot like the DARPA LAGR program (Learning Applied to Ground Robots). That program is over, but you may be able to track down papers published from it.
