Methods for simulating a moving audio source

I'm currently researching a problem regarding DOA (direction of arrival) regression for an audio source, and need to generate training data in the form of audio signals of moving sound sources. In particular, I have the stationary sound files, and I need to simulate a source and microphone(s) with the distances between them changing to reflect movement.
Is there any software online that could potentially do the trick? I've looked into pyroomacoustics and VA as well as other potential libraries, but none of them seem to deal with moving audio sources, due to the difficulties in simulating the Doppler effect.
If I were to write up my own simulation code for dealing with this, how difficult would it be? My use case would be an audio source and a microphone in some 2D landscape, both moving with their own velocities, where I would want to collect the recording from the microphone as an audio file.

Some speculation here on my part, as I have only dabbled with writing some aspects of what you are asking about and am not experienced with any particular libraries. The likelihood is good that something already exists and will turn up.
That said, I wonder if it would be possible to use either the Unreal or Unity game engine. Both, as far as I can remember, let you load your own audio cues and support 3D sound, including Doppler.
As far as writing your own, a lot depends on what you already know. With a single-point mike (as opposed to stereo), the pitch shifting involved is not that hard. There is a technique that involves stepping through the audio file's sample data using linear interpolation for read positions that fall between the data points, which is considered to have sufficient fidelity for most purposes. Lots of trig, too, to track the changes in relative velocity.
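For the single-microphone case, a minimal sketch of that idea in Python (NumPy only) might look like the following; the trajectory functions, speed of sound and attenuation floor are illustrative, and the propagation delay is approximated from the distance at the time of arrival rather than solved for the exact emission time:

    import numpy as np

    def simulate_moving_source(src, fs, src_pos, mic_pos, c=343.0):
        """src: mono signal; src_pos/mic_pos: functions t -> (x, y) in metres."""
        n = len(src)
        out = np.zeros(n)
        for i in range(n):
            ti = i / fs
            d = np.linalg.norm(np.subtract(src_pos(ti), mic_pos(ti)))
            # the sample arriving now left the source roughly d / c seconds ago
            read = (ti - d / c) * fs
            k = int(np.floor(read))
            if 0 <= k < n - 1:
                frac = read - k
                # linear interpolation between the two neighbouring samples,
                # plus a simple 1/d amplitude roll-off
                out[i] = ((1.0 - frac) * src[k] + frac * src[k + 1]) / max(d, 0.1)
        return out

    # example: a 440 Hz tone flying past a fixed microphone at 20 m/s
    fs = 16000
    tone = np.sin(2 * np.pi * 440 * np.arange(2 * fs) / fs)
    recording = simulate_moving_source(
        tone, fs,
        src_pos=lambda t: (-20.0 + 20.0 * t, 5.0),  # moving source trajectory
        mic_pos=lambda t: (0.0, 0.0))               # stationary microphone

Because the read position advances faster than real time while the source approaches and slower while it recedes, the Doppler shift falls out of the interpolation for free; the distance-at-arrival approximation is usually close enough for sources moving much slower than sound.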
If we are dealing with stereo, though, it does get more complicated, depending on how far you want to go with it. The head masks high frequencies, so real-time filtering would be needed. It would also be good to implement an interaural delay to match the different arrival times at each ear. And if you start talking about pinnae, I'm way out of my league.

As of now it seems that pyroomacoustics does not support moving sound sources. However, do check a possible workaround suggested by the developers in Issue #105, where the idea is to use a time-varying convolution on a dense microphone array.
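One rough way to approximate a moving source along these lines is to compute room impulse responses at discrete source positions along the trajectory and apply a block-wise, time-varying convolution. A sketch with pyroomacoustics (the room size, trajectory and step count are made-up illustration values, and this approximation does not reproduce a true Doppler shift):

    import numpy as np
    import pyroomacoustics as pra
    from scipy.signal import fftconvolve

    fs = 16000
    signal = np.random.randn(2 * fs)             # stand-in for your source file
    n_steps = 20                                 # discrete source positions
    hop = len(signal) // n_steps
    mic = np.array([[3.0], [1.0], [1.5]])        # one fixed microphone (x, y, z)
    xs = np.linspace(1.0, 5.0, n_steps)          # source moves along the x axis

    out = np.zeros(len(signal) + fs)             # extra room for the reverb tail
    for i, x in enumerate(xs):
        room = pra.ShoeBox([6.0, 4.0, 3.0], fs=fs, max_order=10)
        room.add_source([x, 3.0, 1.5])
        room.add_microphone_array(pra.MicrophoneArray(mic, fs))
        room.compute_rir()
        wet = fftconvolve(signal[i * hop:(i + 1) * hop], room.rir[0][0])
        out[i * hop:i * hop + len(wet)] += wet   # overlap-add the convolved blocks

    out /= np.max(np.abs(out)) + 1e-9            # normalise before writing to file

Cross-fading adjacent blocks, or simply using many more steps, would smooth the transitions between positions.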

Related

APCS final project: Converting an audio file to a simpler MIDI file

Let's say I have the audio file for Happy Birthday. I want to convert that audio file into an audio file that sounds like this: happy birthday.
First, I'd like to know whether I have the ability to program this. Can a high schooler who's almost finished with APCS program this?
If I can:
How would I change the BPM of the song? I've searched through a bunch of websites, but they weren't very helpful.
I know that audio files can be represented in waveforms. How would I scan for each individual wave in an audio file (I need this to isolate the notes)?
This is a very ambitious project, actually. One reason is that it involves using digital signal processing tools like the FFT (Fast Fourier Transform) to analyze the sound and pick out the pitches. You might be able to find a library that can do this, but as far as coding such a tool yourself, that would involve a steep learning curve.
If you would like to look further into this, there is a good online resource called "The Scientist and Engineer's Guide to Digital Signal Processing". I was able to work through and understand the discrete Fourier transform with only high school math (lots of trig) and a bit of calculus. It was a lift, though.
Trying to analyze rhythm is also no easy task. Even with the advanced tools provided in professional notation systems such as Finale, people have trouble playing rhythms in time well enough for the best transcription tools. Algorithms that "quantize" the beats help, but they also limit the amount of detail that can be included in the playback.
My guess is that as interesting and worthwhile as this project would be, to bring it to completion before the semester ends would require putting together prebuilt pieces. A lot of programming is done that way, these days.
If you scale the project back to something like just getting your code to analyze a short sample of a single note and give its pitch, that would be both impressive and doable with a lot of work. It could be done with a DFT algorithm instead of requiring an FFT, reducing the amount of background you'd have to acquire first. That way, you'd only have to work your way up to understanding and implementing the material at this link, which is about calculating the DFT. Notice that there is example code in BASIC. The code examples throughout this book are a big help.
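For a sense of scale, a bare-bones correlation-method DFT of the kind that chapter walks through, plus a peak-pick to estimate the pitch of a single note, might look like this in Python; the 440 Hz test tone is a stand-in for a recorded note, and the double loop is deliberately slow and educational rather than efficient:

    import numpy as np

    def dft_magnitude(x):
        n = len(x)
        re = np.zeros(n // 2 + 1)
        im = np.zeros(n // 2 + 1)
        for k in range(n // 2 + 1):          # one correlation per frequency bin
            for i in range(n):
                re[k] += x[i] * np.cos(2 * np.pi * k * i / n)
                im[k] -= x[i] * np.sin(2 * np.pi * k * i / n)
        return np.sqrt(re ** 2 + im ** 2)    # magnitude spectrum

    fs = 8000
    t = np.arange(2048) / fs
    note = np.sin(2 * np.pi * 440 * t)           # stand-in for a recorded note
    mags = dft_magnitude(note)
    pitch = np.argmax(mags) * fs / len(note)     # strongest bin -> frequency in Hz
    print(round(pitch))                          # roughly 440, within one bin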

Can FFT be used to find drum solos/breaks in audio files?

Is it possible with FFT to find a drum solo, or a drum break, in an audio file? Is this something an FFT is able to do, and are there any resources online that could help me learn?
In general, an FFT is not a good choice for detecting the onset of percussion sounds:
An FFT is always calculated over a window of samples (in effect, a period of time) and yields the magnitude of the signal within each bin and its phase offset. You can therefore determine that there is signal at a particular bin, but not its onset time. The best time resolution available is the window period. Of course, you can make the period shorter at the expense of frequency resolution.
Percussion sounds tend to look like noise and spread across the spectrum. This would be OK if you only had percussion sounds, but it is not great in real-life polyphonic content.
However, you might be able to draw some inference from the different characteristics of the spectra of a drum solo vs. the instrumental sections of a track.
The problem of finding the time at which percussion sounds start in music is described in academic journals as onset detection and is one of the many techniques used for feature extraction; the wider field is known as Music Information Retrieval (MIR). Your problem sounds like one of identifying sections in audio files, and this might be described as partitioning.
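As a concrete illustration of one common onset-detection approach (spectral flux), here is a minimal Python sketch: frame the signal, take an FFT per frame, and flag frames where the spectrum's energy rises sharply. The frame size, hop size and threshold are arbitrary starting points, and real MIR tools are considerably more robust:

    import numpy as np

    def spectral_flux_onsets(x, fs, frame=1024, hop=512, threshold=2.0):
        window = np.hanning(frame)
        n_frames = (len(x) - frame) // hop
        prev_mag = np.zeros(frame // 2 + 1)
        flux = np.zeros(n_frames)
        for i in range(n_frames):
            seg = x[i * hop:i * hop + frame] * window
            mag = np.abs(np.fft.rfft(seg))
            # positive spectral differences only: rises in energy mark onsets
            flux[i] = np.sum(np.maximum(mag - prev_mag, 0.0))
            prev_mag = mag
        # keep frames whose flux stands well above the overall average
        mean, std = flux.mean(), flux.std() + 1e-9
        onset_frames = np.where(flux > mean + threshold * std)[0]
        return onset_frames * hop / fs           # onset times in seconds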
A good place to start is Sonic Visualiser which is a tool written specifically for MIR applications. Plug-ins exist for various types of feature extraction. From these you will be able to easily find the large body of academic work in this area. There is an added bonus that the existing plug-ins are all open source too.
I'd look here; there was a bit of discussion with great pointers on the Gamedev SE: https://gamedev.stackexchange.com/questions/9761/beat-detection-and-fft :-)

Frequency differences from MP3 to mic

I'm trying to compare sound clips based on a microphone recording. Simply put, I play an MP3 file while recording from the speakers, then attempt to match the two files. I have algorithms in place that work, but I'm seeing a slight difference I'd like to sort out to get better accuracy.
The microphone seems to favor some frequencies (added amplitude) and to be slightly off on others (peaks are wider on the mic).
I'm wondering what the cause of this difference is, and how to compensate for it.
Background:
Because of speed issues in how I'm doing the comparison, I select certain frequencies with certain characteristics. The problem is that a high percentage of these (depending on how many I choose) don't match between the MP3 and the mic.
It's called the frequency response characteristic of the microphone. Unfortunately, you can't easily get around it without buying a different, presumably more expensive, microphone.
If you can measure the actual microphone frequency response by some method (which generally requires a reference (etalon) acoustic source and an anechoic chamber), you can compensate for it by applying an equaliser tuned to the exact inverse characteristic, as discussed here. But in practice, as Kilian says, it's much simpler to get a more precise microphone. I'd recommend a condenser or an electrostatic one.
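If you do manage to measure the response, a rough sketch of the inverse-equaliser idea with SciPy might look like this; the frequency/gain pairs below are made-up placeholders for your measurement, and the inverse is capped so the correction doesn't boost noise without bound:

    import numpy as np
    from scipy.signal import firwin2, lfilter

    fs = 44100
    freqs = [0, 100, 1000, 5000, 10000, fs / 2]      # Hz, must span 0 .. fs/2
    measured = [1.0, 0.8, 1.0, 1.5, 0.6, 0.5]        # mic's relative gain at each freq

    inverse = 1.0 / np.clip(measured, 0.2, None)     # cap the boost at 5x
    fir = firwin2(513, freqs, inverse, fs=fs)        # FIR approximating the inverse curve

    def compensate(mic_signal):
        # apply the correction filter to a recording before comparing it to the MP3
        return lfilter(fir, [1.0], mic_signal)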

How to mix audio samples?

My question is not completely programming-related, but nevertheless I think SO is the right place to ask.
In my program I generate some audio data and save the track to a WAV file. Everything works fine with one sound generator. But now I want to add more generators and mix the generated audio data into one file. Unfortunately it is more complicated than it seems at first sight.
Moreover, I didn't find much useful information on how to mix a set of audio samples.
So is there anyone who can give me advice?
edit:
I'm programming in C++. But it doesn't matter, since I was interested in the theory behind mixing two audio tracks. The problem I have is that I cannot just sum up the samples, because this often produces distorted sound.
I assume your problem is that for every audio source you're adding in, you're having to lower the levels.
If the app gives control to a user, just let them control the levels directly. Hotness is their responsibility, not yours. This is "summing."
If the mixing is automated, you're about to go on a journey. You'll probably need compression, if not limiting. (Limiting is an extreme version of compression.)
Note that anything you do to the audio (including compression and limiting) is a form of distortion, so you WILL have coloration of the audio. Your choice of compression and limiting algorithms will affect the sound.
Since you're not generating the audio in real time, you have the possibility of doing "brick wall" limiting. That's because you have foreknowledge of the levels. Realtime limiting is more limited because you can't know what's coming up--you have to be reactive.
Is this music, sound effects, voices, what?
Programmers here deal with this all the time.
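If the levels really are known in advance, the simplest offline take on that "brick wall" idea is a single global gain applied after summing. A minimal sketch (this is plain peak normalisation rather than a true limiter, but it relies on the same foreknowledge of the whole mix):

    import numpy as np

    def mix_and_normalize(tracks, ceiling=0.98):
        """tracks: list of equal-length float arrays in [-1, 1]."""
        mix = np.sum(tracks, axis=0)       # plain summing of all sources
        peak = np.max(np.abs(mix))
        if peak > ceiling:                 # only scale down if the sum would clip
            mix *= ceiling / peak
        return mix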
Mixing audio samples means adding them together, that's all. Typically you do add them into a larger data type so that you can detect overflow and clamp the values before casting back into your destination buffer. If you know beforehand that you will have overflow then you can scale their amplitudes prior to addition - simply multiply by a floating point value between 0 and 1, again keeping in mind the issue of precision, perhaps converting to a larger data type first.
If you have a specific problem that is not addressed by this, feel free to update your original question.
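A small sketch of that sum-into-a-wider-type-and-clamp approach, assuming 16-bit PCM samples held in NumPy arrays:

    import numpy as np

    def mix_int16(a, b):
        wide = a.astype(np.int32) + b.astype(np.int32)        # no overflow in int32
        return np.clip(wide, -32768, 32767).astype(np.int16)  # clamp, then cast back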
dirty mix of two samples
mix = (a + b) - a * b * sign(a + b)
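Written out for float samples in the range [-1, 1] (outside that range the trick no longer guarantees a bounded result), that might look like:

    import numpy as np

    def dirty_mix(a, b):
        s = a + b
        # the nonlinear correction term keeps the result within [-1, 1]
        return s - a * b * np.sign(s)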
You never said what programming language and platform you're using; for now I'll assume Windows using C#.
http://www.codeplex.com/naudio
Great open-source library that covers a lot of the stuff you'd encounter during most audio operations.

How to reproduce C64-like sounds?

I did some of my own research and found out that SID chips had only a few hardware-supported synthesis features: three audio oscillators with four possible waveforms (sawtooth, triangle, pulse, noise), ADSR envelopes, oscillator sync, and ring modulation. I also read there was a way to play a single PCM sound.
That all seems like so little, but I still heard lots of different sounds from my TV set. How were these components combined to produce all that variety of audio?
To give some specifics, I'd like to know how to combine those components to produce guitar-, piano- or drum-like audio. Other interesting things would be the different buzzes and sounds specific to the C64.
I used to write music on the C64 for games, demos and even services (I wrote the official QuantumLink theme). As for your question, the four different waveforms were typically overlaid with the sync and ring mods (less often ring, because it was unpredictable on different versions of the SID chip), and sometimes used cleanly.
For example, a typical 'snare' sound would be composed of a noise waveform with a very fast attack and sustain, and depending on whether you wanted a drumstick or brush sound, either a very fast decay and moderately short release, or a short decay and slower release.
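A rough, non-SID sketch of that recipe in NumPy, just white noise shaped by a fast-attack envelope; the envelope times are illustrative, not actual SID register settings:

    import numpy as np

    def snare(fs=44100, attack=0.002, decay=0.03, release=0.12, sustain=0.3):
        n_a, n_d, n_r = int(fs * attack), int(fs * decay), int(fs * release)
        env = np.concatenate([
            np.linspace(0.0, 1.0, n_a),       # very fast attack
            np.linspace(1.0, sustain, n_d),   # short decay down to the sustain level
            np.linspace(sustain, 0.0, n_r),   # release tail
        ])
        noise = np.random.uniform(-1.0, 1.0, len(env))
        return noise * env                    # noise waveform shaped by the envelope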
Getting the right sound was typically trial and error, and the limitations were pretty heavy. You really never got to the point of a piano or guitar sound, due to the simple waveforms without overlayable harmonic waveforms; about the best you could get was things that sounded beepy, things that sounded marimba-y, and things that sounded like a snare drum.
One of the tricks used most often to extend sound was done with fast machine code playback routines that could change the played notes on voices so quickly as to give the impression of a fuller, harmonic tone. We just called it arpeggiation, although at 10 to 12 note changes a second it sounded more like a buzzy chord.
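A toy version of that arpeggiation trick, cycling one voice through the notes of a chord about 12 times a second; the square-ish waveform and the C-major notes are illustrative choices, not anything SID-specific:

    import numpy as np

    def arpeggio(freqs=(261.6, 329.6, 392.0), fs=44100, rate=12, seconds=2.0):
        n = int(fs * seconds)
        t = np.arange(n) / fs
        # which chord note is active at each sample, switching `rate` times per second
        idx = (np.floor(t * rate) % len(freqs)).astype(int)
        freq = np.asarray(freqs)[idx]
        phase = 2 * np.pi * np.cumsum(freq) / fs   # accumulate phase to avoid clicks
        return np.sign(np.sin(phase))              # square-ish wave, pulse-like timbre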
As for the sampled waveforms, they were only available as single-bit and later 4-bit samples. These sounded terrible despite our best attempts, because basically the method of playback for a sample on the 64 was to play a white noise waveform and rapidly alter the volume on the SID chip to produce a rising and falling wave. Do it fast enough and it sort of sounds like the original sound, poorly tuned in on a staticky radio.
I suggest you grab hold of a C64 emulator for the PC (CCS64 is a good one) and a C64 BASIC programming guide and just play around... the SID chip is entirely manipulable from BASIC.
To sum up, how did we get all of those piano and guitar sounds on a C64? We didn't, really.
Take a look at some of these docs related to producing music on the C64:
http://sid.kubarth.com/articles.html
The type of music you are describing falls into the category of "chiptunes". I'd recommend checking out some modern trackers like MilkyTracker, which are used to create music in this style. There are libraries like libmodplug that allow you to play tracker modules in your software.
Check out some of the C64 emulators out there. I've read that some of them are 100% accurate in their sound reproduction, true to the original.