Integer FM Demodulation - audio

What are some software (or FPGA) techniques suitable for FM demodulation? I've been experimenting in MATLAB to try and get an algorithm right, but I've been basing it on analog reference material with limited results. I can make out the audio, but there is horrible distortion that I can't fix with filtering. Ultimately I want to be able to use an integer implementation on an FPGA, but I need to get the basic demodulation working first.
An FFT shows the spectrum has been moved back down to be centered around DC, but it just doesn't sound right.

If you're already hearing the sound, then I'd say you're most of the way there. It might help if you explain (or paste) some of the code/algorithm that you're using, as well as describing the noise as best you can.
If the noise only appears in integer-based calculations, then integer rounding errors or overflow are the most likely causes - though perhaps shifting from the frequency domain is causing the noise to sound a little foreign. The key to good integer-based calculations is to know your operator precedence and to make sure you stay within the bounds of your integer type at each step of the calculation. Too big and you'll get overflow; too small and you'll lose resolution.
Pre/de-emphasis may also cause your output to sound strange if you're not accounting for it, although I wouldn't really expect "heavy distortion" to result.
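For reference, one common all-software technique (not necessarily what the asker tried) is the polar discriminator: take complex baseband I/Q samples and differentiate the phase, since for FM the phase derivative is the message. Below is a minimal floating-point Python/NumPy sketch with made-up sample rate, deviation and test tone; it's usually easiest to get this working in floats first and only then port it to integer/fixed-point arithmetic, watching word growth at each step as described above.

    import numpy as np

    def fm_discriminate(iq):
        """Polar discriminator: the phase difference between successive
        complex-baseband samples is proportional to instantaneous frequency."""
        # angle of x[n] * conj(x[n-1]) avoids explicit phase unwrapping
        return np.angle(iq[1:] * np.conj(iq[:-1]))

    # Synthesize an FM signal and recover the message (all values illustrative)
    fs = 250e3                                    # baseband I/Q sample rate
    t = np.arange(int(fs * 0.05)) / fs
    msg = np.sin(2 * np.pi * 1e3 * t)             # 1 kHz test tone
    kf = 75e3                                     # peak frequency deviation
    phase = 2 * np.pi * kf * np.cumsum(msg) / fs
    iq = np.exp(1j * phase)                       # ideal FM centred on DC

    audio = fm_discriminate(iq)                   # proportional to msg
    # For broadcast FM you would still decimate to an audio rate and apply
    # 50/75 microsecond de-emphasis before listening.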

Related

Methods for simulating moving audio source

I'm currently researching a problem regarding DOA (direction of arrival) regression for an audio source, and need to generate training data in the form of audio signals of moving sound sources. In particular, I have the stationary sound files, and I need to simulate a source and microphone(s) with the distances between them changing to reflect movement.
Is there any software online that could potentially do the trick? I've looked into pyroomacoustics and VA as well as other potential libraries, but none of them seem to deal with moving audio sources, due to the difficulties in simulating the Doppler effect.
If I were to write up my own simulation code for dealing with this, how difficult would it be? My use case would be an audio source and a microphone in some 2D landscape, both moving with their own velocities, where I would want to collect the recording from the microphone as an audio file.
Some speculation here on my part, as I have only dabbled with writing some aspects of what you are asking about and am not experienced with any particular libraries. Likelihood is good that something exists and will turn up.
That said, I wonder if it would be possible to use either the Unreal or Unity game engine. Both, as far as I can remember, grant the ability to load your own cues and support 3D including Doppler.
As far as writing your own, a lot depends on what you already know. With a single-point mike (as opposed to stereo) the pitch shifting involved is not that hard. There is a technique that involves stepping through the audio file's sample data using linear interpolation for steps that lie in between the data points, which is considered to have sufficient fidelity for most purposes. Lots of trig, too, to track the changes in velocity.
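To illustrate that "step through the samples with linear interpolation" idea for the mono case, here is a rough Python/NumPy sketch. The setup (a static mic, a source moving at constant velocity, a simple 1/r amplitude falloff) is a deliberate simplification rather than a full room simulation, and all the names are mine.

    import numpy as np

    def doppler_playback(src, fs, src_pos, src_vel, mic_pos, c=343.0):
        """Render a mono signal as heard by a static mic while the source moves,
        by reading the file at a time-varying position with linear interpolation."""
        src = np.asarray(src, dtype=float)
        src_pos, src_vel, mic_pos = map(np.asarray, (src_pos, src_vel, mic_pos))
        n = len(src)
        t = np.arange(n) / fs
        # source position at every output instant, and its distance to the mic
        pos = src_pos + np.outer(t, src_vel)
        dist = np.linalg.norm(pos - mic_pos, axis=1)
        # each output sample "hears" the source as it was dist/c seconds earlier;
        # the changing delay is what produces the Doppler shift
        read_idx = (t - dist / c) * fs
        i0 = np.clip(np.floor(read_idx).astype(int), 0, n - 2)
        frac = np.clip(read_idx - i0, 0.0, 1.0)
        out = (1 - frac) * src[i0] + frac * src[i0 + 1]
        return out / np.maximum(dist, 1.0)        # crude 1/r amplitude falloff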
If we are dealing with stereo, though, it does get more complicated, depending on how far you want to go with it. The head masks high frequencies, so real-time filtering would be needed. It would also be good to implement a delay to match the different arrival times at each ear. And if you start talking about pinnae, I'm way out of my league.
As of now it seems like pyroomacoustics does not support moving sound sources. However, do check a possible workaround from the developers in Issue #105, where the idea of applying a time-varying convolution over a dense microphone array is suggested.

How to emulate sound physics in a 3d game?

What sort of algorithms do I use for simulating sound? Like, if the player approaches the source of a sound, it should get louder, but if the player goes farther away, it should get softer. That's the big thing I can't seem to figure out.
I don't require any code; mostly I just want the equations, assuming they exist.
What you are talking about is important, just like the Doppler effect. In general, you need to do more than just calculate the distance from the object to the sound source whenever the position changes. It is much better to take the following into account:
the movement of the sound source
the movement of the active object
potential obstacles (for instance a wall)
"approaching" and "departing" as special cases of the Doppler effect
distance deviation in short time period
It should not be a goal to make this perfectly accurate, because in that case you would have to calculate too many things. Your aim should be to make it "good enough", and the definition of "good enough" should be decided by you through testing. Naturally, you need a lot of formulas.
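To make "a lot of formulas" slightly more concrete, here is a small Python sketch of two of the usual ones: an inverse-distance attenuation curve (roughly the model OpenAL-style engines use) and the classic Doppler frequency ratio. The function names and the rolloff parameters are illustrative choices only.

    import numpy as np

    SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C

    def gain_from_distance(dist, ref_dist=1.0, rolloff=1.0):
        """Inverse-distance law: every doubling of distance costs about 6 dB."""
        return ref_dist / (ref_dist + rolloff * max(dist - ref_dist, 0.0))

    def doppler_factor(src_pos, src_vel, listener_pos, listener_vel):
        """Ratio f_heard / f_emitted for a moving source and moving listener."""
        to_listener = np.asarray(listener_pos, float) - np.asarray(src_pos, float)
        dist = np.linalg.norm(to_listener)
        if dist == 0.0:
            return 1.0
        direction = to_listener / dist
        v_src = np.dot(src_vel, direction)        # source speed toward listener
        v_lis = np.dot(listener_vel, -direction)  # listener speed toward source
        return (SPEED_OF_SOUND + v_lis) / (SPEED_OF_SOUND - v_src)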

Dynamic range compression at audio volume normalization [closed]

I already asked about audio volume normalization. On most methods (e.g. ReplayGain, which I am most interested in), I might get peaks that exceed the PCM limit (as can also be read here).
Simple clipping would probably be the worst thing I can do. As Wikipedia suggests, I should do some form of dynamic range compression.
I am speaking about the function which I'm applying to each individual PCM sample value. On another similar question, one answer suggests that doing this is not enough or not the thing I should do. However, I don't really understand that, as I still have to handle the clipping case. Does the answer suggest doing the range compression on multiple samples at once and additionally doing simple hard clipping on every sample?
Leaving that aside, the functions discussed in the Wikipedia article don't seem to be quite what I want (in many cases I would still end up with clipping). I am thinking about using something like tanh. Is that a bad idea? It would reduce the volume slightly but guarantee that I don't get any clipping.
My application is a generic music player. I am searching for a solution that works well for nearly everyone, so that I can leave it on by default and users will very likely not want to turn it off.
Using any instantaneous dynamic range processing (such as clipping or tanh non-linearity) will introduce audible distortion. Put a sine wave into an instantaneous non-linear function and you no longer have a sine wave. While useful for certain audio applications, it sounds like you do not want these artefacts.
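To see this concretely, here is a quick Python/NumPy experiment (the tone, drive amount and sample rate are arbitrary): pass a pure sine through tanh and new harmonic peaks appear in its spectrum.

    import numpy as np

    fs = 48000
    t = np.arange(fs) / fs
    sine = 0.9 * np.sin(2 * np.pi * 440 * t)      # clean 440 Hz tone near full scale

    soft_clipped = np.tanh(3.0 * sine)            # instantaneous non-linearity

    spectrum = np.abs(np.fft.rfft(soft_clipped)) / len(soft_clipped)
    freqs = np.fft.rfftfreq(len(soft_clipped), 1.0 / fs)
    # prints 440 Hz plus odd harmonics (1320, 2200, 3080 Hz) that were not
    # present in the original sine wave
    print(np.sort(freqs[np.argsort(spectrum)[-4:]]))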
Normalization does not affect the dynamics (in terms of min/max ratio) of a waveform. Normalization involves element-wise multiplication of a waveform by a constant scalar value to ensure no samples ever exceed a maximum value. This process can only be done off-line, as you need to analyse the entire signal before processing. Normalization is also a bad idea if your waveform contains any intense transients. Your entire signal will be attenuated by the ratio of the transient peak value to the clipping threshold.
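In code, that constant-scalar multiplication is just the following (a minimal sketch; the roughly -1 dBFS target is an arbitrary choice):

    import numpy as np

    def peak_normalize(x, target_peak=0.891):     # about -1 dBFS
        """Scale the whole waveform by one constant so its peak hits target_peak."""
        peak = np.max(np.abs(x))
        return x if peak == 0.0 else x * (target_peak / peak)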
If you just want to protect the output from clipping, you are best off using a side-chain type compressor. A specific form of this is the limiter (infinite compression ratio above a threshold with zero attack time). A side-chain compressor calculates the smoothed energy envelope of a signal and then applies a varying gain according to that function. It is not instantaneous, so you avoid the audible distortion you'd get from the functions you mention. A limiter can have an instantaneous attack to prevent clipping, but you allow a release time so that the limiter keeps attenuating subsequent waveform peaks; the subsequent waveform is simply turned down, so there is no distortion. After the intense sound, the limiter recovers.
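Here is a bare-bones sketch of that envelope-follower idea in Python; the threshold and release values are arbitrary, and a real limiter would add lookahead and smoother gain curves, but it shows how a varying gain is computed from a peak envelope with a release time, so after the initial attack the signal is simply turned down rather than reshaped.

    import numpy as np

    def simple_limiter(x, threshold=0.9, release_ms=50.0, fs=48000):
        """Feed-forward limiter sketch: track a peak envelope with instant attack
        and exponential release, attenuating only while it exceeds the threshold."""
        release = np.exp(-1.0 / (fs * release_ms / 1000.0))
        env = 0.0
        gain = np.ones(len(x))
        for n, s in enumerate(np.abs(x)):
            env = s if s > env else release * env     # instant attack, slow release
            if env > threshold:
                gain[n] = threshold / env
        return x * gain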
You can get a pumping type sound from this type of processing if there are a lot of high intensity peaks in the waveform. If this becomes problematic, you can then move to the next level and do the dynamics processing within sub-bands. This way, only the offending parts of the frequency spectrum will be attenuated, leaving the rest of the sound unaffected.
The general solution is to normalize to some gain level significantly below 1 such that very few songs require adding gain. In other words, most of the time you will be lowering the volume of the signal rather than increasing it. Experiment with a wide variety of songs in different styles to figure out what this level is.
Now, occasionally, you'll still come across a song that requires enough gain that, at some point, it would clip. You have two options: 1. don't add that much gain. This one song will sound a bit quieter. C'est la vie. (This is a common approach.) Or 2. apply a small amount of dynamic range compression and/or limiting. Of course, you can also use some combination of 1 and 2. I believe iTunes uses a combination of 1 and 2, but they've worked very hard on #2, and they apply very little.
Your suggestion, using a function like tanh, on a sample-by-sample basis, will result in audible distortion. You don't want to do this for a generic music player. This is the sort of thing that's done in guitar amp simulators to make them sound "dirty" and "grungy". It might not be audible in rock, pop, or other modern music which is heavy on distortion already, but on carefully recorded choral, jazz or solo violin music people will be upset. This has nothing to do with the choice of tanh, by the way, any nonlinear function will produce distortion.
Dynamic range compression uses envelopes that are applied over time to the signal: http://en.wikipedia.org/wiki/Dynamic_range_compression
This is tricky to get right, and you can never create a compressor that is truly "transparent". A limiter can be thought of as an extreme version of a compressor that (at least in theory) prevents signal from going above a certain level. A digital "lookahead" limiter can do so without noticeable clipping. When judiciously used, it is pretty transparent.
If you take this approach, make sure that this feature can be turned off, because no matter how transparent you think it is, someone will hear it and not like it.

Quickest and easiest algorithm for comparing the frequency content of two sounds

I want to take two sounds that each contain a dominant frequency and say "this one is higher than that one". I could do an FFT, find the frequency with the greatest amplitude in each, and compare them. I'm wondering if, as I have a specific task, there may be a simpler algorithm.
The sounds are quite dirty with many frequencies, but contain a clear dominant pitch. They aren't perfectly produced sine waves.
Given that the sounds are quite dirty, I would suggest starting to develop the algorithm with the output of an FFT as it'll be much simpler to diagnose any problems. Then when you're happy that it's working you can think about optimising/simplifying.
As a rule of thumb when developing this kind of numeric algorithm, I always try to operate first in the most relevant domain (in this case you're interested in frequencies, so analyse in frequency space) at the start, and once everything is behaving itself consider shortcuts/optimisations. That way you can test the latter solution against the best-performing former.
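As a concrete starting point along those lines, here is a minimal Python/NumPy sketch; sound_a, sound_b and fs are placeholders for your two recordings and their sample rate.

    import numpy as np

    def dominant_freq(x, fs):
        """Frequency of the largest-magnitude bin of a windowed FFT."""
        windowed = x * np.hanning(len(x))
        spectrum = np.abs(np.fft.rfft(windowed))
        freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
        return freqs[np.argmax(spectrum)]

    # "this one is higher than the other"
    # a_is_higher = dominant_freq(sound_a, fs) > dominant_freq(sound_b, fs)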
In the general case, decent pitch detection/estimation generally requires a more sophisticated algorithm than looking at FFT peaks, not a simpler algorithm.
There are a variety of pitch detection methods ranging in sophistication from counting zero-crossings (which obviously won't work in your case) to extremely complex algorithms.
While the frequency domain methods seem most appropriate, it's not as simple as "taking the FFT". If your data is very noisy, you may have spurious peaks that are higher than what you would consider to be the dominant frequency. One solution is to window overlapping segments of your signal, do STFTs, and average the results. But this raises more questions: how big should the windows be? In this case, it depends on how far apart you expect those dominant peaks to be, how long your recordings are, etc. (Note: FFT methods can resolve to better than one bin by taking phase information into account. In that case, you would have to do something more complex than averaging all your FFT windows together.)
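One way to realise that "window, overlap, average" idea without writing the bookkeeping yourself is Welch's method, for example via SciPy. The segment length below is a guess you would tune to how long your recordings are and how far apart you expect the dominant peaks to be.

    import numpy as np
    from scipy.signal import welch

    def dominant_freq_averaged(x, fs, nperseg=4096):
        """Average the spectra of overlapping windows (Welch's method) so a
        single noisy frame cannot dominate the peak pick."""
        freqs, psd = welch(x, fs=fs, nperseg=nperseg)
        return freqs[np.argmax(psd)]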
Another approach would be a time-domain method, such as YIN:
http://recherche.ircam.fr/equipes/pcm/cheveign/pss/2002_JASA_YIN.pdf
Wikipedia discusses some more methods:
http://en.wikipedia.org/wiki/Pitch_detection_algorithm
You can also explore some more methods in chapter 9 of this book:
http://www.amazon.com/DAFX-Digital-Udo-ouml-lzer/dp/0471490784
You can get MATLAB source code for YIN from chapter 9 of that book here:
http://www2.hsu-hh.de/ant/dafx2002/DAFX_Book_Page_2nd_edition/matlab.html

How to mix audio samples?

My question is not completely programming-related, but nevertheless I think SO is the right place to ask.
In my program I generate some audio data and save the track to a WAV file. Everything works fine with one sound generator. But now I want to add more generators and mix the generated audio data into one file. Unfortunately it is more complicated than it seems at first sight.
Moreover I didn't find much useful information on how to mix a set of audio samples.
So is there anyone who can give me advice?
edit:
I'm programming in C++. But it doesn't matter, since I was interested in the theory behind mixing two audio tracks. The problem I have is that I cannot just sum up the samples, because this often produces distorted sound.
I assume your problem is that for every audio source you're adding in, you're having to lower the levels.
If the app gives control to a user, just let them control the levels directly. Hotness is their responsibility, not yours. This is "summing."
If the mixing is automated, you're about to go on a journey. You'll probably need compression, if not limiting. (Limiting is an extreme version of compression.)
Note that anything you do to the audio (including compression and limiting) is a form of distortion, so you WILL have coloration of the audio. Your choice of compression and limiting algorithms will affect the sound.
Since you're not generating the audio in real time, you have the possibility of doing "brick wall" limiting. That's because you have foreknowledge of the levels. Real-time limiting is more constrained because you can't know what's coming up; you have to be reactive.
Is this music, sound effects, voices, what?
Programmers here deal with this all the time.
Mixing audio samples means adding them together, that's all. Typically you do add them into a larger data type so that you can detect overflow and clamp the values before casting back into your destination buffer. If you know beforehand that you will have overflow then you can scale their amplitudes prior to addition - simply multiply by a floating point value between 0 and 1, again keeping in mind the issue of precision, perhaps converting to a larger data type first.
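A minimal sketch of that in Python/NumPy, assuming 16-bit PCM tracks (the same idea translates directly to C++ with an int32 accumulator):

    import numpy as np

    def mix_int16(tracks):
        """Mix several int16 tracks: accumulate in a wider type, then clamp
        back into the int16 range so wrap-around overflow cannot occur."""
        acc = np.zeros(max(len(t) for t in tracks), dtype=np.int32)
        for t in tracks:
            acc[:len(t)] += t.astype(np.int32)
        return np.clip(acc, -32768, 32767).astype(np.int16)

    # If clamping happens too often, scale each track before summing, e.g. by
    # 1 / len(tracks) or by user-controlled mix levels.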
If you have a specific problem that is not addressed by this, feel free to update your original question.
A "dirty" mix of two samples:
mix = (a + b) - a * b * sign(a + b)
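In array terms, assuming float samples normalized to [-1, 1], that formula is simply:

    import numpy as np

    def dirty_mix(a, b):
        """Sum the two signals while keeping the result inside [-1, 1]
        (for inputs that are already in that range)."""
        return (a + b) - a * b * np.sign(a + b)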
You never said what programming language and platform, however for now I'll assume Windows using C#.
http://www.codeplex.com/naudio
Great open source library that really covers off lots of the stuff you'd encounter during most audio operations.
