Quickest and easiest algorithm for comparing the frequency content of two sounds - audio

I want to take two sounds that contain a dominant frequency and say 'this one is higher than this one'. I could do FFT, find the frequency with the greatest amplitude of each and compare them. I'm wondering if, as I have a specific task, there may be a simpler algorithm.
The sounds are quite dirty with many frequencies, but contain a clear dominant pitch. They aren't perfectly produced sine waves.

Given that the sounds are quite dirty, I would suggest starting to develop the algorithm with the output of an FFT as it'll be much simpler to diagnose any problems. Then when you're happy that it's working you can think about optimising/simplifying.
As a rule of thumb when developing this kind of numeric algorithm, I always try to operate first in the most relevant domain (in this case you're interested in frequencies, so analyse in frequency space) at the start, and once everything is behaving itself consider shortcuts/optimisations. That way you can test the latter solution against the best-performing former.

In the general case, decent pitch detection/estimation generally requires a more sophisticated algorithm than looking at FFT peaks, not a simpler algorithm.

There are a variety of pitch detection methods ranging in sophistication from counting zero-crossing (which obviously won't work in your case) to extremely complex algorithms.
While the frequency domain methods seems most appropriate, it's not as simple as "taking the FFT". If your data is very noisy, you may have spurious peaks that are higher than what you would consider to be the dominant frequency. One solution is use window overlapping segments of your signal, and do STFTs, and average the results. But this raises more questions: how big should the windows be? In this case, it depends on how far apart you expect those dominant peaks to be, how long your recordings are, etc. (Note: FFT methods can resolve to better than one-bin size by taking into account phase information. In this case, you would have to do something more complex than averaging all your FFT windows together).
Another approach would be a time-domain method, such as YIN:
http://recherche.ircam.fr/equipes/pcm/cheveign/pss/2002_JASA_YIN.pdf
Wikipedia discusses some more methods:
http://en.wikipedia.org/wiki/Pitch_detection_algorithm
You can also explore some more methods in chapter 9 of this book:
http://www.amazon.com/DAFX-Digital-Udo-ouml-lzer/dp/0471490784
You can get matlab sourcecode for yin from chapter 9 of that book here:
http://www2.hsu-hh.de/ant/dafx2002/DAFX_Book_Page_2nd_edition/matlab.html

Related

How does the Ableton Drum-To-MIDI function work?

I can't seem to find any information regarding the process that Ableton uses to efficiently detect atonal percussion and convert it into MIDI. I assume feature extraction and onset detection algorithms are executed, but I'm intrigued as to what algorithms. I am particularly interesting how its efficiency is maintained for a beatboxed input.
Cheers
Your guesses are as good as everyone else's - although they look plausible. The reality is that the way this feature is implemented in Ableton is a trade secret and likely to remain that way.
If I'm not mistaken Ableton licenses technology from https://www.zplane.de/ for these things.
I don't exactly know how the software assigns the different drum sounds, but the chapter in the live manual Convert Drums to New MIDI Track says that it can only detect kick, snare and hi-hat. An important thing is that they are identified by the transient Markers. For a good result you should manually check and adjust them. The transient Markers look like the warp Markers, but are grey.
compared to a kick and a snare for example, a beatboxed input is likely to have less difference between the individual sounds and therefore likely to be harder for Ableton to individually extract the seperate sounds (depends on the beatboxer). In any case, some combination of frequency and amplitude - more specifically(Attack, Decay, Sustain, Release) as well as perhaps the different overtone combinations that account for differences in timbre are going to be the characteristics that would have to be evaluated in order to separate the kick snare and hihat .
Before this feature existed I used gates and hi/low pass filters to accomplish a similar task. So perhaps Ableton's solution is not as complicated as we might imagine.

Dynamic range compression at audio volume normalization [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I already asked about audio volume normalization. On most methods (e.g. ReplayGain, which I am most interested in), I might get peaks that exceed the PCM limit (as can also be read here).
Simple clipping would probably be the worst thing I can do. As Wikipedia suggests, I should do some form of dynamic range compression.
I am speaking about the function which I'm applying on each individual PCM sample value. On another similar question, one answer suggests that doing this is not enough or not the thing I should do. However, I don't really understand that as I still have to handle the clipping case. Does the answer suggest to do the range compression on multiple samples at once and do to simple hard clipping in addition on every sample?
Leaving that aside, the functions discussed in the Wikipedia article seem to be somewhat not what I want (in many cases, I would still have the clipping case in the end). I am thinking about using something like tanh. Is that a bad idea? It would reduce the volume slightly but guarantee that I don't get any clipping.
My application is a generic music player. I am searching for a solution which mostly works best for everyone so that I can always turn it on and the user very likely does not want to turn this off.
Using any instantaneous dynamic range processing (such as clipping or tanh non-linearity) will introduce audible distortion. Put a sine wave into an instantaneous non-linear function and you no longer have a sine wave. While useful for certain audio applications, it sounds like you do not want these artefacts.
Normalization does not effect the dynamics (in terms of min/max ratio) of a waveform. Normalization involves element-wise multiplication of a waveform by a constant scalar value to ensure no samples ever exceed a maximum value. This process can only by done off-line, as you need to analyse the entire signal before processing. Normalization is also a bad idea if your waveform contains any intense transients. Your entire signal will be attenuated by the ratio of the transient peak value divided by the clipping threshold.
If you just want to protect the output from clipping you are best off using a side chain type compressor. A specific form of this is the limiter (infinite compression ratio above a threshold with zero attack time). A side-chain compressor calculates the smoothed energy envelope of a signal and then applies a varying gain according to that function. They are not instantaneous, so you reduce audible distortion that you'd get from the functions you mention. A limiter can have instantaneous attack to prevent from clipping, but you allow a release time so that the limiter remains attenuating for subsequent waveform peaks, the subsequent waveform is just turned down and so there is no distortion. After the intense sound, the limiter recovers.
You can get a pumping type sound from this type of processing if there are a lot of high intensity peaks in the waveform. If this becomes problematic, you can then move to the next level and do the dynamics processing within sub-bands. This way, only the offending parts of the frequency spectrum will be attenuated, leaving the rest of the sound unaffected.
The general solution is to normalize to some gain level significantly below 1 such that very few songs require adding gain. In other words, most of the time you will be lowering the volume of signal rather than increasing. Experiment with a wide variety of songs in different styles to figure out what this level is.
Now, occasionally, you'll still come across a song that requires enough gain that, that, at some point, it would clip. You have two options: 1. don't add that much gain. This one song will sound a bit quieter. C'est la vie. (this is a common approach), or 2. apply a small amount of dynamic range compression and/or limiting. Of course, you can also do some combination 1 and 2. I believe iTunes uses a combination of 1 and 2, but they've worked very hard on #2, and they apply very little.
Your suggestion, using a function like tanh, on a sample-by-sample basis, will result in audible distortion. You don't want to do this for a generic music player. This is the sort of thing that's done in guitar amp simulators to make them sound "dirty" and "grungy". It might not be audible in rock, pop, or other modern music which is heavy on distortion already, but on carefully recorded choral, jazz or solo violin music people will be upset. This has nothing to do with the choice of tanh, by the way, any nonlinear function will produce distortion.
Dynamic range compression uses envelopes that are applied over time to the signal: http://en.wikipedia.org/wiki/Dynamic_range_compression
This is tricky to get right, and you can never create a compressor that is truly "transparent". A limiter can be thought of as an extreme version of a compressor that (at least in theory) prevents signal from going above a certain level. A digital "lookahead" limiter can do so without noticeable clipping. When judiciously used, it is pretty transparent.
If you take this approach, make sure that this feature can be turned off, because no matter how transparent you think it is, someone will hear it and not like it.

How to amplify certain audio samples, particularly amplifying a certain frequency?

Can anyone provide sample pseudocode or share some existing link that has sample code.
Like for example I have a mix audio of 1kHz or 2kHz or 8kHz or so, and I want to boost certain frequencies like 1kHz only in real-time.
Reading some DSP books and resources confuses me.
You just need to design and implement a suitable digital filter. This is a large and complex subject area though, so you won't get a simple answer here. Probably the best thing as a first step would be to read a good introductory book on DSP, e.g. Understanding DSP by Rick Lyons, which is a very good for beginners as it's not too heavy on the math and has a more practical bent than most such introductory DSP books.
For this particular application though what you are trying to do is similar to implementing a graphic equalizer, and there are many pointers to how to implement this kind of thing if you use e.g. "graphic equalizer" as a search term.
There's a lot of math behind digital filtering. Sorry, I think it is important to at least understand basic filters (like those used in electronics). If you don't want to go through the basics: best to get an audio graphics equaliser where you can play with the (virtual) sliders. If you want to implement a very specific filter, please read on.
Real time: depends on your computing platform. If this is a small micro (like AVR, Microchip PIC,..) you'll need an efficient algorithm. This is likely a IIR band pass filter. The equivalent of a graphics equaliser consists of multiple band pass filters, all summed together. See http://en.wikipedia.org/wiki/Infinite_impulse_response
A more computing intensive algorithm uses FIR filters. In that case you can also control the phase of the filtered signal. http://en.wikipedia.org/wiki/Finite_impulse_response
If you find an algorithm (i.e. IIR), you'll need to calculate the coefficients. The algorithm is simple, calculating the coefficients is not.
I found a book matching your question: Audio digital signal processing in real time
I browsed through it; it seems to have the right answers.

Creating a lava lamp-like animation

I recently saw something that set me wondering how to create a realistic-looking (2D) lava lamp-like animation, for a screen-saver or game.
It would of course be possible to model the lava lamp's physics using partial differential equations, and to translate that into code. However, this is likely to be both quite difficult (because of several factors, not least of which is the inherent irregularity of the geometry of the "blobs" of wax and the high number of variables) and probably computationally far too expensive to calculate in real time.
Analytical solutions, if any could be found, would be similarly useless because you would want to have some degree of randomness (or stochasticity) in the animation.
So, the question is, can anyone think of an approach that would allow you to animate a realistic looking lava lamp, in real time (at say 10-30 FPS), on a typical desktop/laptop computer, without modelling the physics in any great detail? In other words, is there a way to "cheat"?
One way to cheat might be to use a probabilistic cellular automaton with a well-chosen transition table to simulate the motion of the blobs. Some popular screensavers (in particular ParticleFire) use this approach to elegantly simulate complex motions in 2D space by breaking the objects down to individual pixels and then defining the ways in which individual pixels transition by looking at the states of their neighbors. You can get some pretty emergent behavior with simple cellular automata - look at Conway's game of life, for example, or this simulation of a forest fire.
LavaLite is open source. You can get code with the xscreensaver-gl package in most Linux distros. It uses metaballs.

How does pathfinding in RTS video games work?

In a game such as Warcraft 3 or Age of Empires, the ways that an AI opponent can move about the map seem almost limitless. The maps are huge and the position of other players is constantly changing.
How does the AI path-finding in games like these work? Standard graph-search methods (such as DFS, BFS or A*) seem impossible in such a setup.
Take the following with a grain of salt, since I don't have first-person experience with pathfinding.
That being said, there are likely to be different approaches, but I think standard graph-search methods, notably (variants of) A* are perfectly reasonable for strategy games. Most strategy games I know seem to be based on a tile system, where the map is comprised of little squares, which are easily mapped to a graph. One example would be StarCraft II (Screenshot), which I'll keep using as an example in the remainder of this answer, because I'm most familiar with it.
While A* can be used for real-time strategy games, there are a few drawbacks that have to be overcome by tweaks to the core algorithm:
A* is too slow
Since an RTS is by definion "real time", waiting for the computation to finish will frustrate the player, because the units will lag. This can be remedied in several ways. One is to use Multi-tiered A*, which computes a rough course before taking smaller obstacles into account. Another obvious optimization is to group units heading to the same destination into a platoon and only calculate one path for all of them.
Instead of the naive approach of making every single tile a node in the graph, one could also build a navigation mesh, which has fewer nodes and could be searched faster – this requires tweaking the search algorithm a little, but it would still be A* at the core.
A* is static
A* works on a static graph, so what to do when the landscape changes? I don't know how this is done in actual games, but I imagine the pathing is done repeatedly to cope with new obstacles or removed obstacles. Maybe they are using an incremental version of A* (PDF).
To see a demonstration of StarCraft II coping with this, go to 7:50 in this video.
A* has perfect information
A part of many RTS games is unexplored terrain. Since you can't see the terrain, your units shouldn't know where to walk either, but often they do anyway. One approach is to penalize walking through unexplored terrain, so units are more reluctant to take advantage of their omniscience, another is to take the omniscience away and just assume unexplored terrain is walkable. This can result in the units stumbling into dead ends, sometimes ones that are obvious to the player, until they finally explore a path to the target.
Fog of War is another aspect of this. For example, in StarCraft 2 there are destructible obstacles on the map. It has been shown that you can order a unit to move to the enemy base, and it will start down a different path if the obstacle has already been destroyed by your opponent, thus giving you information you should not actually have.
To summarize: You can use standard algorithms, but you may have to use them cleverly. And as a last bonus: I have found Amit’s Game Programming Information interesting with regard to pathing. It also has links to further discussion of the problem.
This is a bit of a simple example, but it shows that you can make the illusion of AI / Indepth Pathfinding from a non-complex set of rules: Pac-Man Pathfinding
Essentially, it is possible for the AI to know local (near by) information and make decisions based on that knowledge.
A* is a common pathfinding algorithm. This is a popular game development topic - you should be able to find numerous books and websites that contain information.
Check out visibility graphs. I believe that is what they use for path finding.

Resources