Good audio reverb source? - audio

Is there any good C or C-like source code for an audio reverb (besides Freeverb)? There are endless examples of low-pass filters that sound great, but it's terribly difficult to find source for a good-sounding reverb.
Why is that? Is it a hard enough problem that the good implementations are held onto and not published?

Are you kidding? Reverb is the easiest thing in the world to do programmatically:
// output should hold input.Length + delay samples so the echo tail is kept
for (int i = 0; i < input.Length; i++)
{
    output[i] += input[i];                      // dry signal
    if (i + delay < output.Length)              // never write past the end of the buffer
        output[i + delay] += input[i] * decay;  // one delayed, attenuated echo
}
I write this kind of stuff full time now, so maybe it just seems easy. Do you mean you're looking for more general echo or spatial effects that might include frequency-modulated delay lines, chorusing and so on?

How about this one? I know you said you didn't want Freeverb, but this is version 3 of Freeverb (Freeverb3), and to my eye it looks like it has been vastly improved.
This version is a convolution reverb that supports impulse responses. For those of you who don't know what that is: engineers take microphones into a space they want to model (e.g. a performance hall) and fire a starter pistol, recording the echoes produced. That recording (the impulse response) is then convolved with the dry signal to produce the reverb. This process gives a very realistic reverb that mirrors the characteristics of the performance hall.
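To make the convolution step concrete, here is a minimal sketch in C. The function name `convolve_ir` is hypothetical and is not Freeverb3's API. It implements the direct-form definition, where every input sample launches a scaled copy of the impulse response. That costs O(N·M) operations, which is exactly the efficiency problem real convolution reverbs solve with FFT-based (usually partitioned) convolution. Treat it as an illustration, not a usable engine.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form convolution: out[n] = sum over k of in[k] * ir[n - k].
 * `out` must hold in_len + ir_len - 1 samples (the reverb tail included).
 * O(in_len * ir_len): practical convolution reverbs use FFT-based
 * (partitioned) convolution instead. */
void convolve_ir(const float *in, size_t in_len,
                 const float *ir, size_t ir_len,
                 float *out)
{
    assert(in_len > 0 && ir_len > 0);
    size_t out_len = in_len + ir_len - 1;
    for (size_t n = 0; n < out_len; n++)
        out[n] = 0.0f;
    /* Each input sample adds a scaled, shifted copy of the impulse response. */
    for (size_t k = 0; k < in_len; k++)
        for (size_t j = 0; j < ir_len; j++)
            out[k + j] += in[k] * ir[j];
}
```

With an impulse response of `{1.0f}`, the output equals the input. A measured hall response at 44.1 kHz is typically tens of thousands of samples long, which is why the direct loop becomes impractical.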
http://freeverb3.sourceforge.net/

Realistic reverberation algorithms are a bit of a 'holy grail' of audio DSP programming...
There are two basic approaches in the pro-audio market today:
Convolution reverb (using impulse responses)
Delay/feedback/damping networks
The main challenge with impulse-response convolution has been the efficiency-versus-quality tradeoff (including latency!), whereas the main challenge with delay-matrix networks has been generating vast lattices of delays with little harmonic reinforcement.
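The delay/feedback family goes back to Manfred Schroeder's classic design. It runs feedback comb filters in parallel and then allpass filters in series to raise the echo density, and Freeverb itself is built this way. The sketch below is illustrative only and makes some assumptions:

- The struct and function names are hypothetical.
- The comb's one-pole damping mirrors the idea of Freeverb's filtered feedback loop rather than its exact code.

A good-sounding result needs several combs with mutually prime delay lengths (tens of milliseconds) and careful tuning. That tuning is where the harmonic-reinforcement problem shows up.

```c
#include <assert.h>
#include <stdlib.h>

/* Feedback comb filter with a one-pole low-pass ("damping") inside the loop:
 * each trip round the loop loses a little treble, as real reflections do. */
typedef struct {
    float *buf;
    int len, pos;
    float feedback, damp, lp;   /* lp holds the low-pass filter state */
} Comb;

/* Schroeder allpass: thickens each echo into a denser cluster without
 * changing the long-term magnitude response. */
typedef struct {
    float *buf;
    int len, pos;
    float gain;
} Allpass;

int comb_init(Comb *c, int len, float feedback, float damp)
{
    assert(len > 0);
    c->buf = calloc((size_t)len, sizeof *c->buf);
    c->len = len;
    c->pos = 0;
    c->feedback = feedback;
    c->damp = damp;
    c->lp = 0.0f;
    return c->buf != NULL;
}

float comb_process(Comb *c, float x)
{
    float y = c->buf[c->pos];                         /* sample from len ago */
    c->lp = y * (1.0f - c->damp) + c->lp * c->damp;   /* damp the highs */
    c->buf[c->pos] = x + c->lp * c->feedback;         /* feed it back */
    c->pos = (c->pos + 1) % c->len;
    return y;
}

int allpass_init(Allpass *a, int len, float gain)
{
    assert(len > 0);
    a->buf = calloc((size_t)len, sizeof *a->buf);
    a->len = len;
    a->pos = 0;
    a->gain = gain;
    return a->buf != NULL;
}

/* v[n] = x[n] + g*v[n-D];  y[n] = -g*v[n] + v[n-D] */
float allpass_process(Allpass *a, float x)
{
    float delayed = a->buf[a->pos];
    float v = x + a->gain * delayed;
    a->buf[a->pos] = v;
    a->pos = (a->pos + 1) % a->len;
    return -a->gain * v + delayed;
}

/* One output sample of a tiny Schroeder-style reverb:
 * parallel combs (summed), then allpasses in series. */
float reverb_tick(Comb *combs, int n_combs, Allpass *aps, int n_aps, float x)
{
    float wet = 0.0f;
    for (int i = 0; i < n_combs; i++)
        wet += comb_process(&combs[i], x);
    for (int i = 0; i < n_aps; i++)
        wet = allpass_process(&aps[i], wet);
    return wet;
}
```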
Professionals pay vast amounts of money for realistic-sounding reverbs... a "good" sounding reverberator can retail for $2000+, and "really good" ones for much more.
Welcome to the pro-audio industry...

You could do a lot worse than reading Jon Dattorro's paper on the subject, available on his homepage. Dattorro worked at Lexicon, and the paper includes extensive discussion of the design of high-quality reverb.
Apart from this paper, the various links at musicdsp and the scant references in the literature, the design of great reverb is shrouded in secrecy. The finest reverbs are designed either by people who have worked with the designers of the last generation of great reverbs, or by obsessives who invest extraordinary amounts of time in the subject. In either case, the designers seem to become quite tight-lipped about their methodologies.

Related

How can I distinguish an instrument from a sound?

I just saw a paper by Cornell reconstructing faces from sound. But I am more interested in the timbre. It might be attacked with AI, but is there an easier way? For example, will instrument A be in a different frequency range than instrument B?
For the most part, instruments are going to have overlapping frequency content. I don't know the specific algorithms for isolating instruments, though I've heard they do exist. I would think that a big element is not just tracking all the harmonics and frequency content, but looking for correspondences in the volume or frequency changes of the different partials, in order to determine which frequencies should be grouped together as a single instrument. Since instruments often play the same notes at the same time, this would be no mean feat. If you are a beginner with digital signal processing, may I recommend "The Scientist and Engineer's Guide to Digital Signal Processing" by Steven W. Smith? (It's a free download and a good book on the fundamental knowledge needed to tackle such a project.)

Can FFT be used to find drum solos/breaks in audio files?

Is it possible to use an FFT to find a drum solo, or a drum break, in an audio file? Is this something an FFT is able to do, and are there any resources online that could help me learn?
In general, an FFT is not a good choice for detecting the onset of percussion sounds:
An FFT is always calculated over a window of samples (in effect, a period of time) and yields, for each frequency bin, the magnitude and phase of the signal within that bin. You can therefore determine that there is signal at a particular bin, but not its onset time within the window. The best time resolution available is the window period. Of course, you can make the period shorter at the expense of frequency resolution.
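The trade-off in the last sentence is simple arithmetic. For a window of N samples at sample rate f_s, each frame spans Δt and adjacent bins are Δf apart:

$$
\Delta t = \frac{N}{f_s}, \qquad \Delta f = \frac{f_s}{N}, \qquad \Delta t \,\Delta f = 1
$$

At f_s = 44100 Hz, a 1024-sample window gives Δt ≈ 23.2 ms and Δf ≈ 43.1 Hz. Halving it to 512 samples gives Δt ≈ 11.6 ms but Δf ≈ 86.1 Hz. Overlapping successive windows yields more frames per second, but each frame still smears everything inside its own Δt.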
Percussion sounds tend to look like noise and spread across the spectrum. This would be OK if you only had percussion sounds, but it is not great with real-life polyphonic content.
However, you might be able to infer something from the different spectral characteristics of a drum solo versus the instrumental sections of a track.
The problem of finding the time at which percussion sounds start in music is described in academic journals as onset detection, and it is one of the many techniques used for feature extraction; the wider field is known as Music Information Retrieval (MIR). Your problem sounds like one of identifying sections in audio files, which might be described as partitioning.
A good place to start is Sonic Visualiser, a tool written specifically for MIR applications. Plug-ins exist for various types of feature extraction, and from these you will easily find the large body of academic work in this area. As an added bonus, the existing plug-ins are all open source too.
I'd look here; there was a bit of discussion with great pointers on Gamedev SE: https://gamedev.stackexchange.com/questions/9761/beat-detection-and-fft :-)

Procedural sound generation algorithms?

I'd like to be able to algorithmically create sounds (like monster growls, or distant thunder). This isn't as widely covered on the net as more traditional procedural content (terrains, etc.). Does anyone have any algorithms for creating these kinds of sounds?
This, in general, is a very hard problem. Just like drawing, each sound is its own thing and needs its own algorithms, and, like drawing, some are more easily done by algorithm than others. There's no general algorithm for creating sound any more than there's a general algorithm for drawing all things like faces, insects, and mountains. Each is its own project (and often quite a big one), unless you're just looking to draw circles or generate sine waves.
Most of the case studies I know of are the many attempts to generate musical instrument sounds, and generally each of these attempts is a PhD thesis.
For a time-efficient solution, sampling is the way to go.
Or, if you really need a procedural approach, you could ask the question for one specific type of sound, and people might be able to come up with an algorithm for it. For example, I'd be interested in taking a shot at a "distant thunder" algorithm, but don't want to bother if having just thunder, and no monsters etc., is not useful to you.
I would suggest checking out the many software projects and papers of Perry Cook, who has done some great work in the realm of physical modelling (though his website is a bit of a nightmare to navigate). As tom10 says, though, it's a very hard area. If you have the stomach for a bit of signal processing, it's a fascinating area to get into.
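For a small taste of physical modelling, the usual first example is the Karplus-Strong plucked string. Perry Cook and Gary Scavone's Synthesis ToolKit includes string models in this family, though the code below is not taken from it. A delay line whose length sets the pitch is filled with a burst of noise (the pluck) and recirculated through a two-point average. The average acts as a gentle low-pass, so high harmonics die away first, much as on a real string. The function name and the `decay` factor are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Karplus-Strong plucked string. The delay-line length N = sr / freq sets the
 * pitch; averaging adjacent samples in the loop is a gentle low-pass, so high
 * harmonics decay faster than low ones. `decay` (just below 1) is an assumed
 * extra loss factor. Returns 0 on allocation failure, 1 on success. */
int pluck(float *out, size_t len, float sr, float freq, float decay)
{
    assert(freq > 0.0f && sr > 2.0f * freq);
    size_t n = (size_t)(sr / freq + 0.5f);
    float *line = malloc(n * sizeof *line);
    if (!line)
        return 0;

    uint32_t rng = 12345u;                 /* burst of +/-0.5 noise = the pluck */
    for (size_t i = 0; i < n; i++) {
        rng = rng * 1664525u + 1013904223u;
        line[i] = (rng & 0x10000u) ? 0.5f : -0.5f;
    }

    size_t pos = 0;
    for (size_t i = 0; i < len; i++) {
        size_t next = (pos + 1) % n;
        out[i] = line[pos];
        line[pos] = decay * 0.5f * (line[pos] + line[next]);  /* lossy average */
        pos = next;
    }
    free(line);
    return 1;
}
```

The averaging adds half a sample of loop delay, so the pitch comes out slightly below `freq` (about sr / (N + 0.5)). Fuller implementations fix this with fractional-delay interpolation.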

How to split male and female voices from an audio file (in C++ or Java)

I want to differentiate between the male and female voices in an audio file and separate them. As output, I want the two voices separated. Can you please help me out, and can the coding be done in Java or C++?
This is potentially a very complicated question, and it is similar to writing your own speech recognition (or identification) algorithm.
You would start by converting the audio into the frequency domain, which is done using a Fast Fourier Transform.
Each FFT gives you, for its slice of time, a list of frequencies and their amplitudes. You will somehow need to detect the fundamental tone by analysing the harmonics; the 2nd and 3rd harmonics will be the clearest. It's very hard to figure out which harmonics they are, especially with background noise and the natural differences between people's voices in terms of which harmonics are loudest. Then you can try to determine whether the speaker is male or female from whatever you guessed the fundamental tone to be.
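Here is a minimal sketch of the "find the fundamental, then decide" step. It swaps in a different, commonly used technique instead of the FFT harmonic analysis described above. Time-domain autocorrelation finds the lag at which a voiced frame best matches a shifted copy of itself, so it never has to work out which harmonic is which. The function names and the 165 Hz male/female split are assumptions. Adult speaking pitches overlap (commonly quoted ranges are roughly 85 to 180 Hz for men and 165 to 255 Hz for women), so a single threshold will misclassify some speakers.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate the fundamental frequency of one voiced frame by autocorrelation:
 * try every lag that corresponds to a pitch in [fmin, fmax] and keep the lag
 * at which the frame best matches itself. The raw (biased) sum shrinks as the
 * overlap shrinks, so the first period peak beats its multiples (octave
 * errors). Returns 0 if the frame is too short for fmin. */
double estimate_f0(const float *x, size_t n, double sr, double fmin, double fmax)
{
    assert(sr > 0.0 && fmin > 0.0 && fmax > fmin);
    size_t min_lag = (size_t)(sr / fmax);
    size_t max_lag = (size_t)(sr / fmin);
    if (min_lag < 1 || max_lag >= n)
        return 0.0;

    size_t best_lag = min_lag;
    double best = -1e300;
    for (size_t lag = min_lag; lag <= max_lag; lag++) {
        double r = 0.0;
        for (size_t i = 0; i + lag < n; i++)
            r += (double)x[i] * x[i + lag];
        if (r > best) {
            best = r;
            best_lag = lag;
        }
    }
    return sr / (double)best_lag;
}

/* Crude guess from F0. The 165 Hz split is an assumption; real speaker
 * populations overlap, so treat this as a heuristic only. */
const char *guess_gender(double f0)
{
    return f0 < 165.0 ? "male" : "female";
}
```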
Keep in mind that during many parts of speech, such as unvoiced consonants ('s', 't', etc.), there is no tone, just noise. It will need to be pretty intelligent.
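A cheap, classic way to spot those toneless stretches before trying to estimate pitch is the zero-crossing rate. Hiss-like sounds such as 's' change sign far more often than voiced speech dominated by a low fundamental. This minimal sketch assumes a hypothetical threshold of 0.25 crossings per sample, which would need tuning on real recordings and also shifts with sample rate. Practical detectors usually combine it with frame energy.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of adjacent sample pairs whose sign differs. Noise-like sounds
 * such as 's' cross zero far more often than pitched, voiced speech. */
double zero_crossing_rate(const float *x, size_t n)
{
    assert(n > 1);
    size_t crossings = 0;
    for (size_t i = 1; i < n; i++)
        if ((x[i - 1] >= 0.0f) != (x[i] >= 0.0f))
            crossings++;
    return (double)crossings / (double)(n - 1);
}

/* Heuristic voiced/unvoiced decision; 0.25 is an assumed starting threshold. */
int is_probably_voiced(const float *x, size_t n)
{
    return zero_crossing_rate(x, n) < 0.25;
}
```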
Hope that sets you in the right general direction.
Note: if the two voices are simultaneous and you want to separate them cleanly, then this won't help you. I don't believe anyone alive has solved such a problem.
I think this is already possible. I just started taking an on-line course on Machine Learning by Stanford University with professor Andrew Ng, and during the first lecture he shows a demo where an audio recording of two overlapping voices is processed and the individual voices extracted (the same with music in the background and a person speaking). Apparently it uses an unsupervised learning algorithm that allows it to extract the two underlying patterns. You may want to look into that course (there's one version of the course here: http://www.academicearth.org/courses/machine-learning)
One such tool that makes this possible is LIUM SpkDiarization. Written in Java and available under the GPL, it is a speaker diarization tool and uses statistical models for male, female and child voices. Luckily for you, the models are provided, so you can use it without having to tag the recordings and train the models yourself.
See the scripting page of the LIUM wiki for examples; search the page for "gender".
I would start by saying this is impossible. Speech recognition is really, really hard.
You're not clear in your question - are the voices overlapping? If so, splitting them up will be absurdly difficult.
If they are separate, your best bet is to collect a large set of samples of male and female voices and look for common characteristics (and a way to identify them programmatically). If the samples aren't recorded cleanly (if they have background noise), things get even more complicated.
You may get away with an average tone: male voices are generally deeper than female voices.
What you are asking is one hell of a task. thomasrutter wrote some "pointers" on how to do it, but I guess the algorithm would have to be really, really robust if you wanted to use it everywhere (on all sorts of music, with singing of course). Maybe it would be better/easier to start with separating (splitting) a single instrument sample from the song.

3D Audio Engine

Despite all the advances in 3D graphics engines, it strikes me as odd that the same level of attention hasn't been given to audio. Modern games do real-time rendering of 3D scenes, yet we still get more or less pre-canned audio accompanying those scenes.
Imagine, if you will, a 3D engine that models not just the physical appearance of items, but also their audio properties. From these models it could dynamically generate audio based on the materials that come into contact, their velocity, their distance from your virtual ears, etcetera. Now, when you're crouching behind the sandbags with bullets flying over your head, each one would yield a unique and realistic sound.
The obvious application of such a technology would be gaming, but I'm sure there are many other possibilities.
Is such a technology being actively developed? Does anyone know of any projects that attempt to achieve this?
Thanks,
Kent
I once did some research toward improving OpenAL, and the problem with simulating 3D audio is that so many of the cues that your mind uses — the slightly different attenuation at various angles, the frequency difference between sounds in front of you and those behind you — are quite specific to your own head and are not quite the same for anyone else!
If you want, say, a pair of headphones to really make it sound like a creature is in the leaves ahead of the character in a game, then you actually have to take that player into a studio, measure how their own particular ears and head change the amplitude and phase of the sound at different distances (amplitude and phase are different, and both are quite important to the way your brain processes sound direction), and then teach the game to attenuate and phase-shift the sounds for that particular player.
There do exist "standard heads" that have been mocked up with plastic and used to get generic frequency-response curves for the various directions around the head, but an average or standard will never sound quite right to most players.
Thus the current technology is basically to sell the player five cheap speakers, have them place them around their desk, and then the sounds — while not particularly well reproduced — actually do sound like they're coming from behind or beside the player because, well, they are coming from the speaker behind the player. :-)
But some games do take care to compute how sound would be muffled and attenuated through walls and doors. This can get difficult to simulate, because the ear receives the same sound at delays a few milliseconds apart through various materials and reflective surfaces in the environment, all of which would have to be included if things were to sound realistic. They tend to keep their libraries under wraps, however, so public reference implementations like OpenAL tend to be pretty primitive.
Edit: here is a link to an online data set from MIT that I found at the time, which could be used as a starting point for creating a more realistic OpenAL sound field:
http://sound.media.mit.edu/resources/KEMAR.html
Enjoy! :-)
Aureal did this back in 1998. I still have one of their cards, although I'd need Windows 98 to run it.
Imagine ray-tracing, but with audio. A game using the Aureal API would provide geometric environment information (e.g. a 3D map) and the audio card would ray-trace sound. It was exactly like hearing real things in the world around you. You could focus your eyes on the sound sources and attend to given sources in a noisy environment.
As I understand it, Creative destroyed Aureal by means of legal expenses in a series of patent infringement claims (which were all rejected).
In the public domain world, OpenAL exists - an audio version of OpenGL. I think development stopped a long time ago. They had a very simple 3D audio approach, no geometry - no better than EAX in software.
I think EAX 4.0 (and possibly a later version) has finally, after a decade, incorporated some of the geometry-based ray-tracing approach Aureal used (Creative bought up Aureal's IP after it folded).
The Source (Half-Life 2) engine on the SoundBlaster X-Fi already does this.
It really is something to hear. You can definitely hear the difference between an echo against concrete vs wood vs glass, etc...
A little-known side area is VoIP. Alongside the games themselves, you are likely to spend time talking to others while you are gaming.
Mumble (http://mumble.sourceforge.net/) is software that uses plugins to determine who is in-game with you. It then positions their voices in a 360-degree field around you, so someone on your left sounds like they are on the left, and someone behind you sounds like they are behind you. This made for a creepily realistic addition, and trying it out led to funny games of "Marco Polo".
Audio took a massive step backwards in Vista, where hardware was no longer allowed to accelerate it. This killed EAX as it was in the XP days. Software wrappers are gradually being built now.
Very interesting field indeed. So interesting that I'm going to do my master's thesis on this subject, in particular its use in first-person shooters.
My literature research so far has made it clear that this particular field has little theoretical background. Not a lot of research has been done in this field, and most theory is based on movie-audio theory.
As for practical applications, I haven't found any so far. Of course, there are plenty of titles and packages which support real-time audio-effect processing and apply it depending on the general surroundings of the auditor, e.g. the auditor enters a hall, so an echo/reverb effect is applied to the sound samples. This is rather crude. An analogy for visuals would be to subtract 20% of the RGB value of the entire image when someone turns off (or shoots ;) ) one of five light bulbs in the room. It's a start, but not very realistic at all.
The best work I found was a 2007 PhD thesis by Mark Nicholas Grimshaw, University of Waikato, called The Acoustic Ecology of the First-Person Shooter.
This huge paper proposes a theoretical setup for such an engine, as well as formulating a wealth of taxonomies and terms for analysing game audio. He also argues that the importance of audio for first-person shooters is greatly overlooked, as audio is a powerful force for immersion in the game world.
Just think about it. Imagine playing a game on a monitor with picture-perfect graphics but no sound. Next, imagine hearing realistic game sounds all around you with your eyes closed. The latter will give you a much greater sense of 'being there'.
So why haven't game developers dived into this wholeheartedly already? I think the answer is clear: it's much harder to sell. Improved images are easy to sell: you just show a picture or movie, and it's easy to see how much prettier it is. It's even easily quantifiable (e.g. more pixels = better picture). For sound it's not so easy. Realism in sound is much more subconscious, and therefore harder to market.
The effects the real world has on sounds are perceived subconsciously. Most people never even notice most of them, and some cannot even be heard consciously. Still, they all play a part in the perceived realism of the sound. There is an easy experiment you can do yourself which illustrates this. Next time you're walking on the sidewalk, listen carefully to the background sounds of the environment: wind blowing through leaves, all the cars on distant roads, etc. Then listen to how this sound changes when you walk nearer to or further from a wall, when you walk under an overhanging balcony, or even when you pass an open door. Do it, listen carefully, and you'll notice a big difference in sound, probably much bigger than you ever remembered.
In a game world, these types of changes aren't reflected. And even though you don't (yet) consciously miss them, you subconsciously do, and this has a negative effect on your level of immersion.
So, how good does audio have to be in comparison to the image? More practically: which physical effects in the real world contribute the most to the perceived realism? Does this perceived realism depend on the sound and/or the situation? These are the questions I wish to answer with my research. After that, my idea is to design a practical framework for an audio engine which could variably apply some effects to some or all game audio, depending (dynamically) on the amount of available computing power. Yup, I'm setting the bar pretty high :)
I'll be starting in September 2009. If anyone's interested, I'm thinking about setting up a blog to share my progress and findings.
Janne Louw
(BSc Computer Sciences Universiteit Leiden, The Netherlands)

Resources