Procedural sound generation algorithms? - audio

I'd like to be able to algorithmically create sounds (like monster growls, or distant thunder.) This isn't as widely covered on the net like more traditional procedural content (terrains, etc.) Any one have any algorithms on how to create these kinds of sounds?

This, in general, is a very hard problem. Just like drawing, each sound is its own thing, and needs its own algorithms, and, like drawing, some are more easily done by algorithm than others. There's no general algorithm for creating sound any more than there's a general algorithm for drawing all things like faces, insects, and mountains. Each is it's own project (and often quite a big one), unless you're just looking to draw circles or generate sine waves.
Most of the case studies I know of are the many attempts to generate musical instrument sounds, and generally each of these attempts is a PhD thesis.
For a time-efficient solution, sampling is the way to go.
Or, if you really need a procedural approach, you could ask the question for one specific type of sound, and people might be able to come up with an algorithm for it. For example, I'd be interested in taking a shot at a "distant thunder" algorithm, but don't want to bother if having just thunder but no monsters, etc, is not useful to you.

I would suggest checking out the many software projects and papers of Perry Cook who has done some great work in the realm of physical modelling (though his website is a bit of a nightmare to navigate). Though as tom10 says, it's a very hard area. If you have the stomach for a bit of signal processing then it's a very fascinating area to get into.

Related

How does the Ableton Drum-To-MIDI function work?

I can't seem to find any information regarding the process that Ableton uses to efficiently detect atonal percussion and convert it into MIDI. I assume feature extraction and onset detection algorithms are executed, but I'm intrigued as to what algorithms. I am particularly interesting how its efficiency is maintained for a beatboxed input.
Cheers
Your guesses are as good as everyone else's - although they look plausible. The reality is that the way this feature is implemented in Ableton is a trade secret and likely to remain that way.
If I'm not mistaken Ableton licenses technology from https://www.zplane.de/ for these things.
I don't exactly know how the software assigns the different drum sounds, but the chapter in the live manual Convert Drums to New MIDI Track says that it can only detect kick, snare and hi-hat. An important thing is that they are identified by the transient Markers. For a good result you should manually check and adjust them. The transient Markers look like the warp Markers, but are grey.
compared to a kick and a snare for example, a beatboxed input is likely to have less difference between the individual sounds and therefore likely to be harder for Ableton to individually extract the seperate sounds (depends on the beatboxer). In any case, some combination of frequency and amplitude - more specifically(Attack, Decay, Sustain, Release) as well as perhaps the different overtone combinations that account for differences in timbre are going to be the characteristics that would have to be evaluated in order to separate the kick snare and hihat .
Before this feature existed I used gates and hi/low pass filters to accomplish a similar task. So perhaps Ableton's solution is not as complicated as we might imagine.

Choosing an audio API

I'm struggling to choose between a vast number of audio programming languages and APIs. I'm very (totally) new to audio programming so please bear with me.
Software
I need to be able to:
Alter volume of different sounds before outputting them to anything (these sounds can have a variety of different origins, for example mp3s and microphone input)
phase shift sounds
superimpose sounds that I have tweaked (as per items 1 and 2)
control the output to each of 8 channels independently of one another
make this all happen on Windows7
These capabilities need be abstracted by a graphical frontend I will probably make myself. What I want to be able to do is create 'sound sources' and move them around a 3D environment along either pre-defined trajectories and/or in relation to the movement of whoever is inside the rig. The reason I want to do pitch bending is so I can mess with red-shift stuff.
I don't want to have to construct full tracks before-hand and just play them. I want the sound that is played to depend on external input from sensors as well as what I am doing on the frontend.
As far as I know this means I cant use any existing full audio making app.
The Question
I've been looking around for for the API or language I should use and I have not turned up a blank, quite the opposite actually. I'm struggling to narrow down my search. A lot of my problem stems from the fact that I have no experience in audio programming.
So, does anyone know off-hand of an API or language that meets my criteria?
Hardware stuff and goals
(I left this until last because I'm not sure how relevant it is)
My goal is to make three rings of speakers at different heights and to have enough control over them to be able to simulate any number of 'sound sources' within the array. The idea is to have someone stand in the middle of the rig and be able to make it sound like there are lots of things moving around them. To get this working I'm planning on doing a little trig and using 8 channels of audio from my PC. The maths is pretty straight forward, it just the rest that I need to worry about
What I want to do next is attach a bunch of cameras to the thing and do some simple image recognition stuff to be able to 'attach sound sources' to different objects. Eg. If someone is standing in the right place it can be made to seem as though all red balls quack like a duck, and all orange ones moan hauntingly.
This is not to detract from Richard Small's answer, but to comment on some of the other options out there:
If you are looking for something higher-level with which you can prototype and develop this faster, you want max/msp or it's open source competitor puredata. These are designed for musicians who are technically minded, but not so much for programmers. As a result, you can build this sort of thing quickly and efficiently.
You also have some lower level options: PortAudio can handle your audio I/O, you would have to do the sound generation and effects and so on on your own or with other libraries. Cinder and OpenFramewoks both provide interfaces for audio, cameras, and other stuff for "creative programming". I'm afraid I don't know if they meet your full requirements, but they are powerful and popular for this sort of thing so I encorage you to look at them.
The two major ones these days tend to be
WWise
WWise Download Link
FMOD
FMOD Download Link
These two engines may even in fact be overkill for what you need, but I can almost guarantee that they will be capable of anything you require.

Creating a lava lamp-like animation

I recently saw something that set me wondering how to create a realistic-looking (2D) lava lamp-like animation, for a screen-saver or game.
It would of course be possible to model the lava lamp's physics using partial differential equations, and to translate that into code. However, this is likely to be both quite difficult (because of several factors, not least of which is the inherent irregularity of the geometry of the "blobs" of wax and the high number of variables) and probably computationally far too expensive to calculate in real time.
Analytical solutions, if any could be found, would be similarly useless because you would want to have some degree of randomness (or stochasticity) in the animation.
So, the question is, can anyone think of an approach that would allow you to animate a realistic looking lava lamp, in real time (at say 10-30 FPS), on a typical desktop/laptop computer, without modelling the physics in any great detail? In other words, is there a way to "cheat"?
One way to cheat might be to use a probabilistic cellular automaton with a well-chosen transition table to simulate the motion of the blobs. Some popular screensavers (in particular ParticleFire) use this approach to elegantly simulate complex motions in 2D space by breaking the objects down to individual pixels and then defining the ways in which individual pixels transition by looking at the states of their neighbors. You can get some pretty emergent behavior with simple cellular automata - look at Conway's game of life, for example, or this simulation of a forest fire.
LavaLite is open source. You can get code with the xscreensaver-gl package in most Linux distros. It uses metaballs.

How does pathfinding in RTS video games work?

In a game such as Warcraft 3 or Age of Empires, the ways that an AI opponent can move about the map seem almost limitless. The maps are huge and the position of other players is constantly changing.
How does the AI path-finding in games like these work? Standard graph-search methods (such as DFS, BFS or A*) seem impossible in such a setup.
Take the following with a grain of salt, since I don't have first-person experience with pathfinding.
That being said, there are likely to be different approaches, but I think standard graph-search methods, notably (variants of) A* are perfectly reasonable for strategy games. Most strategy games I know seem to be based on a tile system, where the map is comprised of little squares, which are easily mapped to a graph. One example would be StarCraft II (Screenshot), which I'll keep using as an example in the remainder of this answer, because I'm most familiar with it.
While A* can be used for real-time strategy games, there are a few drawbacks that have to be overcome by tweaks to the core algorithm:
A* is too slow
Since an RTS is by definion "real time", waiting for the computation to finish will frustrate the player, because the units will lag. This can be remedied in several ways. One is to use Multi-tiered A*, which computes a rough course before taking smaller obstacles into account. Another obvious optimization is to group units heading to the same destination into a platoon and only calculate one path for all of them.
Instead of the naive approach of making every single tile a node in the graph, one could also build a navigation mesh, which has fewer nodes and could be searched faster – this requires tweaking the search algorithm a little, but it would still be A* at the core.
A* is static
A* works on a static graph, so what to do when the landscape changes? I don't know how this is done in actual games, but I imagine the pathing is done repeatedly to cope with new obstacles or removed obstacles. Maybe they are using an incremental version of A* (PDF).
To see a demonstration of StarCraft II coping with this, go to 7:50 in this video.
A* has perfect information
A part of many RTS games is unexplored terrain. Since you can't see the terrain, your units shouldn't know where to walk either, but often they do anyway. One approach is to penalize walking through unexplored terrain, so units are more reluctant to take advantage of their omniscience, another is to take the omniscience away and just assume unexplored terrain is walkable. This can result in the units stumbling into dead ends, sometimes ones that are obvious to the player, until they finally explore a path to the target.
Fog of War is another aspect of this. For example, in StarCraft 2 there are destructible obstacles on the map. It has been shown that you can order a unit to move to the enemy base, and it will start down a different path if the obstacle has already been destroyed by your opponent, thus giving you information you should not actually have.
To summarize: You can use standard algorithms, but you may have to use them cleverly. And as a last bonus: I have found Amit’s Game Programming Information interesting with regard to pathing. It also has links to further discussion of the problem.
This is a bit of a simple example, but it shows that you can make the illusion of AI / Indepth Pathfinding from a non-complex set of rules: Pac-Man Pathfinding
Essentially, it is possible for the AI to know local (near by) information and make decisions based on that knowledge.
A* is a common pathfinding algorithm. This is a popular game development topic - you should be able to find numerous books and websites that contain information.
Check out visibility graphs. I believe that is what they use for path finding.

3D Audio Engine

Despite all the advances in 3D graphic engines, it strikes me as odd that the same level of attention hasn't been given to audio. Modern games do real-time rendering of 3D scenes, yet we still get more-or-less pre-canned audio accompanying those scenes.
Imagine - if you will - a 3D engine that models not just the physical appearance of items, but also their audio properties. And from these models it can dynamically generate audio based on the materials that come into contact, their velocity, distance from your virtual ears, etcetera. Now, when you're crouching behind the sandbags with bullets flying over your head, each one will yield a unique and realistic sound.
The obvious application of such a technology would be gaming, but I'm sure there are many other possibilities.
Is such a technology being actively developed? Does anyone know of any projects that attempt to achieve this?
Thanks,
Kent
I once did some research toward improving OpenAL, and the problem with simulating 3D audio is that so many of the cues that your mind uses — the slightly different attenuation at various angles, the frequency difference between sounds in front of you and those behind you — are quite specific to your own head and are not quite the same for anyone else!
If you want, say, a pair of headphones to really make it sound like a creature is in the leaves ahead and in front of the character in a game, then you actually have to take that player into a studio, measure how their own particular ears and head change the amplitude and phase of the sound at different distances (amplitude and phase are different, and are both quite important to the way your brain processes sound direction), and then teach the game to attenuate and phase-shift the sounds for that particular player.
There do exist "standard heads" that have been mocked up with plastic and used to get generic frequency-response curves for the various directions around the head, but an average or standard will never sound quite right to most players.
Thus the current technology is basically to sell the player five cheap speakers, have them place them around their desk, and then the sounds — while not particularly well reproduced — actually do sound like they're coming from behind or beside the player because, well, they are coming from the speaker behind the player. :-)
But some games do bother to be careful to compute how sound would be muffled and attenuated through walls and doors (which can get difficult to simulate, because the ear receives the same sound at a few milliseconds different delay through various materials and reflective surfaces in the environment, all of which would have to be included if things were to sound realistic). They tend to keep their libraries under wraps, however, so public reference implementations like OpenAL tend to be pretty primitive.
Edit: here is a link to an online data set that I found at the time, that could be used as a starting point for creating a more realistic OpenAL sound field, from MIT:
http://sound.media.mit.edu/resources/KEMAR.html
Enjoy! :-)
Aureal did this back in 1998. I still have one of their cards, although I'd need Windows 98 to run it.
Imagine ray-tracing, but with audio. A game using the Aureal API would provide geometric environment information (e.g. a 3D map) and the audio card would ray-trace sound. It was exactly like hearing real things in the world around you. You could focus your eyes on the sound sources and attend to given sources in a noisy environment.
As I understand it, Creative destroyed Aureal by means of legal expenses in a series of patent infringement claims (which were all rejected).
In the public domain world, OpenAL exists - an audio version of OpenGL. I think development stopped a long time ago. They had a very simple 3D audio approach, no geometry - no better than EAX in software.
EAX 4.0 (and I think there is a later version?) finally - after a decade - I think have incoporated some of the geometric information ray-tracing approach Aureal used (Creative bought up their IP after they folded).
The Source (Half-Life 2) engine on the SoundBlaster X-Fi already does this.
It really is something to hear. You can definitely hear the difference between an echo against concrete vs wood vs glass, etc...
A little known side area is voip. While games are having actively developed software, you are likely to spent time talking to others while you are gaming as well.
Mumble ( http://mumble.sourceforge.net/ ) is software that uses plugins to determine who is ingame with you. It will then position its audio in a 360 degree area around you, so the left is to the left, behind you sounds like as such. This made a creepily realistic addition, and while trying it out it led to funny games of "marko, polo".
Audio took a massive back turn in vista, where hardware was not allowed to be used to accelerate it anymore. This killed EAX as it was in the XP days. Software wrappers are gradually getting built now.
Very interesting field indeed. So interesting, that I'm going to do my master's degree thesis on this subject. In particular, it's use in first person shooters.
My literature research so far has made it clear that this particular field has little theoretical background. Not a lot of research has been done in this field, and most theory is based on movie-audio theory.
As for practical applications, I haven't found any so far. Of course, there are plenty titles and packages which support real-time audio-effect processing and apply them depending on the general surroundings of the auditor. e.g.: auditor enters a hall, so a echo/reverb effect is applied on the sound samples. This is rather crude. An analogy for visuals would be to subtract 20% of the RGB-value of the entire image when someone turns off (or shoots ;) ) one of five lightbulbs in the room. It's a start, but not very realisic at all.
The best work I found was a (2007) PhD thesis by Mark Nicholas Grimshaw, University of Waikato , called The Accoustic Ecology of the First-Person Shooter
This huge pager proposes a theoretical setup for such an engine, as well as formulating a wealth of taxonomies and terms for analysing game-audio. Also he argues that the importance of audio for first person shooters is greatly overlooked, as audio is a powerful force for emergence into the game world.
Just think about it. Imagine playing a game on a monitor with no sound but picture perfect graphics. Next, imagine hearing game realisic (game) sounds all around you, while closing your eyes. The latter will give you a much greater sense of 'being there'.
So why haven't game developers dove into this full-hearted already? I think the answer to that is clear: it's much harder to sell. Improved images is easy to sell: you just give a picture or movie and it's easy to see how much prettier it is. It's even easily quantifyable (e.g. more pixels=better picture). For sound it's not so easy. Realism in sound is much more sub-conscious, and therefor harder to market.
The effects the real world has on sounds are subconsciously percieved. Most people never even notice most of them. Some of these effects cannot even conciously be heard. Still, they all play a part in the percieved realism of the sound. There is an easy experiment you can do yourself which illustrates this. Next time you're walking on the sidewalk, listen carefully to the background sounds of the enviroment: wind blowing through leaves, all the cars on distant roads, etc.. Then, listen to how this sound changes when you walk nearer or further from a wall, or when you walk under an overhanging balcony, or when you pass an open door even. Do it, listen carefully, and you'll notice a big difference in sound. Probably much bigger than you ever remembered.
In a game world, these type of changes aren't reflected. And even though you don't (yet) consciously miss them, your subconsciously do, and this will have a negative effect on your level of emergence.
So, how good does audio have to be in comparison to the image? More practical: which physical effects in the real world contribute the most to the percieved realism. Does this percieved realism depend on the sound and/or the situation? These are the questions I wish to answer with my research. After that, my idea is to design a practical framework for an audio engine which could variably apply some effects to some or all game audio, depending (dynamically) on the amount of available computing power. Yup, I'm setting the bar pretty high :)
I'll be starting per September 2009. If anyone's interested, I'm thinking about setting up a blog to share my progress and findings.
Janne Louw
(BSc Computer Sciences Universiteit Leiden, The Netherlands)

Resources