Choosing an audio API - audio

I'm struggling to choose between a vast number of audio programming languages and APIs. I'm very (totally) new to audio programming so please bear with me.
Software
I need to be able to:
Alter volume of different sounds before outputting them to anything (these sounds can have a variety of different origins, for example mp3s and microphone input)
phase shift sounds
superimpose sounds that I have tweaked (as per items 1 and 2)
control the output to each of 8 channels independently of one another
make this all happen on Windows7
These capabilities need be abstracted by a graphical frontend I will probably make myself. What I want to be able to do is create 'sound sources' and move them around a 3D environment along either pre-defined trajectories and/or in relation to the movement of whoever is inside the rig. The reason I want to do pitch bending is so I can mess with red-shift stuff.
I don't want to have to construct full tracks before-hand and just play them. I want the sound that is played to depend on external input from sensors as well as what I am doing on the frontend.
As far as I know this means I cant use any existing full audio making app.
The Question
I've been looking around for for the API or language I should use and I have not turned up a blank, quite the opposite actually. I'm struggling to narrow down my search. A lot of my problem stems from the fact that I have no experience in audio programming.
So, does anyone know off-hand of an API or language that meets my criteria?
Hardware stuff and goals
(I left this until last because I'm not sure how relevant it is)
My goal is to make three rings of speakers at different heights and to have enough control over them to be able to simulate any number of 'sound sources' within the array. The idea is to have someone stand in the middle of the rig and be able to make it sound like there are lots of things moving around them. To get this working I'm planning on doing a little trig and using 8 channels of audio from my PC. The maths is pretty straight forward, it just the rest that I need to worry about
What I want to do next is attach a bunch of cameras to the thing and do some simple image recognition stuff to be able to 'attach sound sources' to different objects. Eg. If someone is standing in the right place it can be made to seem as though all red balls quack like a duck, and all orange ones moan hauntingly.

This is not to detract from Richard Small's answer, but to comment on some of the other options out there:
If you are looking for something higher-level with which you can prototype and develop this faster, you want max/msp or it's open source competitor puredata. These are designed for musicians who are technically minded, but not so much for programmers. As a result, you can build this sort of thing quickly and efficiently.
You also have some lower level options: PortAudio can handle your audio I/O, you would have to do the sound generation and effects and so on on your own or with other libraries. Cinder and OpenFramewoks both provide interfaces for audio, cameras, and other stuff for "creative programming". I'm afraid I don't know if they meet your full requirements, but they are powerful and popular for this sort of thing so I encorage you to look at them.

The two major ones these days tend to be
WWise
WWise Download Link
FMOD
FMOD Download Link
These two engines may even in fact be overkill for what you need, but I can almost guarantee that they will be capable of anything you require.

Related

Realtime audio manipulation

Here is what i like to achieve:
I like to play around in creating "new" software / hardware instruments.
Sound processing and creation is always managed by software. But one could play the instrument via ultrasonic distance sensor for example. Another idea is to start playback when someone interrupts the light of a photoelectric barrier and so on....
So the instrument would play common sounds, but has to be used in an unusal way. For example, the ultrasonic instrument would play a sound if it detects something in a certain distance. The sound could be manipiulated in pitch for example if the distance gets smaller.
Basically i like to playback a sound sample and manipualte this in realtime.
I guess i have to use WAV samples for this, right? And which programming language do you think fits best for this task?
Edited after kevins hint: please kick me into the right direction - give me a hint where to start.
Thanks in advance
Since you're using the the Processing tag, you can try Processing.
It comes with a sound library like Minim or you can install beads which is great. There's actually a nice book on it: Sonifying Processing
You might find SuperColider fun as well.
The main thing is what are you comfortable with at the moment ?
If Processing syntax looks intimidating, you can actually try a different programming paradigm like data flow. In which case you can use PureData(free, opensource) or MaxMSP(very similar, but commercial). The idea is rather than typing instructions, you connect boxes with wires which is fun and the examples are great too.
If you're into c++ there are plenty of libraries. On the creative side, there's a nice set of libraries called OpenFrameworks that's easy and fun to use. If this is your cup of tea, have a peek at Maximilian.
Bottomline is: there are multiple options to achieve the same task. Choose the best tool for your (based on your background) or try each and see what you like best.
You asked "And which programming language do you think fits best for this task?" - I would also suggest using Processing. I have been used Processing to work with sounds previously. And in all cases I used Minim. It has many UgenS to generate sounds programmatically.
Also, you wants to integrate with some sensors. I'm not sure what types of sensors you will use, but Processing goes pretty well with different Arduino modules and sensors. Check this link for more direction.
Furthermore, you can export your project as .exe or executable .jar files. And their JS version (P5.js) works almost the same as the Java version.

Realtime Sound Routing...Trigger a Sound with Another Sound

I'm looking for a program that is able to recognize individual audio samples from my computer and reroute them to trigger WAV files from a library. In my project, it would need to be realtime as the latency would not be a desired result. I tried using dictation software that would recognize words to trigger opening a file and that's the direction where I want to go, but instead of words I want it to be sounds and it would happen in realtime. I'm not sure where to go and am just looking for some guidance. Does anyone have any suggestions of what I should do?
That's a fairly broad question, but I can tell you how I would do it. (Hardly the only way, but where I would start.)
If you're looking for real time input, the Java Sound library (excellent tutorial here) allows for that. (Just note that microphone input from a web page is difficult on anything, due to major security concerns, so this would be a desktop application.)
If it needs to be real time, the first thing I would suggest is stream and multithread the hell out of it. I would suggest the Java 8 Stream API, but since you're looking for subsamples that match a specific pattern, then each data point will have to be aware of the state of its neighbors, and that isn't easy with streams.
You will probably want to know if a sound roughly resembles an audio profile, so for that, I would pick a tolerance on just how close you want it to be for a match (remembering that samples may not line up 100% anyway, so "exact" is not an option), and then look up Hidden Markov Models. I suggest these because they're what voice recognition software typically uses, and while your sounds may not be voices, it will give you an idea of what has already been done.
You'll also want to maintain a limited list of audio samples in memory. Specifically, you will likely need the most recent data, because an audio signal is a time-variant signal, and you can't get a match from just one point. I wouldn't make it much longer than the longest sample you're looking to recognize, as audio takes up a boatload of memory.
Lastly (for audio), I would recommend picking a standard format for comparison. Make it as good as gets you decent results, and start high. You will want to convert everything to that format before you compare it.
Once you recognize a specific sound, it's basically a Command Pattern. Specific sounds can be mapped, even with a java.util.HashMap, to specific files, which (if there are few enough) you might even have pre-loaded.
Lastly, it's worth looking at the Java Speech API. It's not part of the JDK and it's quite dated, but you might get some good advice from its implementation.
This is of course the advice of a Java-preferring programmer, but I imagine that there might be some decent libraries in Python and Ruby to help you as well; and of course there's something in C somewhere. This may sound like a lot, but most of the material is already implemented and ready-to-go.
Hopefully this helps, let's look forward to other answers.

Implementing "best match" for sound effects

I am looking for some advice on categorizing a library of sound effects. I have a large set of random sound effects, (think whistles, pops, growls, creaks, gunshots etc). I would like to be able to take a growl for example, and find the next growl that sounds the closest to the original.
Given a sound, what sound from my set sounds the closest to it.
I have done a fair amount of googling and have found two avenues that I am still researching. One is using echonest, although their "best match" support looks not promising for public users. The other option is diving into FFT and building my own matching algorithm. This is a fine option and would be a great learning experience but I wanted to get some opinions from others who might know a little more about sound processing; especially short clips .5sec - 3sec range, not full length music.
Thanks!
I have worked in movie postproduction for years and as far as I know, there is no way to do that automatically. Every file has meta information in its file header which describes what the sound is like. You are then actually not searching for the file names but in the meta string.
I don't think that it is trivial to sort effects programmatically as two effects that sound similar might be totally different if you look at the waveform.
You would need to extract significant information about a sound that you can then compare.
I am also not a DSP expert, maybe there are methods to do this
If you're interested in trying to build your own system to do this, I can suggest a few keywords that might help to refine your Google searches. In the academic research community, the task you're describing is often called "content-based audio searching". I know there's been a lot of work done on it, and though most pertains to music, sound effects have definitely been the focus of a number of studies.
You might want to start with the work of Pedro Cano.
Also, I recently heard about a company that's doing similar work. You might want to check out products from Imagine Research.
Those are just a couple of ideas off the top of my head. I'm not %100 sure they'll be helpful. If they are, please let me know!

Major game components

I am in the process of developping a game, and after two months of work (not full time mind you), I have come to realise that our specs for the game are lacking a lot of details. I am not a professional game developper, this is only a hobby.
What I would like to receive help or advices for is this: What are the major components that you find in games, that have to be developped or already exists as librairies? The objective of this question is for me to be able to specify more game aspects.
Currently, we had specified pretty much only how we would work on the visual, completely forgetting everything about game logic (AI, Entities interactions, Quest logic (how do we decide whether or not a quest is completed)).
So far, I have found those points:
Physics (collision detection, actual forces, etc.)
AI (pathfinding, objectives, etc.)
Model management
Animation management
Scene management
Combat management
Inventory management
Camera (make sure not to render everything that is in the scene)
Heightmaps
Entities communication (Player with NPC, enemy, other players, etc)
Game state
Game state save system
In order to reduce the scope of this queston, I'd like it if you could specifically discuss aspects related to developping an RPG type of game. I will also point out that I am using XNA to develop this game, but I have almost no grasp of all the classes available yet (pretty much only using the Game component with some classes that are related to it such as GameTime, SpriteBatch, GraphicDeviceManager) but not much more.
You have a decent list, but you are missing storage (save load), text (text is important in RPGs : Unicode, font rendering), probably a macro system for text (something that replaces tokens like {player} with the player characters name), and most important of all content generation tools (map editor, chara editor, dialog editor) because RPGs need content (or auto generation tools if you need ). By the way have links to your work?
I do this exact stuff for a living so if you need more pointers perhaps I can help.
I don't know if this is any help, but I have been reading articles from http://www.gamasutra.com/ for many years.
I don't have a perfect set of tools from the beginning, but your list covers most of the usual trouble for RUNNING the game. But have you found out what each one of the items stands for? How much have you made already? "Inventory Management" sounds very heavy, but some games just need a simple "array" of objects. Takes an hour to program + some graphical integration (if you have your GUI Management done already).
How to start planning
When I develop games in my spare time, I usually get an idea because another game lacks this function/option. Then I start up what ever development tool I am currently using and try to see if I can make a prototype showing this idea. It's not always about fancy graphics, but most often it's more about finding out how to solve a certain problem. Green and red boxes will help you most of the way, but otherwise, use Google Images and do a quick search for prototype graphics. But remember that these images are probably copyrighted, so only use them for internal test purposes and to explain to your graphic artists what type of game/graphic you want to make.
Secondly, you'll find that you need to find/build tools to create the "maps/missions/quests" too. Today many develop their own "object script" where they can easily add new content/path to a game.
Many of the ideas we (my friends and I) have been testing started with a certain prototype of the interface, to see if its possible to generate that sort of screen output first. Then we build a quick'n'dirty map/level-editor that can supply us with test maps.
No game logic at this point, still figuring out if the game-engine in general is running.
My first game-algorithm problem
Back when I was in my teens I had a Commodore 64 and I was wondering, how do they sort 10 numbers in order for a Highscore? It took me quite a while to find a "scalable" way of doing this, but I learned a lot about programming too.
The second problem I found
How do I make a tank/cannon fire a bullet in the correct direction when I fly my helicopter around the screen?
I sat down and drew quick sketches of the actual problem, looked at the bullet lines, tried some theories of my own and found something that seemed to be working (by dividing and multiplying positions etc.) later on in school I discovered this to be more or less Pythagoras. LOL!
Years and many game attempts later
I played "Dune" and the later C&C + the new game Warcraft (v1/v2) - I remember it started to annoyed me how the lame AI worked. The path finding algorithms were frustrating for the player, I thought. They moved in direction of target position and then found a wall, but if the way was to complex, the object just stopped. Argh!
So I first sat with large amounts of paper, then I tried to draw certain scenarios where an "object" (tank/ork/soldier) would go from A to B and then suddenly there was a "structure" (building/other object) in the pathway - what then?
I learned about A-star pathfinding (after solving it first on my own in a similar way, then later reading about the reason for this working). A very "cpu heavy" way of finding a path, but I learned a lot from the process of "cracking this nut". These thoughts have helped me a lot developing other game algortimes over time.
So what I am saying is: I think you'll have to think more of:
How is the game to be played?
What does the user experience look like?
Why would the user want to come back to the game?
What requirements are needed? Broadband? 19" monitor with 1280x1024?
An RPG, yes - but will it be multi-user or single?
Do we need a fast network/server setup or do we need to develop a strong AI for the NPCs?
And much more...
I am not sure this is what you asked for, but I hope you can use it somehow?
There are hundreds of components needed to make a game, from time management to audio. You'll probably need to roll your own GUI, as native OS controls are very non-gamey. You will probably also need all kinds of tools to generate your worlds, exporters to convert models and textures into something suitable for your game etc.
I would strongly recommend that you start with one of the many free or cheap game engines that are out there. Loads of them come with the source code, so you can learn how they have been put together as you go.
When you think you are ready, you can start to replace parts of the engine you are using to better suit your needs.
I agree with Robert Gould's post , especially about tools and I'd also add
Scripting
Memory Management
Network - especially replication of game object states and match-making
oh and don't forget Localisation - particularly for text strings
Effects and effect timers (could be magical effects, could just be stuff like being stunned.)
Character professions, skills, spells (if that kind of game).
World creation tools, to make it easy for non-programmer builders.
Think about whether or not you want PvP. If so, you need to really think about how you're going to do your combat system and any limits you want on who can attack whom.
Equipment, "treasure", values of things and how you want to do the economy.
This is an older question, but IMHO now there is a better answer: use Unity (or something akin to it). It gives you 90% of what you need to make a game up front, so you can jump in and focus directly on the part you care about, which is the gameplay. When you run aground because there's something it doesn't do out of the box, you can usually find a resource in the Asset Store for free or cheap that will save you a lot of work.
I would also add that if you're not working on your game full-time, be mindful of the complexity and the time-frame of the task. If you'll try to integrate so many different frameworks into your RPG game, you can easily end up with several years worth of work; maybe it would be more advisable to start small and only develop the "core" of your game first and not bother about physics, for example. You could still add it in the second version.

3D Audio Engine

Despite all the advances in 3D graphic engines, it strikes me as odd that the same level of attention hasn't been given to audio. Modern games do real-time rendering of 3D scenes, yet we still get more-or-less pre-canned audio accompanying those scenes.
Imagine - if you will - a 3D engine that models not just the physical appearance of items, but also their audio properties. And from these models it can dynamically generate audio based on the materials that come into contact, their velocity, distance from your virtual ears, etcetera. Now, when you're crouching behind the sandbags with bullets flying over your head, each one will yield a unique and realistic sound.
The obvious application of such a technology would be gaming, but I'm sure there are many other possibilities.
Is such a technology being actively developed? Does anyone know of any projects that attempt to achieve this?
Thanks,
Kent
I once did some research toward improving OpenAL, and the problem with simulating 3D audio is that so many of the cues that your mind uses — the slightly different attenuation at various angles, the frequency difference between sounds in front of you and those behind you — are quite specific to your own head and are not quite the same for anyone else!
If you want, say, a pair of headphones to really make it sound like a creature is in the leaves ahead and in front of the character in a game, then you actually have to take that player into a studio, measure how their own particular ears and head change the amplitude and phase of the sound at different distances (amplitude and phase are different, and are both quite important to the way your brain processes sound direction), and then teach the game to attenuate and phase-shift the sounds for that particular player.
There do exist "standard heads" that have been mocked up with plastic and used to get generic frequency-response curves for the various directions around the head, but an average or standard will never sound quite right to most players.
Thus the current technology is basically to sell the player five cheap speakers, have them place them around their desk, and then the sounds — while not particularly well reproduced — actually do sound like they're coming from behind or beside the player because, well, they are coming from the speaker behind the player. :-)
But some games do bother to be careful to compute how sound would be muffled and attenuated through walls and doors (which can get difficult to simulate, because the ear receives the same sound at a few milliseconds different delay through various materials and reflective surfaces in the environment, all of which would have to be included if things were to sound realistic). They tend to keep their libraries under wraps, however, so public reference implementations like OpenAL tend to be pretty primitive.
Edit: here is a link to an online data set that I found at the time, that could be used as a starting point for creating a more realistic OpenAL sound field, from MIT:
http://sound.media.mit.edu/resources/KEMAR.html
Enjoy! :-)
Aureal did this back in 1998. I still have one of their cards, although I'd need Windows 98 to run it.
Imagine ray-tracing, but with audio. A game using the Aureal API would provide geometric environment information (e.g. a 3D map) and the audio card would ray-trace sound. It was exactly like hearing real things in the world around you. You could focus your eyes on the sound sources and attend to given sources in a noisy environment.
As I understand it, Creative destroyed Aureal by means of legal expenses in a series of patent infringement claims (which were all rejected).
In the public domain world, OpenAL exists - an audio version of OpenGL. I think development stopped a long time ago. They had a very simple 3D audio approach, no geometry - no better than EAX in software.
EAX 4.0 (and I think there is a later version?) finally - after a decade - I think have incoporated some of the geometric information ray-tracing approach Aureal used (Creative bought up their IP after they folded).
The Source (Half-Life 2) engine on the SoundBlaster X-Fi already does this.
It really is something to hear. You can definitely hear the difference between an echo against concrete vs wood vs glass, etc...
A little known side area is voip. While games are having actively developed software, you are likely to spent time talking to others while you are gaming as well.
Mumble ( http://mumble.sourceforge.net/ ) is software that uses plugins to determine who is ingame with you. It will then position its audio in a 360 degree area around you, so the left is to the left, behind you sounds like as such. This made a creepily realistic addition, and while trying it out it led to funny games of "marko, polo".
Audio took a massive back turn in vista, where hardware was not allowed to be used to accelerate it anymore. This killed EAX as it was in the XP days. Software wrappers are gradually getting built now.
Very interesting field indeed. So interesting, that I'm going to do my master's degree thesis on this subject. In particular, it's use in first person shooters.
My literature research so far has made it clear that this particular field has little theoretical background. Not a lot of research has been done in this field, and most theory is based on movie-audio theory.
As for practical applications, I haven't found any so far. Of course, there are plenty titles and packages which support real-time audio-effect processing and apply them depending on the general surroundings of the auditor. e.g.: auditor enters a hall, so a echo/reverb effect is applied on the sound samples. This is rather crude. An analogy for visuals would be to subtract 20% of the RGB-value of the entire image when someone turns off (or shoots ;) ) one of five lightbulbs in the room. It's a start, but not very realisic at all.
The best work I found was a (2007) PhD thesis by Mark Nicholas Grimshaw, University of Waikato , called The Accoustic Ecology of the First-Person Shooter
This huge pager proposes a theoretical setup for such an engine, as well as formulating a wealth of taxonomies and terms for analysing game-audio. Also he argues that the importance of audio for first person shooters is greatly overlooked, as audio is a powerful force for emergence into the game world.
Just think about it. Imagine playing a game on a monitor with no sound but picture perfect graphics. Next, imagine hearing game realisic (game) sounds all around you, while closing your eyes. The latter will give you a much greater sense of 'being there'.
So why haven't game developers dove into this full-hearted already? I think the answer to that is clear: it's much harder to sell. Improved images is easy to sell: you just give a picture or movie and it's easy to see how much prettier it is. It's even easily quantifyable (e.g. more pixels=better picture). For sound it's not so easy. Realism in sound is much more sub-conscious, and therefor harder to market.
The effects the real world has on sounds are subconsciously percieved. Most people never even notice most of them. Some of these effects cannot even conciously be heard. Still, they all play a part in the percieved realism of the sound. There is an easy experiment you can do yourself which illustrates this. Next time you're walking on the sidewalk, listen carefully to the background sounds of the enviroment: wind blowing through leaves, all the cars on distant roads, etc.. Then, listen to how this sound changes when you walk nearer or further from a wall, or when you walk under an overhanging balcony, or when you pass an open door even. Do it, listen carefully, and you'll notice a big difference in sound. Probably much bigger than you ever remembered.
In a game world, these type of changes aren't reflected. And even though you don't (yet) consciously miss them, your subconsciously do, and this will have a negative effect on your level of emergence.
So, how good does audio have to be in comparison to the image? More practical: which physical effects in the real world contribute the most to the percieved realism. Does this percieved realism depend on the sound and/or the situation? These are the questions I wish to answer with my research. After that, my idea is to design a practical framework for an audio engine which could variably apply some effects to some or all game audio, depending (dynamically) on the amount of available computing power. Yup, I'm setting the bar pretty high :)
I'll be starting per September 2009. If anyone's interested, I'm thinking about setting up a blog to share my progress and findings.
Janne Louw
(BSc Computer Sciences Universiteit Leiden, The Netherlands)

Resources