I am using XNA's SoundEffect class to play sounds in a game project. While the memory management of directly calling SoundEffect.Play() is relatively clear to me (SoundEffectInstances are created internally, added to lists, and their memory is freed once the sound stops), I would like to know what the recommended approach is when creating SoundEffectInstances directly, e.g. to control audio playback (volume etc.) after instance.Play().
Currently, for those sound effects that I need to control after creation, I use a special class whose constructor creates a new instance with:
instance = LoadEffect(indexInSoundLibrary).CreateInstance();
while the destructor of that class calls:
instance.Dispose();
In summary, I use a collection of ~400 sound files (just their filenames) that are loaded dynamically whenever needed. Does it make sense to keep a number of SoundEffectInstances in memory, e.g. for those audio files in the collection that are typically played by creating a SoundEffectInstance rather than by calling SoundEffect.Play() directly?
Are there any limits on how many SoundEffectInstances can be kept in memory at a time?
Just for additional information, I use multiple ContentManagers and regularly Unload unused sounds.
First, some relevant background info: I've got a CoreAudio-based low-latency audio processing application that does various mixing and special effects on audio that is coming from an input device on a purpose-dedicated Mac (running the latest version of MacOS) and delivers the results back to one of the Mac's local audio devices.
In order to obtain the best/most reliable low-latency performance, this app is designed to hook into CoreAudio's low-level audio-rendering callback (via AudioDeviceCreateIOProcID(), AudioDeviceStart(), etc.), and every time the callback function is called (from CoreAudio's real-time context), it reads the incoming audio frames (e.g. 128 frames, 64 samples per frame), does the necessary math, and writes out the outgoing samples.
This all works quite well, but from everything I've read, Apple's CoreAudio implementation has an unwritten de-facto requirement that all real-time audio operations happen in a single thread. There are good reasons for this which I acknowledge (mainly that outside of SIMD/SSE/AVX instructions, which I already use, almost all of the mechanisms you might employ to co-ordinate parallelized behavior are not real-time-safe and therefore trying to use them would result in intermittently glitchy audio).
However, my co-workers and I are greedy, and nevertheless we'd like to do many more math-operations per sample-buffer than even the fastest single core could reliably execute in the brief time-window that is necessary to avoid audio-underruns and glitching.
My co-worker (who is fairly experienced at real-time audio processing on embedded/purpose-built Linux hardware) tells me that under Linux it is possible for a program to requisition exclusive access to one or more CPU cores, such that the OS will never try to use them for anything else. Once he has done this, he can run "bare metal" style code on that CPU that simply busy-waits/polls on an atomic variable until the "real" audio thread updates it to let the dedicated core know it's time to do its thing. At that point the dedicated core runs its math routines on the input samples and generates its output in a (hopefully) finite amount of time, after which the "real" audio thread can gather the results (more busy-waiting/polling here) and incorporate them back into the outgoing audio buffer.
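Roughly, the pattern he describes looks like the sketch below (purely illustrative C++ of my own; the struct, names and buffer sizes are made up, and nothing in this code actually reserves a core or keeps the OS off of it):

    #include <atomic>
    #include <cstddef>

    // Hypothetical mailbox shared between the CoreAudio IOProc thread and one
    // dedicated "math" core. All names and sizes here are mine, purely for
    // illustration.
    struct WorkerMailbox {
        static constexpr std::size_t kMaxSamples = 128 * 64;
        std::atomic<int> phase{0};   // 0 = idle, 1 = input ready, 2 = output ready
        float input[kMaxSamples];
        float output[kMaxSamples];
    };

    // Runs forever on the core we hope to monopolize: spin until the audio
    // thread posts new input, do the heavy math, then post the result back.
    void dedicatedCoreLoop(WorkerMailbox& box) {
        for (;;) {
            while (box.phase.load(std::memory_order_acquire) != 1) { /* busy-wait */ }
            // ... heavy per-sample math reading box.input, writing box.output ...
            box.phase.store(2, std::memory_order_release);
        }
    }

    // Called from the real-time IOProc after copying the incoming samples into
    // box.input: kick the worker, then spin until its results are ready.
    void renderWithWorker(WorkerMailbox& box) {
        box.phase.store(1, std::memory_order_release);
        while (box.phase.load(std::memory_order_acquire) != 2) { /* busy-wait */ }
        // ... mix box.output into the outgoing buffer ...
        box.phase.store(0, std::memory_order_relaxed);
    }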
My question is, is this approach worth attempting under MacOS/X? (i.e. can a MacOS/X program, even one with root access, convince MacOS to give it exclusive access to some cores, and if so, will big ugly busy-waiting/polling loops on those cores (including the polling-loops necessary to synchronize the CoreAudio callback-thread relative to their input/output requirements) yield results that are reliably real-time enough that you might someday want to use them in front of a paying audience?)
It seems like something that might be possible in principle, but before I spend too much time banging my head against whatever walls might exist there, I'd like some input about whether this is an avenue worth pursuing on this platform.
can a MacOS/X program, even one with root access, convince MacOS to give it exclusive access to some cores
I don't know about that, but you can use as many cores / real-time threads as you want for your calculations, using whatever synchronisation methods you need to make it work, then pass the audio to your IOProc using a lock free ring buffer, like TPCircularBuffer.
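To sketch the hand-off I mean (this is just a bare-bones stand-in for the idea; TPCircularBuffer itself is more capable and works on bytes rather than floats):

    #include <atomic>
    #include <cstddef>
    #include <vector>

    // Minimal single-producer/single-consumer float FIFO: the worker thread
    // writes processed audio, the IOProc reads it with no locks or allocation.
    class SpscRing {
    public:
        explicit SpscRing(std::size_t capacity) : buf_(capacity) {}

        // Called from your worker thread(s) after the heavy processing is done.
        std::size_t write(const float* src, std::size_t n) {
            const std::size_t r = read_.load(std::memory_order_acquire);
            const std::size_t w = write_.load(std::memory_order_relaxed);
            const std::size_t freeSlots = buf_.size() - (w - r);
            const std::size_t todo = n < freeSlots ? n : freeSlots;
            for (std::size_t i = 0; i < todo; ++i)
                buf_[(w + i) % buf_.size()] = src[i];
            write_.store(w + todo, std::memory_order_release);
            return todo;
        }

        // Called from the IOProc: just copy whatever is available.
        std::size_t read(float* dst, std::size_t n) {
            const std::size_t w = write_.load(std::memory_order_acquire);
            const std::size_t r = read_.load(std::memory_order_relaxed);
            const std::size_t avail = w - r;
            const std::size_t todo = n < avail ? n : avail;
            for (std::size_t i = 0; i < todo; ++i)
                dst[i] = buf_[(r + i) % buf_.size()];
            read_.store(r + todo, std::memory_order_release);
            return todo;
        }

    private:
        std::vector<float> buf_;
        std::atomic<std::size_t> write_{0};   // monotonically increasing counters
        std::atomic<std::size_t> read_{0};
    };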
But your question reminded me of a new macOS 11/iOS 14 API I've been meaning to try, the Audio Workgroups API (2020 WWDC Video).
My understanding is that this API lets you "bless" your non-IOProc real-time threads with audio real-time thread properties or at least cooperate better with the audio thread.
The documentation distinguishes between threads working in parallel (which sounds like your case) and working asynchronously (which sounds like my proposal); I don't know which case is better for you.
I still don't know what happens in practice when you use Audio Workgroups, whether they opt you in to good stuff or opt you out of bad stuff, but if they're not the hammer you're seeking, they may have some useful hammer-like properties.
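If you do experiment with them, my understanding of the shape of the API is roughly this (a sketch based only on the headers and the WWDC session, not something I've battle-tested, so treat the property and function names as things to verify):

    #include <CoreAudio/CoreAudio.h>
    #include <os/workgroup.h>

    // Ask the device for the workgroup its IO thread belongs to, then have each
    // of your worker threads join it before doing their DSP work.
    static os_workgroup_t copyIOWorkgroup(AudioObjectID deviceID) {
        AudioObjectPropertyAddress addr = {
            kAudioDevicePropertyIOThreadOSWorkgroup,
            kAudioObjectPropertyScopeGlobal,
            kAudioObjectPropertyElementMain   // kAudioObjectPropertyElementMaster on older SDKs
        };
        os_workgroup_t wg = nullptr;
        UInt32 size = sizeof(wg);
        if (AudioObjectGetPropertyData(deviceID, &addr, 0, nullptr, &size, &wg) != noErr)
            return nullptr;
        return wg;   // caller releases it when done
    }

    static void workerThreadBody(os_workgroup_t wg) {
        os_workgroup_join_token_s token;
        if (os_workgroup_join(wg, &token) != 0)
            return;                            // e.g. the workgroup was cancelled
        // ... real-time-safe processing loop goes here ...
        os_workgroup_leave(wg, &token);
    }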
We have an application that uses waveXXX() and mixerXXX() functions to handle the audio I/O to and from some instruments (think: oscilloscope or electronics rather than musical instruments, not that it much matters). It's finally time to stop deploying it on Windows XP, and move it to Windows 7 and/or 8.
From reading a variety of material on WASAPI, it sounds like the bulk of the application (based on the waveXXX() functions) might actually work fine, but the mixerXXX() code used to set the master output volume, set the line-in volume, and mute the microphone will definitely have to change and use IAudioEndpointVolume calls instead.
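From what I can tell so far, those replacement calls would look roughly like this (a sketch only; error handling is omitted, I take the default endpoints for brevity, and selecting the actual line-in input would mean enumerating capture endpoints rather than using the default):

    #include <windows.h>
    #include <mmdeviceapi.h>
    #include <endpointvolume.h>

    // Rough IAudioEndpointVolume replacement for the old mixerXXX() master
    // volume / mute calls. COM must already be initialized on this thread.
    void SetMasterVolumeAndMuteMic()
    {
        IMMDeviceEnumerator* devEnum = nullptr;
        CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                         __uuidof(IMMDeviceEnumerator), (void**)&devEnum);

        // Master output volume on the default render endpoint.
        IMMDevice* speakers = nullptr;
        devEnum->GetDefaultAudioEndpoint(eRender, eConsole, &speakers);
        IAudioEndpointVolume* outVol = nullptr;
        speakers->Activate(__uuidof(IAudioEndpointVolume), CLSCTX_ALL, nullptr, (void**)&outVol);
        outVol->SetMasterVolumeLevelScalar(0.75f, nullptr);   // 75% of full scale

        // Mute the default capture endpoint (microphone).
        IMMDevice* mic = nullptr;
        devEnum->GetDefaultAudioEndpoint(eCapture, eConsole, &mic);
        IAudioEndpointVolume* inVol = nullptr;
        mic->Activate(__uuidof(IAudioEndpointVolume), CLSCTX_ALL, nullptr, (void**)&inVol);
        inVol->SetMute(TRUE, nullptr);

        inVol->Release(); mic->Release(); outVol->Release(); speakers->Release(); devEnum->Release();
    }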
Is it possible to change only the mixerXXX() calls? Is it desirable?
Logically, this application requires exclusive use of its audio endpoints (speaker out, line in). If I want to ensure exclusive access through software, would that force me to rewrite all the waveXXX() code too? (The alternative is to warn users that other audio applications may interfere with this one).
My recommendation:
If you need exclusive access, convert everything to WASAPI
If you are using line-in, convert everything to WASAPI
If you have time, convert everything to WASAPI
If you are strictly only using speaker and microphone in shared mode, replace mixerXXX() with the ISimpleAudioVolume interface (and several other interfaces to get to it), then test whether existing waveXXX() code behaves as you need it to. Then test each time hardware, OS or audio drivers change. Better still, just convert to WASAPI.
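For reference, the ISimpleAudioVolume route goes roughly like this (a sketch under my assumptions; 'device' would come from IMMDeviceEnumerator::GetDefaultAudioEndpoint(), and error handling plus COM initialization are omitted):

    #include <windows.h>
    #include <mmdeviceapi.h>
    #include <audioclient.h>
    #include <audiopolicy.h>

    // The "several other interfaces to get to it": a shared-mode IAudioClient
    // on the endpoint, then GetService() for the session volume control.
    void SetSessionVolume(IMMDevice* device, float level)
    {
        IAudioClient* client = nullptr;
        device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, (void**)&client);

        WAVEFORMATEX* mixFormat = nullptr;
        client->GetMixFormat(&mixFormat);
        client->Initialize(AUDCLNT_SHAREMODE_SHARED, 0,
                           10 * 1000 * 1000,   // 1 second, in 100-ns units
                           0, mixFormat, nullptr);

        ISimpleAudioVolume* volume = nullptr;
        client->GetService(__uuidof(ISimpleAudioVolume), (void**)&volume);
        volume->SetMasterVolume(level, nullptr);   // per-session volume, not the device master

        volume->Release();
        CoTaskMemFree(mixFormat);
        client->Release();
    }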
In my case, exclusive speaker output is critical - this drives the instrument that generates a related input signal. I guess I don't mind if another application wants to share access to that incoming signal, but logically it is a system that wants an exclusive contract with its audio endpoints.
That exclusivity requires that I obtain an IMMDevice instance for both speaker output and line-in input, Activate() the IAudioClient interface on them and Initialize() both using AUDCLNT_SHAREMODE_EXCLUSIVE (see also this answer).
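Per endpoint, that setup looks roughly like the sketch below (illustrative only; reading the device format from the property store and passing equal buffer/period values are the documented patterns I'm aware of, not code I've verified in this application):

    #include <windows.h>
    #include <propidl.h>
    #include <mmdeviceapi.h>     // also defines PKEY_AudioEngine_DeviceFormat
    #include <audioclient.h>

    // Exclusive-mode initialization for one endpoint. In exclusive mode the
    // shared-mode mix format doesn't apply, so read the device's own format
    // from its property store. Error handling and COM init omitted.
    IAudioClient* InitExclusive(IMMDevice* device, REFERENCE_TIME bufferDuration)
    {
        IAudioClient* client = nullptr;
        device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, (void**)&client);

        IPropertyStore* props = nullptr;
        device->OpenPropertyStore(STGM_READ, &props);
        PROPVARIANT var;
        PropVariantInit(&var);
        props->GetValue(PKEY_AudioEngine_DeviceFormat, &var);
        WAVEFORMATEX* deviceFormat = (WAVEFORMATEX*)var.blob.pBlobData;

        client->Initialize(AUDCLNT_SHAREMODE_EXCLUSIVE, 0,
                           bufferDuration, bufferDuration,   // common exclusive-mode pattern
                           deviceFormat, nullptr);

        PropVariantClear(&var);
        props->Release();
        return client;   // then GetService(IAudioRenderClient/IAudioCaptureClient), Start(), ...
    }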
But have I actually selected line-in by such a process? Probably not. All I can be sure of is annoying any other applications who were previously sharing my endpoints by cutting them off.
Having done this much, it's really not clear what will happen to waveInXXX() calls - maybe they'll take from line-in, maybe from microphone - maybe it depends on how the hardware vendor implements their end of the deal. It's also never been clear to me whether line-in and microphone are always multiplexed (i.e. selectable), always mixed (i.e. you can only simulate selection by muting the other one) or there is no standard one can rely on.
Because of factors like that, it's a gamble not to use WASAPI throughout.
I wanted to start a thread on this. A lot of people are wondering how to do it in a specific context or with a specific language, but I was wondering what the best strategy is in general.
I see two main practices:
load small chunks (like 2048 samples) of the file into a buffer. This seems the most straightforward, but it involves hitting the disk a lot, so I suspect it is not the best.
load the whole file into one big buffer. This is gentler on the hard drive, but needs a lot of RAM if you use several long files. And if your file is very long, or has a lot of channels, I imagine the index variable could overflow. For example, if it's a 16-bit integer it may not be able to reach the end of the file (or am I being paranoid?).
and I'm thinking about hybrid approaches, like:
using very big buffers without loading the whole file
store the file in a custom format on the hard drive, in a way that is optimized for quick access.
So, what do you think? How do you deal with this?
I don't really care which is "best"; I'm more interested in the pros and cons of each.
Answering part of my own question (the part about hybrid solutions).
Audacity uses a custom BlockFiles format for storage and playback. It encapsulates both the idea of big (bigger-than-callback) buffers, which are around 1 MB, and the idea of a custom file type (.aup).
"BlockFiles balance two conflicting forces. We can insert and delete audio without excessive copying, and during playback we are guaranteed to get reasonably large chunks of audio with each request to the disk. The smaller the blocks, the more potential disk requests to fetch the same amount of audio data; the larger the blocks, the more copying on insertions and deletions." (from : http://www.aosabook.org/en/audacity.html)
From what I've read, it was primarily designed to speed up editing of very long files (for example, inserting data at the beginning without having to move everything that comes after).
But for playback of relatively short audio data (< 1 hour) I guess putting everything in RAM is just fine.
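If you do want to stream instead, the core of the "bigger-than-callback buffer" idea is small; here is a toy sketch (my own names and block size, raw PCM only, no header parsing):

    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <vector>

    // A background loader pulls ~1 MB blocks of raw PCM off disk, well ahead of
    // the audio callback, which then only ever copies from memory.
    class BlockReader {
    public:
        static constexpr std::size_t kBlockBytes = 1 << 20;   // ~1 MB per disk request

        explicit BlockReader(const std::string& path) : file_(path, std::ios::binary) {}

        // Called from a low-priority loader thread, never from the audio callback.
        bool readNextBlock(std::vector<char>& block) {
            block.resize(kBlockBytes);
            file_.read(block.data(), static_cast<std::streamsize>(kBlockBytes));
            block.resize(static_cast<std::size_t>(file_.gcount()));
            return !block.empty();
        }

    private:
        std::ifstream file_;
    };

The loader thread would keep two or three such blocks queued ahead of the play position, so the audio callback itself never touches the disk.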
I'm looking into making some software that makes the keyboard function like a piano (e.g., the user presses the 'W' key and the speakers play a D note). I'll probably be using OpenAL. I understand the basics of digital audio, but playing real-time audio in response to key presses poses some problems I'm having trouble solving.
Here is the problem: let's say I have 10 audio buffers, and each buffer holds one second of audio data. If I have to fill buffers before they are played through the speakers, then I would be filling buffers one or two seconds before they are played. That means that whenever the user tries to play a note, there will be a one or two second delay between pressing the key and the note being played.
How do you get around this problem? Do you just make the buffers as small as possible, and fill them as late as possible? Is there some trick that I am missing?
Most software synthesizers don't use multiple buffers at all.
They just use one single, small ringbuffer that is constantly played.
A high-priority thread will, as often as possible, check the current play position and fill the free part of the ring buffer (i.e. the part that has been played since the last time your thread ran) with sound data.
This will give you a constant latency that is only bound by the size of your ring-buffer and the output latency of your soundcard (usually not that much).
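In pseudo-real code, the fill loop amounts to something like this (a sketch; currentPlayPos and renderSynth stand in for whatever your audio API and synth engine actually provide, and it assumes the ring was rendered full once before playback started):

    #include <cstddef>

    // The writer always stays just behind/ahead of the play cursor, so latency
    // is roughly the ring size plus the soundcard's output latency.
    void fillLoop(float* ring, std::size_t ringSize,
                  std::size_t (*currentPlayPos)(),            // where the card is reading now
                  void (*renderSynth)(float* dst, std::size_t n),
                  std::size_t& writePos)                      // where we last wrote up to
    {
        const std::size_t playPos = currentPlayPos();
        // Everything the card has played since our last write is now free.
        // (With a full prefill, writePos == playPos means "nothing free".)
        const std::size_t freeSamples = (playPos + ringSize - writePos) % ringSize;

        // The free region may wrap around the end of the ring: render in two parts.
        std::size_t first = freeSamples;
        if (writePos + first > ringSize)
            first = ringSize - writePos;
        if (first > 0)
            renderSynth(ring + writePos, first);
        if (freeSamples > first)
            renderSynth(ring, freeSamples - first);

        writePos = (writePos + freeSamples) % ringSize;
    }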
You can lower your latency even further:
In case of a new note to be played (e.g. the user has just pressed a key) you check the current play position within the ring-buffer, add some samples for safety, and then re-render the sound data with the new sound-settings applied.
This becomes tricky if you have time-based effects running (delay lines, reverb and so on), but it's doable. Just keep track of the last 10 states of your time-based effects every millisecond or so. That makes it possible to go back 10 milliseconds in time.
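A sketch of that history, with made-up names (snapshots taken every millisecond, depth 10):

    #include <cstddef>
    #include <vector>

    // Made-up snapshot of one time-based effect (here: a simple delay line).
    struct DelayState {
        std::vector<float> delayLine;
        std::size_t writePos = 0;
    };

    // Keeps the last `depth` snapshots so a note-on can re-render audio from up
    // to `depth` milliseconds in the past. Call snapshot() once per millisecond;
    // stepsBack(0) is the newest entry. (Assumes at least n + 1 snapshots have
    // been taken before stepsBack(n) is queried.)
    class EffectHistory {
    public:
        explicit EffectHistory(std::size_t depth) : states_(depth) {}

        void snapshot(const DelayState& s) {
            states_[next_ % states_.size()] = s;
            ++next_;
        }

        const DelayState& stepsBack(std::size_t n) const {
            return states_[(next_ - 1 - n) % states_.size()];
        }

    private:
        std::vector<DelayState> states_;
        std::size_t next_ = 0;
    };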
With the WinAPI, you can only get so far in terms of latency. Usually you can't get below 40-50ms which is quite nasty. The solution is to implement ASIO support in your app, and make the user run something like Asio4All in the background. This brings the latency down to 5ms but at a cost: other apps can't play sound at the same time.
I know this because I'm a FL Studio user.
The solution is small buffers, filled frequently by a real-time thread. How small you make the buffers (or how full you let the buffer become with a ring-buffer) is constrained by scheduling latency of your operating system. You'll probably find 10ms to be acceptable.
There are some nasty gotchas in here for the uninitiated - particularly with regards to software architecture and thread-safety.
You could try having a look at Juce, a cross-platform framework for writing audio software, in particular audio plugins such as soft synths and effects. It includes example code for both plug-ins and hosts, and it is in the host that the threading issues are mostly dealt with.
I'm writing a MIDlet using LWUIT and images seem to eat up incredible amounts of memory. All the images I use are PNGs and are packed inside the JAR file. I load them using the standard Image.createImage(URL) method. The application has a number of forms, each with a couple of labels and buttons; however, I am fairly certain that only the active form is kept in memory (I know it isn't very trustworthy, but Runtime.freeMemory() seems to confirm this).
The application has worked well at 240x320 resolution, but moving it to 480x640 and using appropriately larger images for the UI started causing out-of-memory errors. What the application does, among other things, is download remote images. The application seems to work fine until it gets to this point. After downloading a couple of PNGs and returning to the main menu, the out-of-memory error is encountered. Naturally, I looked into the amount of memory the main menu uses and it was pretty shocking. It's just two labels with images and four buttons. Each button has three images used for style.setIcon, setPressedIcon and setRolloverIcon. The images range in size from 15 to 25 KB, but after removing two of the three images used for each button (8 images in total), Runtime.freeMemory() showed a stunning 1 MB decrease in memory usage.
The way I see it, I either have a whole lot of memory leaks (which I don't think I do, but memory leaks aren't exactly known to be easily tracked down), I am doing something terribly wrong with image handling or there's really no problem involved and I just need to scale down.
If anyone has any insight to offer, I would greatly appreciate it.
Mobile devices are usually very low on memory. So you have to use some tricks to conserve and use memory.
We had the same problem in one of our projects, and we solved it like this:
for downloaded images:
Make a cache where you put your images. If you need an image, check whether it is in the cache map; if it isn't, download it and put it there; if it is, use it. If memory is full, remove the oldest image from the cache map and try again.
for other resource images:
Keep them in memory only for as long as they are visible; once they are not, break the reference and the GC will do the cleanup for you.
Hope this helps.
There are a few things that might be happening here:
You might have seen the memory used before garbage collection, which doesn't correspond to the actual memory used by your app.
Some third-party code you are running might be pooling some internal data structures to minimize allocation. While pooling is a viable strategy, sometimes it does look like a leak. In that case, look for an API to 'close' or 'dispose' of the objects you don't need.
Finally, you might really have a leak. In this case you need to get more details on what's going on in the emulator VM (though keep in mind that it is not necessarily the same as the phone VM).
Make sure that your emulator uses JRE 1.6 as the backing JVM. If you need it to use the runtime libraries from an earlier JDK, use -Xbootclasspath:<path-to-rt.jar>.
Then, after your application gets in the state you want to see, do %JAVA_HOME%\bin\jmap -dump:format=b,file=heap.bin <pid> (if you don't know the id of your process, use jps)
Now you've got a dump of the JVM heap. You can analyze it with jhat (it comes with the JDK but is a bit difficult to use) or with a third-party profiler (my preference is YourKit - it's commercial, but they have time-limited eval licenses).
I had a similar problem with LWUIT at Java DTV. Did you try flushing the images when you don't need them anymore (getAWTImage().flush())?
Use EncodedImage and resource files when possible (resource files use EncodedImage by default; read the javadoc for it). The other comments are also correct that you need to actually observe the amount of memory; even high-RAM Android/iOS devices run out of memory pretty quickly when handling multiple images.
Avoid scaling, which effectively eliminates the EncodedImage.
Have you considered that loading the same image from the JAR many times may be creating many separate image objects (with identical contents) instead of reusing one instance per image? That is my first guess.