How do media timeline apps work under the hood? - audio

There are numerous examples of apps that are time-based and send out events or carry out complex processing to a very fine resolution and accuracy. Think MIDI sequencing apps, audio and video editing apps.
So I'm curious, at their core level, how do these apps do what they do so accurately from a programming point of view?

MIDI and media playback are entirely different in nature, and are handled in different ways.
For MIDI, there is very little data to process. A thread with a high priority is created to handle MIDI I/O. That's all that is needed.
For audio, accuracy isn't a problem but latency is. There is a buffer on the sound interface that is regularly written to by the software playing back audio. For a typical media player, this buffer has storage for about 300ms of audio. The software just writes the PCM-encoded audio waveform to the buffer. The sound interface is constantly reading from this buffer and playing back at a constant rate.
For low-latency audio applications, this buffer size can be very small, holding as little as 5 or 10ms of audio. The software generating the audio data must again be handled by a thread with high priority, and often has many optimizations to keep it running when the rest of the software (effects and whatnot) cannot keep up; buffer underruns are common. Special drivers are often used to skip unneeded software in the signal chain. ASIO and DirectX are common on Windows. Windows Vista/7 and OS X both call their audio APIs "Core Audio", and provide low-latency features without special drivers.
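The buffer sizes mentioned above translate directly into latency. A rough sketch of the arithmetic (Python for illustration; the specific frame counts are just examples):

```python
def buffer_latency_ms(frames, sample_rate):
    """Worst-case latency added by a playback buffer, in milliseconds:
    the hardware may play every queued frame before new data is heard."""
    return 1000.0 * frames / sample_rate

# A typical media-player buffer: 13230 frames at 44.1 kHz -> 300 ms.
# A low-latency buffer: 256 frames at 44.1 kHz -> ~5.8 ms.
player = buffer_latency_ms(13230, 44100)
low = buffer_latency_ms(256, 44100)
```

This is why a media player can tolerate huge buffers (nobody notices audio starting 300ms late) while a softsynth cannot.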
Video is an entirely different beast. Decoding the video is handled by hardware, where possible. This is how a slow device such as a cell phone is able to play back 720p video. If the hardware can handle the codec, the software just needs to send it the data. In cases where the codec is unsupported, the video must be decoded in software, which is much slower. Even on modern PCs, software decoding often leads to choppy or laggy video.
Synchronization of audio to video is also a problem. I don't know much about it, but it is my understanding that the audio is the master clock, and video is synchronized to it. You cannot simply start playback and expect the timing to work out, as different sound interfaces will have different ideas as to what 44.1kHz (or any other sample rate) is. You can prove this yourself by playing back the same audio on two different devices simultaneously, and listening to them drift apart over time.
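To get a feel for how big that drift is, here is a quick sketch (the 44110 Hz figure is an invented example of a slightly fast crystal, not a measured value):

```python
def drift_ms(nominal_rate, actual_rate_a, actual_rate_b, seconds):
    """How far apart two devices drift when each plays `seconds` worth
    of audio rendered at `nominal_rate` through its own clock crystal."""
    n_samples = nominal_rate * seconds
    return abs(n_samples / actual_rate_a - n_samples / actual_rate_b) * 1000.0

# Two "44.1 kHz" interfaces actually running at 44100 and 44110 Hz
# (a 0.02% error) end up roughly 0.8 seconds apart after one hour.
hour_drift = drift_ms(44100, 44100, 44110, 3600)
```

A fraction of a percent of clock error is easily audible as an echo after a few minutes, which is why one clock has to be the master.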

Related

Concurrent Recording and Playback of Video on an SBC (RasPi Zero or C.H.I.P.)

I'm looking to judge the feasibility of simultaneous recording and playback of two 720p A/V streams using an inexpensive Single-Board Computer. To clarify, I want to present a prerecorded 720p video to a user while simultaneously recording a 720p video of their reaction to it.
I've designed a solution based on specialised video processors / encoders (F1C100 and NT96632BG) and multiple multiplexed flash memory banks, but implementing it would greatly exceed the project's up-front development budget.
The only off-the-shelf hardware I know of that might do the trick and fall within both target BOM and development costs would be an SBC.
Any help judging feasibility (and if possible, implementation suggestions) would be most appreciated.

Which microcontroller for fast high quality audio switching and playback

I'm building a device which will play high-quality sound samples and will switch between samples in under 5ms when a signal is applied.
I'm after a microcontroller which can allow this - I need 4 I/O pins for triggering the transitions between sounds, as well as the output pin(s) for the audio. The duration of the audio files will be 50ms or so, but ideally there would be enough storage to allow the files to be 1 second or longer. It will loop the current file until told to change. I don't want audible pops or suchlike when switching files or running other commands - but there shouldn't be a need for anything complex to run beside it; it's purely audio playing and switching.
I've looked at various microcontrollers in the Arduino family but they don't seem optimal for this purpose (I tried, for example, the Mozzi library for Arduino, but it's not fantastic quality). Ideally I could do it all on the chip (whatever it is, it doesn't need to be Arduino) without needing external storage or RAM modules. But if that's necessary I'll do it. The solution has to fit in a 2cm wide cylinder (but no length constraints), so ideally everything would be within that - so no SD card modules or the like. Language-wise I'm fairly new to them all, but can learn whatever would be best.
Audio: 44.1kHz CD-quality WAV, although I could obviously switch to a different format if necessary. If it's totally impossible to play such high-quality sound, then the quality could be less.
Thank you for your help
For a simple application like this you would be best to just use a small ARM Cortex M device hooked up to an external SPI FLASH chip. Most microcontrollers scale processing power and RAM with FLASH storage so keeping it all on one chip will result in a grotesquely over-powered solution. Serial FLASH memory is very cheap, easy to use, and you can change the size in the future if you need to add more samples.
For the audio side, if you really want CD quality you'll have to look at getting an external audio DAC, as I don't know of any microcontrollers that integrate a CD-quality codec. External DACs aren't expensive or complex to use, but one does add to the physical size and BOM cost. Many Cortex chips have built-in 12-bit DACs, though, so if the audio has a reasonably small dynamic range you might find this is suitable for your needs.
In terms of minimising pops and clicks the Cortex devices will have enough power for some basic filtering to deal with this. I would recommend against Arduino though as you will quickly come up against processing power limitations and I doubt you will want to dive into assembler optimisations.
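For sizing the serial flash, the raw PCM bit-rate sets the budget. A quick sketch of the arithmetic (assuming mono samples, which the question doesn't actually specify):

```python
def sample_bytes(seconds, sample_rate=44100, bits=16, channels=1):
    """Storage needed for raw PCM audio of the given duration."""
    return int(seconds * sample_rate * channels * bits // 8)

# One second of 16-bit mono at 44.1 kHz is 88200 bytes, so a cheap
# 1 MB (8 Mbit) SPI flash holds roughly 11 seconds of samples, and
# each 50 ms sample from the question is only ~4.4 kB.
one_second = sample_bytes(1)
one_trigger = sample_bytes(0.05)
```

It also tells you the sustained read rate the flash must deliver (~88 kB/s here), which any SPI flash manages easily.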

audio playback and multitasking

I have been wondering how an operating system can play audio continuously with no interruptions while other programs are running.
Sometimes when an intensive program is running, I notice the music stops for a second, maybe a few seconds at times.
But generally I can listen to music while playing an intensive game.
How does the operating system manage to play it with (almost) no interruptions at all?
It gives a higher priority to audio threads.
Android SDK says that normal applications can not change to this (higher) priority: http://developer.android.com/reference/android/os/Process.html#THREAD_PRIORITY_AUDIO
Audio is played back by hardware. The software, including the OS, is responsible for transferring chunks of audio, called "buffers", to the hardware at semi-regular intervals. As long as the hardware receives the next chunk before the last chunk runs out, playback is continuous.
I have a diagram of this on my talk slides:
http://blog.bjornroche.com/2011/11/slides-from-fundamentals-of-audio.html
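The "next chunk before the last runs out" rule can be sketched as a little simulation (a toy model, not any real API; the refill times stand in for a software thread that occasionally gets delayed by other programs):

```python
def playback_glitches(buffer_ms, refill_times_ms):
    """Count underruns: each refill must arrive before the previously
    delivered chunk finishes playing. `refill_times_ms` are the times
    at which the (possibly delayed) software delivers the next chunk."""
    deadline = buffer_ms              # the first chunk starts playing at t=0
    glitches = 0
    for t in refill_times_ms:
        if t > deadline:              # hardware ran dry before the refill
            glitches += 1
            deadline = t + buffer_ms  # playback resumes from the late chunk
        else:
            deadline += buffer_ms     # seamless: one more chunk queued
    return glitches

# With 300 ms chunks, refills delayed to 100/400/700 ms are absorbed;
# with 50 ms chunks the very same delays cause audible dropouts.
```

This is why music players survive heavy games: their large buffers give the audio thread hundreds of milliseconds of slack before a missed deadline becomes audible.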

Making a real-time audio application with software synthesizers

I'm looking into making some software that makes the keyboard function like a piano (e.g., the user presses the 'W' key and the speakers play a D note). I'll probably be using OpenAL. I understand the basics of digital audio, but playing real-time audio in response to key presses poses some problems I'm having trouble solving.
Here is the problem: let's say I have 10 audio buffers, and each buffer holds one second of audio data. If I have to fill the buffers before they are played through the speakers, then I would be filling buffers one or two seconds before they are played. That means that whenever the user tries to play a note, there will be a one or two second delay between pressing the key and the note being played.
How do you get around this problem? Do you just make the buffers as small as possible, and fill them as late as possible? Is there some trick that I am missing?
Most software synthesizers don't use multiple buffers at all.
They just use one single, small ringbuffer that is constantly played.
A high-priority thread checks the current play position as often as possible and fills the free part of the ring buffer (i.e. the part that has been played since the last time your thread ran) with sound data.
This gives you a constant latency bounded only by the size of your ring buffer and the output latency of your soundcard (usually not that much).
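The scheme above can be sketched as a toy model (Python for illustration; real code would use lock-free indices and the soundcard's actual play cursor, and the hardware side here is simulated):

```python
class RingBuffer:
    """Single ring buffer, constantly played by (simulated) hardware.
    The render thread refills exactly the region freed since its last
    run, so latency never exceeds the buffer length."""
    def __init__(self, size):
        self.buf = [0.0] * size   # starts full of pre-rendered silence
        self.play_pos = 0         # hardware read position
        self.write_pos = 0        # synth write position

    def free_space(self):
        # Samples played since the last fill, now free to rewrite.
        return (self.play_pos - self.write_pos) % len(self.buf)

    def fill(self, render):
        """Called as often as possible from a high-priority thread."""
        for _ in range(self.free_space()):
            self.buf[self.write_pos] = render()
            self.write_pos = (self.write_pos + 1) % len(self.buf)

    def hardware_tick(self, n):
        # Simulate the soundcard consuming n samples at a constant rate.
        out = [self.buf[(self.play_pos + i) % len(self.buf)]
               for i in range(n)]
        self.play_pos = (self.play_pos + n) % len(self.buf)
        return out
```

Note that a freshly rendered sample emerges one full buffer length later, which is exactly the "latency bounded by the ring-buffer size" described above.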
You can lower your latency even further:
In case of a new note to be played (e.g. the user has just pressed a key) you check the current play position within the ring-buffer, add some samples for safety, and then re-render the sound data with the new sound-settings applied.
This becomes tricky if you have time-based effects running (delay lines, reverb and so on), but it's doable. Just snapshot the state of your time-based effects every millisecond or so, keeping the last 10 snapshots; that makes it possible to go back 10 milliseconds in time.
With the WinAPI, you can only get so far in terms of latency. Usually you can't get below 40-50ms which is quite nasty. The solution is to implement ASIO support in your app, and make the user run something like Asio4All in the background. This brings the latency down to 5ms but at a cost: other apps can't play sound at the same time.
I know this because I'm a FL Studio user.
The solution is small buffers, filled frequently by a real-time thread. How small you make the buffers (or how full you let the buffer become with a ring-buffer) is constrained by scheduling latency of your operating system. You'll probably find 10ms to be acceptable.
There are some nasty gotchas in here for the uninitiated - particularly with regards to software architecture and thread-safety.
You could try having a look at Juce - a cross-platform framework for writing audio software, and in particular audio plugins such as softsynths and effects. It includes sample code for both plug-ins and hosts, and it is in the host that the threading issues are mostly dealt with.

Fast Audio Input/Output

Here's what I want to do:
I want to allow the user to give my program some sound data (through a mic input), then hold it for 250ms, then output it back out through the speakers.
I have done this already using Java Sound API. The problem is that it's sorta slow. It takes a minimum of about 1-2 seconds from the time the sound is made to the time the sound is heard again from the speakers, and I haven't even tried to implement delay logic yet. Theoretically there should be no delay, but there is. I understand that you have to wait for the sound card to fill up its buffer or whatever, and the sample size and sampling rate have something to do with this.
My question is this: Should I continue down the Java path trying to do this? I want to get the delay down to like 100ms if possible. Does anyone have experience using the ASIO driver with Java? Supposedly it's faster..
Also, I'm a .NET guy. Does this make sense to do with .NET instead? What about C++? I'm looking for the right technology to use here, and maybe a good example of how to read/write to audio input/output streams using your suggested technology platform. Thanks for your help!
I've used JavaSound in the past and found it wonderfully flaky (and it keeps changing between VM releases). If you like C#, use it, just use the DirectX APIs. Here's an example of doing kind of what you want to do using DirectSound and C#. You could use the Effects plugins to perform your 250 ms echo.
http://blogs.microsoft.co.il/blogs/tamir/archive/2008/12/25/capturing-and-streaming-sound-by-using-directsound-with-c.aspx
You may want to look into JACK, an audio API designed for low-latency sound processing. Additionally, Google turns up this nifty presentation [PDF] about using JACK with Java.
Theoretically there should be no delay, but there is.
Well, it's impossible to have zero delay. The best you can hope for is an unnoticeable delay (in terms of human perception). It might help if you describe your basic algorithm for reading & writing the sound data, so people can identify possible problems.
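For reference, the core of such a program (independent of Java vs. C) is usually just a fixed-length ring buffer: each incoming sample is exchanged with the one captured 250ms earlier. A sketch, operating on plain sample lists rather than a live stream:

```python
def delay_line(samples, delay_samples):
    """Delay a stream by `delay_samples`: swap each incoming sample
    with the one stored that many samples ago in a ring buffer."""
    buf = [0.0] * delay_samples   # initial silence
    pos = 0
    out = []
    for s in samples:
        out.append(buf[pos])      # sample captured delay_samples ago
        buf[pos] = s              # store the fresh input in its place
        pos = (pos + 1) % delay_samples
    return out

# At 44.1 kHz, a 250 ms delay is int(0.25 * 44100) = 11025 samples.
```

The delay itself costs nothing; the 1-2 seconds you're seeing comes from the capture and playback buffering the sound API adds around this loop, which is what the buffer-size tuning discussed here addresses.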
A potential issue with using a garbage-collected language like Java is that the GC will periodically run, interrupting your processing for some arbitrary amount of time. However, I'd be surprised if it's >100ms in normal usage. If GC is a problem, most JVMs provide alternate collection algorithms you can try.
If you choose to go down the C/C++ path, I highly recommend using PortAudio ( http://portaudio.com/ ). It works with almost everything on multiple platforms and it gives you low-level control of the sound drivers without actually having to deal with the various sound driver technology that is around.
I've used PortAudio on multiple projects, and it is a real joy to use. And the license is permissive.
If low latency is your goal, you can't beat C.
libsoundio is a low-level C library for real-time audio input and output. It even comes with an example program that does exactly what you want - piping the microphone input to the speakers output.
It's possible with JavaSound to get end-to-end latency in the ballpark of 100-150ms.
The primary cause of latency is the buffer sizes of the capture and playback lines. The bufferSize is set when opening the lines:
capture: TargetDataLine#open(AudioFormat format, int bufferSize)
playback: SourceDataLine#open(AudioFormat format, int bufferSize)
If the buffer is too big it will cause excess latency, but if it's too small it will cause stuttery playback. So you need to find a balance for your application's needs and your computing power.
If you call #open(AudioFormat format) without an explicit size, you can check the default with DataLine#getBufferSize. The default size varies with the AudioFormat and seems to be geared toward high-latency, stutter-free playback applications (e.g. internet streaming). If you're developing a low-latency application, the default buffer size is much too large and should be changed.
In my testing with a 16-bit PCM AudioFormat, a buffer size of 1024 bytes has been pretty close to ideal for low latency.
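As a sanity check on that figure (Python for illustration; mono is an assumption here, since the answer doesn't state the channel count):

```python
def line_latency_ms(buffer_bytes, sample_rate=44100, bits=16, channels=1):
    """Latency contributed by one line buffer of the given byte size."""
    frame_bytes = channels * bits // 8
    frames = buffer_bytes // frame_bytes
    return 1000.0 * frames / sample_rate

# 1024 bytes of 16-bit mono at 44.1 kHz is 512 frames, about 11.6 ms
# per line; in stereo the same 1024 bytes is 256 frames, about 5.8 ms.
mono = line_latency_ms(1024)
stereo = line_latency_ms(1024, channels=2)
```

Remember the capture and playback lines each contribute their own buffer, so the end-to-end figure is roughly the sum of the two.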
The second and often overlooked cause of audio latency is any other activity being done in the capture or playback threads. For example, logging messages to the console can introduce tens of milliseconds of latency. Turn it off.