ALSA vs PulseAudio - Latency Concerns - linux

Good day,
I have been debating some details with a colleague about ALSA vs PulseAudio, and need some help coming to a conclusion. It's my understanding that ALSA is relatively low-level and talks directly to the hardware, while PulseAudio sits on top of ALSA as a service.
Additionally, it's my understanding that ALSA is tied to Linux, but PulseAudio just acts as an abstraction layer on top of ALSA, and can work on other platforms. My conclusion is that ALSA would provide lower audio latency on most Linux systems, while my colleague contends that PulseAudio provides better (shorter) latency regardless.
Which of us is correct? My reasoning is that since PulseAudio sits on top of ALSA or even wraps it, there's no way it could provide better latency unless it's providing its own low-level calls.
Thank you.

ALSA (like many other sound APIs) provides a ring buffer for samples to be played.
The most common way to use this ring buffer is to keep it filled at all times.
This implies that a sample that is written now is played only after all the other samples in the buffer have been played, i.e., the latency is proportional to the buffer's size.
(The buffer size can be chosen by the application, but depends on the capabilities of the hardware, and is fixed once chosen.)
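As a rough illustration of that proportionality, the delay before a newly written sample is heard is the number of frames queued ahead of it divided by the sample rate. The sketch below assumes a buffer that is kept completely full; the function name and the 48 kHz rate are chosen only for the example.

```python
def buffer_latency_ms(buffer_frames: int, sample_rate_hz: int) -> float:
    """Delay before a newly written frame is played, assuming the ring
    buffer is kept completely full (one frame = one sample per channel)."""
    return 1000.0 * buffer_frames / sample_rate_hz


# Example buffer sizes (assumed values, not taken from any particular device).
for frames in (256, 1024, 4096, 16384):
    ms = buffer_latency_ms(frames, 48000)
    print(f"{frames:6d} frames @ 48000 Hz -> {ms:8.2f} ms")
```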
PulseAudio has the ability to keep only a part of the buffer filled.
(This is not a feature offered directly by ALSA, but requires a separate timer to monitor the playback progress.)
Thus it can offer lower latency than other applications using the same buffer size, but more importantly, this allows it to adjust the latency dynamically without having to stop and reconfigure the device.
Other applications could do the same, but it's easier to use PulseAudio than to implement that buffer handling again.
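The idea of filling only part of the buffer can be shown with a toy model. This is not PulseAudio's code; the class and method names are hypothetical, and real implementations also have to deal with timer jitter and underruns. It only shows that the latency follows the fill level rather than the buffer's capacity, and that the target can change without reconfiguring the buffer.

```python
class RingBufferModel:
    """Toy model of a playback ring buffer; counts frames only."""

    def __init__(self, capacity_frames: int, sample_rate_hz: int):
        self.capacity = capacity_frames
        self.rate = sample_rate_hz
        self.fill = 0  # frames currently queued for playback

    def play(self, elapsed_ms: float) -> None:
        """Hardware consumes frames as time passes."""
        consumed = int(self.rate * elapsed_ms / 1000)
        self.fill = max(0, self.fill - consumed)

    def top_up(self, target_frames: int) -> int:
        """Write just enough frames to reach the target fill level."""
        target = min(target_frames, self.capacity)
        written = max(0, target - self.fill)
        self.fill += written
        return written

    def latency_ms(self) -> float:
        """Delay seen by the next frame written."""
        return 1000.0 * self.fill / self.rate


buf = RingBufferModel(capacity_frames=96000, sample_rate_hz=48000)
buf.top_up(960)            # keep only 960 frames queued
print(buf.latency_ms())    # latency follows the fill level, not the capacity
buf.play(10)               # 10 ms of playback drains part of the queue
buf.top_up(4800)           # raise the target without touching the buffer size
print(buf.latency_ms())
```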

Which microcontroller for fast high quality audio switching and playback

I'm building a device which will play high quality sound samples and will switch between samples in >5ms when a signal is applied.
I'm after a microcontroller which can allow this - I need 4 I/O pins for triggering the transitions between sounds, as well as the output pin(s) for the audio. The duration of the audio files will be 50ms or so but ideally would have enough storage to allow the files to be 1 second or longer. It will loop the current file until told to change. I don't want audible pops or suchlike when switching files or running other commands - but there shouldn't be a need for anything complex to run beside it, it's purely audio playing and switching.
I've looked at various microcontrollers in the Arduino family but they don't seem optimal for this purpose - (tried for example the Mozzi library for Arduino but it's not fantastic quality). Ideally I could do it all on the chip (whatever it is, doesn't need to be Arduino) - without needing external storage or RAM modules. But if that's necessary I'll do it. The solution is to fit in a 2cm wide cylinder (but no length constraints) so would be ideally within that - so no SD card modules or whatever. Language wise - I'm fairly new to them all - but can learn whatever would be best.
Audio - (44.1kHz CD quality WAV, although could obviously switch to a different format if necessary). If it is totally impossible to play such a high quality sound - then sound quality could be less.
Thank you for your help
For a simple application like this you would be best to just use a small ARM Cortex M device hooked up to an external SPI FLASH chip. Most microcontrollers scale processing power and RAM with FLASH storage so keeping it all on one chip will result in a grotesquely over-powered solution. Serial FLASH memory is very cheap, easy to use, and you can change the size in the future if you need to add more samples.
For the audio side, if you really want CD quality you'll have to look at getting an external audio DAC, as I don't know of any microcontrollers that integrate a CD quality codec. External DACs aren't expensive or complex to use, but they add to the physical size and BOM cost. Many Cortex chips have built-in 12-bit DACs though, so if the audio has a reasonably small dynamic range you might find this is suitable for your needs.
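To put rough numbers on the storage and DAC points, the sketch below estimates the size of uncompressed PCM clips and the ideal dynamic range of a converter. It assumes 16-bit mono samples by default (the question only states 44.1 kHz) and uses the textbook approximation of about 6.02 dB per bit plus 1.76 dB for a full-scale sine; real DACs fall short of that figure.

```python
def pcm_bytes(seconds: float, sample_rate_hz: int = 44100,
              bits: int = 16, channels: int = 1) -> int:
    """Storage needed for an uncompressed PCM clip (no WAV header)."""
    frames = round(seconds * sample_rate_hz)
    return frames * channels * (bits // 8)


def ideal_dynamic_range_db(bits: int) -> float:
    """Textbook quantization-noise limit for an ideal converter."""
    return 6.02 * bits + 1.76


# Clip lengths from the question: about 50 ms, up to 1 second or longer.
for seconds in (0.05, 1.0, 5.0):
    print(f"{seconds:5.2f} s mono 16-bit: {pcm_bytes(seconds)} bytes")

for bits in (12, 16):
    print(f"{bits}-bit converter: about {ideal_dynamic_range_db(bits):.1f} dB")
```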
In terms of minimising pops and clicks the Cortex devices will have enough power for some basic filtering to deal with this. I would recommend against Arduino though as you will quickly come up against processing power limitations and I doubt you will want to dive into assembler optimisations.
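One common way to avoid a click when switching clips is a short linear crossfade between the end of the old clip and the start of the new one. This is a swapped-in illustration, not necessarily the filtering the answer had in mind, and it is written in Python for readability rather than as firmware; the function name is hypothetical.

```python
def crossfade(old_tail, new_head):
    """Blend two equal-length runs of integer samples.

    The weight of the new clip rises linearly, so the output starts close
    to the old signal and ends close to the new one, with no abrupt step.
    """
    n = len(old_tail)
    if n == 0 or n != len(new_head):
        raise ValueError("both runs must be non-empty and the same length")
    out = []
    for i, (a, b) in enumerate(zip(old_tail, new_head)):
        w = (i + 1) / (n + 1)  # weight of the new clip, strictly between 0 and 1
        out.append(round((1 - w) * a + w * b))
    return out


# Fading from a constant level down towards silence over three samples.
print(crossfade([1000, 1000, 1000], [0, 0, 0]))
```

At 44.1 kHz a fade of a few hundred samples is only a few milliseconds, which would still fit within a tight switching budget.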

Upgrading audio code to new WASAPI standard

We have an application that uses waveXXX() and mixerXXX() functions to handle the audio I/O to and from some instruments (think: oscilloscope or electronics rather than musical instruments, not that it much matters). It's finally time to stop deploying it on Windows XP, and move it to Windows 7 and/or 8.
From reading a variety of material on WASAPI, it sounds like the bulk of the application (based on waveXXX() functions) might actually work fine, but the mixerXXX() code used to set master output volume, line-in volume, and mute the microphone will definitely have to change, and use IAudioEndpointVolume calls instead.
Is it possible to change only the mixerXXX() calls? Is it desirable?
Logically, this application requires exclusive use of its audio endpoints (speaker out, line in). If I want to ensure exclusive access through software, would that force me to rewrite all the waveXXX() code too? (The alternative is to warn users that other audio applications may interfere with this one).
My recommendation:
If you need exclusive access, convert everything to WASAPI
If you are using line-in, convert everything to WASAPI
If you have time, convert everything to WASAPI
If you are strictly only using speaker and microphone in shared mode, replace mixerXXX() with the ISimpleAudioVolume interface (and several other interfaces to get to it), then test whether existing waveXXX() code behaves as you need it to. Then test each time hardware, OS or audio drivers change. Better still, just convert to WASAPI.
In my case, exclusive speaker output is critical - this drives the instrument that generates a related input signal. I guess I don't mind if another application wants to share access to that incoming signal, but logically it is a system that wants an exclusive contract with its audio endpoints.
That exclusivity requires that I obtain an IMMDevice instance for both speaker output and line-in input, Activate() the IAudioClient interface on them and Initialize() both using AUDCLNT_SHAREMODE_EXCLUSIVE (see also this answer).
But have I actually selected line-in by such a process? Probably not. All I can be sure of is that I have annoyed any other applications that were previously sharing my endpoints by cutting them off.
Having done this much, it's really not clear what will happen to waveInXXX() calls - maybe they'll take from line-in, maybe from microphone - maybe it depends on how the hardware vendor implements their end of the deal. It's also never been clear to me whether line-in and microphone are always multiplexed (i.e. selectable), always mixed (i.e. you can only simulate selection by muting the other one) or there is no standard one can rely on.
Because of factors like that, it's a gamble not to use WASAPI throughout.

Getting ARM/WM8350 audio and power management working in linux

I have a rooted Sony prs900, running a linux 2.6.23 #2 PREEMPT kernel, for ARMv6. (Montavista linux kernel). I'm having problems with figuring out how power management works, both for running the system and for powering up and down the audio port.
I can neither figure out how to read the battery/powerline status information, nor get the audio chip to play sound, etc ... although I have been studying the kernel modules for a while...
It's worth a little money for help, say $100 paypal donation to an email account, (or more if this takes a long time...) for the first person able to explain to me how to do them in a way that works.
Eg: read battery status, and change some power modes like getting the audio amplifiers to power up/down so that the audio played to /dev/dsp (oss emulation) actually comes out as sound rather than just being consumed by the chip and ignored...
The actual sony kernel, and binary packages of cross compiler tools are located on the main page. Actual kernel sourcecode is also available.
What I have learned so far myself :
The sony is using a wolfson micro WM8350 audio driver and battery charger/power management chip for all the system's power; eg: it can power down/up the SD memory cards, send more power to the cpu, power up audio amplifiers, etc. See: WM8350 Datasheet.
Pretty much, the whole problem revolves around getting the WM8350 kernel drivers to work...
Although the company brags quite a bit about its support under linux, they don't have any application notes or examples that are actually helpful that I can find, other than the datasheet. I suspect the kernel drivers I have are beta code, because they don't seem to be behaving well (several error messages in the kernel log about wm8350 registers not being readable happen at every boot even when running only the sony's native software...).
The kernel driver's source-code of most interest are in: linux-2.6.23_091126/drivers/mxc/pmic/{core,wm8350}
Notice, the wm8350 is a competitor to the MC14783, but the linux kernel drivers use the same {core} driver source code for both chips; The sony ONLY has the wm8350 on it -- there is no MC14783 present.
The code that I most desperately want to understand how to operate is found in the subdirectory {wm8350}, eg: wm8350/wm8350pm/power_supply_sysfs.c.
I want the audio to fire up too, but I'm not quite sure where the pertinent audio amplifier code is yet...
Very clearly the wm8350pm code is designed to export a /sys directory interface; right now /sys is mounted and operational on the system; but I'm not very familiar with the semantics of these newer style interfaces... they aren't quite like the old APM power interfaces for Linux laptops...
First I checked the obvious:
If I do a "cat /sys/power/state" it returns the word "mem" and nothing else.
The file has permissions -rw-r--r--, so potentially it could be written -- but I don't know with what. The string "mem" does not exist anywhere in the source code for the wm8350pm drivers, so I don't even know if /sys/power/state is part of the source code.
Doing a find /sys -iname "wm8350" reveals a handful of directories with the patterns:
wm8350-rtc , wm8350-pmic , wm8350-bl , wm8350-power , wm8350-led
wm8350-hifi-dai , wm8350-codec
wm8350-imx32ads.0
So, I do an ls -l on each directory, and look for actual files rather than symbolic links or subdirectories, and what I find are stock useless writable files: bind, unbind, uevent,
and a very few read only files: pmic_reg, dapm_widget, modalias, codec_reg which aren't very helpful.
It's no surprise that:
Doing: cat /sys/devices/platform/wm8350-ebx5016-audi/modalias gives "wm8350-ebx5016-audio"
Doing: cat /sys/devices/platform/wm8350-imx32ads.0/modalias gives "wm8350-imx32ads"
and since audio is off... Doing: cat /sys/devices/platform/wm8350-ebx5016-audi/dapm_widget reveals the audio state:
Headphone Jack: Off
Line In Jack: On
Mic Bias: Off
Left DAC: Off
Right DAC: Off
... (all else off and omitted except )...
EBX5016-hifi: PM State: D3hot
The last two files, I expect should do wm8350 chip register dumps... and one did.
Doing: cat /sys/devices/wm8350-pmic/pmic-reg causes a long pause, then nothing is printed.
but:
Doing: cat /sys/devices/wm8350/platform/wm8350-ebx5016-audi/wm8350-codec/codec_reg does print a list of registers up to e8, which is slightly beyond the register range the datasheet gives for the chip (0x00 to 0xe6).
I tried using a python program to play wav files (it works on my desktop computer), and I noticed that /dev/dsp does open, the mixers DO set volume levels, and nothing comes out. So -- the audio driver is not able to enable the sound amplifiers on its own automatically.
There are no alsa sound files in /dev, nor are any alsa tools found on the embedded machine... so I assume Sony is strictly using OSS /dev/dsp and /dev/mixer.
There is only one other access point I can find to the wm8350:
There IS a device driver /dev/wm8350.
That driver is created by the source code in subdirectory wm8350/wm8350_reg.c; in theory it should be able to read and write all registers using ioctl() calls from user space. However, something appears grossly wrong with it, for I wrote a test program to read the wm8350 registers... and most of the registers return error messages rather than allowing themselves to be read, including the most public ID registers (0x00, 0x01) etc.
So, I'm quite stuck. Pointers, thoughts, hints, are quite desired.
I would like to change your question a little bit.
How does Linux ASoC (ALSA System on Chip) power management work?
I will answer this and then give some hints on using this specific chip.
.. If I do a cat /sys/power/state it returns the word "mem" and nothing else. The file has permissions -rw-r--r--, so potentially it could be written -- but I don't know with what. The string "mem" does not exist anywhere in the source code for the wm8350pm drivers, so I don't even know if /sys/power/state is part of the source code.
You need to get an understanding of the Linux driver model. Hardware in Linux is structured like a tree. The rationale is that things must be powered up/down in specific sequences. For instance, you should not power down the PCI bus controller before powering down the PCI peripherals. Linux builds a tree of hardware, and each driver (code) and device (data/actual hardware) has specific callbacks/function pointers which handle some specific tasks.
1. probe - Are you there? Determines whether the actual hardware/device is present.
2. remove - Shuts down the device. Module removal, power off, etc.
3. suspend - Going to sleep.
4. resume - Waking up.
The last two, suspend and resume, may look interesting to you. Now, read about what /sys/power/state does. The text "mem" means that suspend to memory is supported by your system. In this mode, Linux does these steps:
1. Find the first lowest-level active bus.
2. Suspend the devices on that bus.
3. Suspend the bus and de-activate it.
4. If a bus is still active, go to step 1.
5. Set the CPU to a low power state (suspend to RAM).
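The ordering in these steps amounts to suspending children before their parents. The sketch below is a toy model of that ordering only, not kernel code; the device tree and its names are hypothetical and are not taken from the PRS-900 board.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A bus or device in a simplified hardware tree."""
    name: str
    children: List["Node"] = field(default_factory=list)


def suspend_order(root: Node) -> List[str]:
    """Return names in an order where every child precedes its parent."""
    order: List[str] = []

    def visit(node: Node) -> None:
        for child in node.children:
            visit(child)          # devices on a bus go first
        order.append(node.name)   # then the bus (or device) itself

    visit(root)
    return order


# Hypothetical tree: a platform bus with an I2C bus (carrying a codec/PMIC)
# and a serial audio interface hanging off it.
tree = Node("platform", [
    Node("i2c-bus", [Node("codec-pmic")]),
    Node("audio-ssi"),
])
for name in suspend_order(tree):
    print("suspend", name)
print("CPU: enter suspend-to-RAM")
```

Resume would walk the same list in reverse, so that each parent is awake before its children.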
This is not quite the full story. A few devices may support a wake-up. They will have extra call-backs to enable waking the system from sleep modes. Read the documentation to find out about this.
That is general power management and driver/device structure. Now, how is ASoC (ALSA System on Chip) structured?
There are typically three drivers/devices that get stitched together.
Codec - The wm8350 in your case. This includes audio amplifier drive circuitry and can include sound mixing and source controls. Supports digital to analog and analog to digital, typically through an i2s interface. The i2s is not the only interface. Usually a register bank is controlled through a secondary interface; i2c in the wm8350 case.
DAI - Refer to chapter 1.2.18.1 of the iMx31 reference manual; the hardware is called the SSI by Freescale. The next chapter on the AUDMUX is also useful to understand audio support on the iMx31/32.
Machine file - this is the board specific routing. It hooks the DAI to the codec and is the parent of both. It provides board clocking information and other specific configuration. For instance, it may use the AUDMUX to route the physical pins to the SSI block.
An i2c (or SPI) interface from the codec driver to send control commands to the codec chip. Some chips might use a wacky i2s interface or something else for control (but not in your case).
Now if you understood this, you will see that some features of the wm8350 seem to break the Linux model. The DAI interface can be stopped (digital audio), but the i2c interface must remain alive to program the registers related to the power functionality in the codec/PMIC (power management IC).
The latest WM8350 driver treats the IC as a multi-function device, and that support was introduced in 2.6.35. The initial support may not have included all the WM8350 features. Unfortunately, without some details on the layout of the Sony prs900 board, it would be difficult to know how to use the WM8350 PMIC functionality. The code will involve the iMx31 CPU, the WM8350, the i2c connection, and possibly some power supply circuitry.
For certain, you can just try echo mem > /sys/power/state and see what happens. If it works, you are lucky. The power/current consumption in sleep might not be optimal, but it will probably be hard to fix with the 2.6.23 kernel. You will want to look through the /sys directories for wake-up sources and possibly register these before issuing the suspend to memory command.
I can neither figure out how to read the battery/powerline status information, nor get the audio chip to play sound, etc ... although I have been studying the kernel modules for a while...
From the above discussions, the battery and powerline status will probably be found through another device. However, the pmic_reg file may actually give the status if things are connected properly on the board.
The audio chip will use ALSA. You need to use either alsamixer or the command line amixer to set up audio routes through the codec, so the DAI channel (PCM from iMx32) is routed and sent to the speaker. To minimize power consumption, things are usually turned off by default. The /dev/dsp files are just OSS compatibility. This configuration will support ALSA natively. You are better off to use ALSA if possible.
Donate to the OSF and get a tax receipt, if this was helpful enough.

audio codec kernel driver using alsa - capture path vs playback path

I'm using a custom board running imx6q processor, and a tlv320aic3x audio codec.
Everything works ok after some bring-up, but I'm trying to improve the audio driver: whether I'm doing playback or capture - both playback and capture related amplifiers are switched on.
This causes side effects like noise in speakers when I'm capturing audio, and wastes power.
To solve this, I'm trying to define the data paths correctly in the driver, but I keep failing.
I find it hard to find resources online explaining how to code an ALSA driver using the predefined ALSA macros that exist in the kernel.
I've searched http://www.alsa-project.org/, linux docs, and few other sources...
And to my questions:
Is there any decent tutorial out there? I'm specifically interested in DAPM and usage of control names.
Is it possible to "re-program" all driver data paths from userspace?
Is DAPM sufficient for decent power management? Or should I use userspace to switch on/off power from unused paths in the codec between playbacks and captures?
Just to be clear: in user space using the standard driver, I am able to do playback, capture and control mixers, switches, etc... However I'm trying to achieve better automatic power management.
Thanks

Fast Audio Input/Output

Here's what I want to do:
I want to allow the user to give my program some sound data (through a mic input), then hold it for 250ms, then output it back out through the speakers.
I have done this already using Java Sound API. The problem is that it's sorta slow. It takes a minimum of about 1-2 seconds from the time the sound is made to the time the sound is heard again from the speakers, and I haven't even tried to implement delay logic yet. Theoretically there should be no delay, but there is. I understand that you have to wait for the sound card to fill up its buffer or whatever, and the sample size and sampling rate have something to do with this.
My question is this: Should I continue down the Java path trying to do this? I want to get the delay down to like 100ms if possible. Does anyone have experience using the ASIO driver with Java? Supposedly it's faster..
Also, I'm a .NET guy. Does this make sense to do with .NET instead? What about C++? I'm looking for the right technology to use here, and maybe a good example of how to read/write to audio input/output streams using your suggested technology platform. Thanks for your help!
I've used JavaSound in the past and found it wonderfully flaky (and it keeps changing between VM releases). If you like C#, use it, just use the DirectX APIs. Here's an example of doing kind of what you want to do using DirectSound and C#. You could use the Effects plugins to perform your 250 ms echo.
http://blogs.microsoft.co.il/blogs/tamir/archive/2008/12/25/capturing-and-streaming-sound-by-using-directsound-with-c.aspx
You may want to look into JACK, an audio API designed for low-latency sound processing. Additionally, Google turns up this nifty presentation [PDF] about using JACK with Java.
Theoretically there should be no delay, but there is.
Well, it's impossible to have zero delay. The best you can hope for is an unnoticeable delay (in terms of human perception). It might help if you describe your basic algorithm for reading & writing the sound data, so people can identify possible problems.
A potential issue with using a garbage-collected language like Java is that the GC will periodically run, interrupting your processing for some arbitrary amount of time. However, I'd be surprised if it's >100ms in normal usage. If GC is a problem, most JVMs provide alternate collection algorithms you can try.
If you choose to go down the C/C++ path, I highly recommend using PortAudio ( http://portaudio.com/ ). It works with almost everything on multiple platforms and it gives you low-level control of the sound drivers without actually having to deal with the various sound driver technology that is around.
I've used PortAudio on multiple projects, and it is a real joy to use. And the license is permissive.
If low latency is your goal, you can't beat C.
libsoundio is a low-level C library for real-time audio input and output. It even comes with an example program that does exactly what you want - piping the microphone input to the speakers output.
It's possible with JavaSound to get end-to-end latency in the ballpark of 100-150ms.
The primary cause of latency is the buffer sizes of the capture and playback lines. The bufferSize is set when opening the lines:
capture: TargetDataLine#open(AudioFormat format, int bufferSize)
playback: SourceDataLine#open(AudioFormat format, int bufferSize)
If the buffer is too big it will cause excess latency, but if it's too small it will cause stuttery playback. So you need to find a balance between your application's needs and your computing power.
The default buffer size can be checked with DataLine#getBufferSize when calling #open(AudioFormat format). The default size will vary based on the AudioFormat and seems to be geared for high latency, stutter free playback applications (e.g. internet streaming). If you're developing a low latency application, the default buffer size is much too large and should be changed.
In my testing with a 16-bit PCM AudioFormat, a buffer size of 1024 bytes has been pretty close to ideal for low latency.
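For a sense of scale, a byte-sized buffer can be converted to milliseconds once the frame size is known. The answer does not state the sample rate or channel count, so the 44.1 kHz mono and stereo figures below are assumptions, and the function name is made up for the example. The arithmetic is language-independent and is shown in Python only for brevity.

```python
def bytes_to_latency_ms(buffer_bytes: int, sample_rate_hz: int,
                        bits: int = 16, channels: int = 1) -> float:
    """Time needed to play out a full buffer of PCM data."""
    frame_bytes = channels * (bits // 8)
    frames = buffer_bytes / frame_bytes
    return 1000.0 * frames / sample_rate_hz


# 1024-byte buffer, 16-bit PCM, assumed 44.1 kHz.
print(f"mono:   {bytes_to_latency_ms(1024, 44100, channels=1):.2f} ms")
print(f"stereo: {bytes_to_latency_ms(1024, 44100, channels=2):.2f} ms")
```

This covers a single line's buffer only; capture and playback each have their own, and the end-to-end figure also includes whatever the driver and hardware add.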
The second and often overlooked cause of audio latency is any other activity being done in the capture or playback threads. For example, logging messages to the console can introduce tens of milliseconds of latency. Turn it off.
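The 250 ms hold described in the question can be sketched as a fixed-length delay line that sits between capture and playback. This is only the buffering logic, with no audio I/O; the class name is hypothetical and the sample rate in the example is an assumption.

```python
from collections import deque


class DelayLine:
    """Delays a stream of samples by a fixed time, emitting silence first."""

    def __init__(self, delay_ms: float, sample_rate_hz: int):
        self.length = round(delay_ms * sample_rate_hz / 1000)
        self.buf = deque([0] * self.length)  # pre-filled with silence

    def process(self, sample: int) -> int:
        """Push one captured sample, pop the one captured delay_ms earlier."""
        self.buf.append(sample)
        return self.buf.popleft()


delay = DelayLine(delay_ms=250, sample_rate_hz=8000)
captured = range(1, 2003)
played = [delay.process(s) for s in captured]
print(delay.length, played[:3], played[2000:2002])
```

Any buffering latency in the capture and playback lines adds to this programmed delay, so the buffer sizes above still matter.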
