Low level audio programming - audio

I wonder; does audio software like Cubase and Audacity use PlaySound calls??
Where can I learn about low level audio programming? As far as I've found information on the web, MCI seems to be the lowest level audio API in Windows...
Thanks
Edit: I don't ask for information specific for Windows only.

There's several audio APIs to choose from. The oldest and most widely supported is the waveOut API - look for functions starting with waveOut in MSDN. A slightly newer one is DirectSound which is geared more towards games, but it's main feature over waveOut is positional 3D sound which professional audio software doesn't use (it was also supposed to have lower latency than waveOut, but that never really materialized). For low latency audio, there is ASIO. Professional audio apps support this API, but not all drivers do (it's a standard feature in professional sound cards, but not gaming or on-board hardware). ASIO can provide much lower latency than waveOut or DirectSound. Finally, there's the kernel streaming interface, which is the lowest-level audio interface still accessible from user-mode code. This is a direct pipe into Windows's internal mixer which combines output from all apps that are currently playing sound into the signal that gets sent to the sound card. It's scarcely documented though. There's a driver called ASIO4ALL (just google it) that provides ASIO support on soundcards without ASIO drivers by implementing the ASIO API on top of the kernel streaming interface.

I'm a little late to the game here, but I posted a Windows API history last week that might add a little more context. The choice of API really depends on your needs. If you want to avoid 3rd party libraries, it really only comes down to MME, XAudio2, and Core Audio (WASAPI).
A Brief History of Windows Audio APIs
Hope this helps!

Actually, if you are looking for more than Windows-only output support, then the best way to start is to review Phil Burk's PortAudio, available as of this writing at http://www.portaudio.com/ .
ASIO is a good quality interface, but it's proprietary and owned by Steinberg.
There are many lower-level interfaces to audio output than MCI in modern Windows. These include, at least, DirectSound, XAudio and WASAPI.
I recommend avoiding the Windows APIs as much as possible, and learning PortAudio instead.

Related

Is Opus supported for VoLTE?

There are so many different codecs for phone calls and many of them have very high license fees, meaning it will take a lot of time before everyone can use normal telephony with wide band audio.
Is Opus supported for VoLTE?
The usual codecs for VoLTE are AMR, AMR-WB and EVS (see links below for more info - thanks, #Mikael DĂși Bolinder).
As with most mainstream voice (and video codecs) there is IPR and licensing associate with these. However, for end users the network providers and device manufacturers have included the licensing and the codecs in their rollouts so a typical operator service will use these.
I'm not aware of any restrictions from 3GPP on using other codecs if the devices and the network support them, but the above are definitely the default and the most widely used.
If you want to create your own voice service, e.g a VoIP service running over the data connection to the phone, then in theory you can use whatever codec you want. It's worth being aware that for software based codecs, which they will be unless they are tightly integrated in the device's hardware, the efficiency is important as an inefficient implementation may impact performance, battery life etc.
For Opus in particular there are several open source projects which provide Android libraries for this, for example. Opus is also supposed to be supported on devices from Android 5+ (https://developer.android.com/guide/topics/media/media-formats).
amr-licensing-wikipedia: https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_audio_codec#Licensing_and_patent_issues "AMR licensing (and issues) on Wikipedia"
amr-wb-licensing-wikipedia: https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_Wideband#Licensing "AMR-WB licensing on Wikipedia"
evs-news-patent-pool: http://www.mpegla.com/Lists/MPEG%20LA%20News%20List/Attachments/97/n-16-01-20.pdf "MPEG developing a patent pool for EVS"

manipulating audio input buffer on Ubuntu Linux

Suppose that I want to code an audio filter in C++ that is applied on every audio or to a specific microphone/source, where should I start with this on ubuntu ?
edit, to be clear I don't get how to do this and what is the role of Pulseaudio, ALSA and Gstreamer.
Alsa provides an API for accessing and controlling audio and MIDI hardware. One portion of ALSA is a series of kernel-mode device drivers, whilst the other is a user-space library that applications link against. Alsa is single-client.
PulseAudio is framework that facilitates multiple client applications accessing a single audio interface (alsa is single-client). It provides a daemon process which 'owns' the audio interface and provides a IPC transport for audio between the daemon and applications using it. This is used heavily in open source desktop environments. Use of Pulse is largely transparent to applications - they continue to access the audio input and output using the alsa API with audio transport and mixing. There is also Jack which is targeted more towards 'professional' audio applications - perhaps a bit of a misnomer, although what is meant here is low latency music production tools.
gStreamer is a general purpose multi-media framework based on the signal-graph pattern, in which components have a number of inputs and output pins and provide a transformation function. A Graph of these components is build to implement operations such as media decoding, with special nodes for audio and video input or output. It is similar in concept to CoreAudio and DirectShow. VLC and libAV are both open source alternatives that operate along similar lines. Your choice between these is a matter of API style, and implementation language. gStreamer, in particular, is an OO API implemented in C. VLC is C++.
The obvious way of implementing the problem you describe is to implement a gStreamer/libAV/VLC component. If you want to process the audio and then route it to another application, this can be achieved by looping it back through Pulse or Jack.
Alsa provides a plug-in mechanism, but I suspect that implementing this from the ALSA documentation will be tough going.
The de-facto architecture for building effects plug-ins of the type you describe is Steinberg's VST. There are plenty of open source hosts and examples of plug-ins that can be used on Linux, and crucially, there is decent documentation. As with a gStreamer/libAV/VLC, you be able to route audio in an out of this.
Out of these, VST is probably the easiest to pick up.

Does DirectSound usually support echo cancellation and noise reduction?

I'm currently using the waveInOpen set of Windows API functions to record audio for a VOIP application. I'm now being asked to add echo cancellation, and possibly noise reduction, and gain control. I know nothing about DirectSound, but while searching on "echo cancellation" on Google I came across references on MSDN to DirectSound such as CaptureAcousticEchoCancellationEffect.
If I switch to DirectSound will I get some of these features "for free"? Are they only supported if the hardware supports it, and if so, how often will that hardware be present in the average consumer PC?
Starting with Windows Vista, Microsoft provides a separate component Voice Capture DSP:
The voice capture DMO includes the following DSP components:
Acoustic echo cancellation (AEC)
Microphone array processing
Noise suppression
Automatic gain control
Voice activity detection
Applications can turn each component on and off individually.
You can use it in your DSP application to leverage EAC and NS implemented in software.
As far as I know these features are not professionally supported in DirectSound. A hardware device that has support for these features usually is equipped with a special processor/DSP and costs a lot more than the standard hardware device.

Windows 8 - low latency audio

I'm considering developing an app for the upcoming Windows 8. The app requires low-latency audio recording and playback, and I'm trying find out whether the OS will support that (as opposed to other platforms).
So what I'd like to know is:
Is there a low-latency audio API in Windows 8?
Will it be supported on platforms other than PC (e.g. tablets)?
Thanks!
WASAPI was introduced with Windows Vista as the low-latency audio API. It is available both to desktop and to Metro style applications on Windows 8. Because it is a very low-level API, using it is not simple, but it gives you the most power. It will work on both Windows 8 and the newly-minted Windows RT (Arm).
Also available is XAudio2 which is a slightly higher-level API which will be easier to work with. It is the replacement for DirectSound and is designed for game developers, but may work for your purposes. This also is available to both Windows 8 and Windows RT.
There is a bit of comparison of the two APIs at the bottom of this article. I would start with XAudio2 and move to WASAPI only if you find XAudio2 doesn't meet your needs.
I would consider using XAudio2. Microsoft providers Basic audio playback sample for easy start
Yes, there is a low latency API you can access. It's called WASAPI
From my understanding, all tablets/Laptops/Desktops/anything running Windows 8, will have access to it. The only downside is that it's harder to work with (because it's lower level), but you get to directly interact with the byte arrays getting send to the speaker, and the latency is very low.

Best cross-platform audio library for synchronizing audio playback

I'm writing a cross-platform program that involves scrolling a waveform along with uncompressed wav/aiff audio playback. Low latency and accuracy are pretty important. What is the best cross-platform audio library for audio playback when synchronizing to an external clock? By that I mean that I would like to be able to write the playback code so it sends events to a listener many times per second that includes the "hearing frame" at the moment of the notification.
That's all I need to do. No recording, no mixing, no 3d audio, nothing. Just playback with the best possible hearing frame notifications available.
Right now I am considering RTAudio and PortAudio, mostly the former since it uses ALSA.
The target platforms, in order of importance, are Mac OSX 10.5/6, Ubuntu 10.11, Windows XP/7.
C/C++ are both fine.
Thanks for your help!
The best performing cross platform library for this is jack. Properly configured, jack on Linux can outperform Windows asio easily (in terms of low latency processing without dropouts). But you cannot expect normal users to use jack (the demon should be started by the user before the app is started, and it can be a bit tricky to set up). If you are making an app specifically for pro-audio I would highly recommend looking in to jack.
Edit:
Portaudio is not as high-performance, but is much simpler for the user (no special configuration should be needed on their end, unlike jack). Most open source cross platform audio programs that I have used use portaudio (much moreso than openal), but unlike jack I have not used it personally. It is callback based, and looks pretty straightforward though.
OpenAL maybe an option for you.

Resources