Wasapi Record and Playback, Same Audio Device, Audio Skewing - wasapi

I'm developing an application for Windows (7+) that uses Wasapi for simultaneous record and playback (VOIP style). I've set up two streams to the SAME device (one capture, one render), using exclusive mode access. Buffer sizes are exactly the same (10 ms worth of data, aligned properly).
Everything cranks along just great, but I've noticed that the rate at which data is being captured vs rendered is 'slightly' different, almost as if I were using two separate devices with different clocks. The capture stream supplies data at a slightly faster rate than the render stream consumes.
When my application is talking to another user, I'm wanting the user to hear themselves as part of the mix. This will be impossible without 'popping' occasionally if these two streams aren't perfectly synchronized.
Has anybody run into this 'out of sync same device' problem? Is there some basic concept I'm missing?

Related

Programmatic access to a sound played through OpenAL

I am working with an application that uses OpenAL API quite extensively. In particular, there are multiple sound sources, non-trivial listener filters, etc.
I want to be able to run this application significantly faster than real-time. At the same time, the sound must be saved for later postprocessing. Is there a way to access the OpenAL output programmatically (virtually) without ever playing the sound on the real playback device?
Ideally, I'd like to have access that would be played during every tick of the main loop of my application. Normally one tick corresponds to one rendered frame (e.g. 1/30th of a second). But in this case we would be running the app as fast as possible.
We ended up using OpenAL Soft to do this. Example:
#include "alext.h"
LPALCLOOPBACKOPENDEVICESOFT alcLoopbackOpenDeviceSOFT;
alcLoopbackOpenDeviceSOFT = alcGetProcAddress(NULL,"alcLoopbackOpenDeviceSOFT");
replace your default device with this device
ALCcontext *context = alcCreateContext(device, attrs);
Set the attrs as you would for your default device
Then in the main loop use:
LPALCRENDERSAMPLESSOFT alcRenderSamplesSOFT;
alcRenderSamplesSOFT = alcGetProcAddress(NULL, "alcRenderSamplesSOFT");
alcRenderSamplesSOFT(device, buffer, 1024);
Here the buffer will store 1024 samples. This code runs faster than real-time, therefore you can sample frames every tick
Are you able to do your required functions with the audio data prior to its being shipped to OpenAL? I've done a lot with javax.sound.sampled when it is untethered by the blocking write() method in SourceDataLine, especially when saving to file rather than playing back.
From what little I know about OpenAL, there is also a blocking process occurs when data is shipped, with a queue of arrays that are managed. I've been meaning to look into this further...
(Probably not being very helpful here. Apologies.)

Synchronization of WASAPI Audio Devices

Is there a way with WASAPI to determine if two devices (an input and an output device) are both synced to the same underlying clock source?
In all the examples I've seen input and output devices are handled separately - typically a different thread or event handle is used for each and I've not seen any discussion about how to keep two devices in sync (or how to handle the devices going out of sync).
For my app I basically need to do real-time input to output processing where each audio cycle I get a certain number of incoming samples and I send the same number of output samples. ie: I need one triggering event for the audio cycle that will be correct for both devices - not separate events for each device.
I also need to understand how this works in both exclusive and shared modes. For exclusive I guess this will come down to finding if devices have a common clock source. For shared mode some information on what Windows guarantees about synchronization of devices would be great.
You can use the IAudioClock API to detect drift of a given audio client, relative to QPC; if two endpoints share a clock, their drift relative to QPC will be identical (that is, they will have zero drift relative to each other.)
You can use the IAudioClockAdjustment API to adjust for drift that you can detect. For example, you could correct both sides for drift relative to QPC; you could correct either side for drift relative to the other; or you could split the difference and correct both sides to the mean.

Adding audio effects (reverb etc..) to a BackgroundAudioPlayer driven streaming audio app

I have a windows phone 8 app which plays audio streams from a remote location or local files using the BackgroundAudioPlayer. I now want to be able to add audio effects, for example, reverb or echo, etc...
Please could you advise me on how to do this? I haven't been able to find a way of hooking extra audio processing code into the pipeline of audio processing even through I've read much about WASAPI, XAudio2 and looked at many code examples.
Note that the app is written in C# but, from my previous experience with writing audio processing code, I know that I should be writing the audio code in native C++. Roughly speaking, I need to find a point at which there is an audio buffer containing raw PCM data which I can use as an input for my audio processing code which will then write either back to the same buffer or to another buffer which is read by the next stage of audio processing. There need to be ways of synchronizing what happens in my code with the rest of the phone's audio processing mechanisms and, of course, the process needs to be very fast so as not to cause audio glitches. Or something like that; I'm used to how VST works, not how such things might work in the Windows Phone world.
Looking forward to seeing what you suggest...
Kind regards,
Matt Daley
I need to find a point at which there is an audio buffer containing
raw PCM data
AFAIK there's no such point. This MSDN page hints that audio/video decoding is performed not by the OS, but by the Qualcomm chip itself.
You can use something like Mp3Sharp for decoding. This way the mp3 will be decoded on the CPU by your managed code, you can interfere / process however you like, then feed the PCM into the media stream source. Main downside - battery life: the hardware-provided codecs should be much more power-efficient.

Sync two soundcards

I have a program written in C++ that uses RtAudio ( Directsound ) to capture and playback audio at 48kHz samplerate.
The input capture uses a callback option. The callback writes data to a ringbuffer.
The output is a blocking write function in a separate thread that reads from the ringbuffer.
If the input and output devices are the same the audio loops thru perfectly.
Now I want to get audio from device 1 and playback on device 2. Each device has its own sampleclock set to 48kHz but are not in sync. After a couple of seconds the input and output are out of sync.
Is it possible to sync two independent oudio devices?
There are two challenges you face:
getting the two devices to start at the same time.
getting the two devices to stay in sync.
Both of these tasks are difficult. In the pro audio world, #2 is accomplished with special hardware to sync the word-clocks of multiple devices. It can also be done with a high quality video signal. I believe it can also be done with firewire devices, but I'm not sure how that works. In practice, I have used devices with no sync ("wild") and gotten very reasonable sync for up to an hour or two. Depending on what you are trying to do, the sync should not drift more than a few milliseconds over the course of a few minutes. If it does, you can consider your hardware broken (of course, cheap hardware is often broken).
As for #1, I'm not sure this is possible in any reliable sense with directsound. To the extent that it's possible with any audio API, it is difficult at best: both cards have streams that require some time to setup, open and start playing. In general, the solution is to use an API where this time is super low (ASIO, for example). This works reasonably well for applications like video, but I don't know if it really solves the problem in general.
If you really need to solve this problem, you could open both cards, starting to play silence, and use the timing information generated by the cards to establish the delay between putting data into the card and its eventual playback (this will be different for each card and probably each time you run) and use that data to calculate when to start actual playback. I don't know if RTAudio supplies the necessary timing information, but PortAudio does. This document may help.

low latency sounds on key presses

I am trying to write an application(I'm a gui first timer) for my son, he has autism. There is a video player in the top half and a text entry area in the bottom. When letters are typed sounds are produced to mimic the words in the video.
There have been other posts on this site in regard to playing sounds on key presses, using gstreamer as a system call. I have also tried libcanberra but both seem to have significant delays between sounds. I can write the app in python or C but will likely do at least some of it in C.
I also want to mention that the video portion is being played by gstreamer. I tried to create two instances of gstreamer, to avoid expensive system calls but the audio instance seemed to kill the app when called.
If anyone has any tips on creating faster responding sounds I would really appreciate it.
You can upload a raw audio sample directly to PulseAudio so there will be no decoding and (perhaps save) extra switches by using the following function from Canberra:
http://developer.gnome.org/libcanberra/unstable/libcanberra-canberra.html#ca-context-cache
The next ca_context_play() will use it.
However, the biggest problem you'll encounter with this scenario (with simultaneous video playback) is that the audio device might be configured with large latency with PulseAudio (up to 1/2s or more for normal playback). It may be reasonable to file a bug to libcanberra to support a LOW_LATENCY flag, as it currently doesn't attempt to minimize delay for sound events afaik. That would be great to have.
GStreamer pulsesink could probably get low latency too (it has some properties for that), but I am afraid it won't be as lightweight as libcanberra, and you won't be able to cache a sample for instance. Ideally, GStreamer could also learn to cache samples, or pre-fill PulseAudio...

Resources