What do the ALSA timestamping functions return and how do the results relate to each other?

There are several "hi-res" timestamping functions in ALSA:
snd_pcm_status_get_trigger_htstamp
snd_pcm_status_get_audio_htstamp
snd_pcm_status_get_driver_htstamp
snd_pcm_status_get_htstamp
I would like to understand what points in time the values returned by these functions represent.
My current understanding is that trigger_htstamp represents the time when the stream was started/stopped/paused. snd_pcm_status_get_trigger_htstamp returns a constant value, and when I add audio_htstamp to that value the result is very close to the current system time.
audio_htstamp seems to start from zero on my system and is incremented by a value equal to the period size I use. Hence on my system it is a simple frame counter. If I understand ALSA correctly, audio_htstamp can also work in a different, more accurate way depending on the system's capabilities.
driver_htstamp is, judging by the name, a timestamp generated by the audio driver.
Question 1: When is the timestamp driver_htstamp usually generated?
With htstamp I am really unsure where and when it is generated. I have a hunch that it may be related to DMA.
Question 2: Where is htstamp generated?
Question 3: When is htstamp generated?
Question 4: Is the assumption audio_htstamp < htstamp < driver_htstamp generally correct?
It seems to hold in a little test program I wrote, but I want to verify my assumption.
I cannot find this information in the ALSA documentation.

I just dug through the code for this stuff for my own purposes, so I figured I would share what I found.
The purpose of these timestamps is to allow you to determine subtle differences in the rate of different clocks; most importantly in this case the main system clock that Linux uses for general timekeeping compared with the different clock that determines the rate at which samples move in and out of the sound device. This can be very important for applications that need to keep audio from different hardware devices in sync, since the rates of different physical clocks are never exactly the same.
The technique used is sometimes called "cross-timestamping"; you capture timestamps from the clocks you want to compare as close to simultaneously as possible, and repeat this at regular intervals. There is usually some measurement error introduced, but some relatively simple filtering can get you a good characterization of the difference in the rate at which the clocks count.
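To make that concrete, the "relatively simple filtering" can be as little as a least-squares slope fitted to a series of cross-timestamp pairs. Here is a rough sketch (the CrossStamp struct and the function name are just mine for illustration, not anything from ALSA):

    #include <vector>

    // One cross-timestamp: the system clock and the audio clock sampled (nearly)
    // simultaneously, both expressed in seconds since the stream started.
    struct CrossStamp {
        double system;
        double audio;
    };

    // Least-squares estimate of the audio clock rate relative to the system clock:
    // the slope of audio time versus system time over a series of cross-timestamps.
    // A result of 1.000010 means the audio clock runs roughly 10 ppm fast.
    static double estimate_rate_ratio(const std::vector<CrossStamp> &samples)
    {
        if (samples.size() < 2)
            return 1.0;

        const double n = static_cast<double>(samples.size());
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (const CrossStamp &s : samples) {
            sx  += s.system;
            sy  += s.audio;
            sxx += s.system * s.system;
            sxy += s.system * s.audio;
        }
        const double denom = n * sxx - sx * sx;
        return denom != 0.0 ? (n * sxy - sx * sy) / denom : 1.0;
    }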
The core PCM driver arranges to take a system clock timestamp as closely as possible to when an audio stream starts, and then it does a cross-timestamp between the system clock and audio clock (which can be measured in different ways) whenever it is asked to check the state of the hardware pointers for the DMA engine that moves samples around.
The default method of measuring the audio clock is via DMA hardware pointer comparison. This isn't terribly precise, but over longer periods of time you can still get a good measure of the rate difference. At the start of snd_pcm_update_hw_ptr0, a system timestamp is captured; this will end up being htstamp. The DMA pointers are then checked, and if it's determined that they've moved since the last check, audio_htstamp is calculated based on the number of frames DMA has copied and the nominal frequency of the audio clock. Then, once all the DMA pointer updating is done and right before snd_pcm_update_hw_ptr0 returns, another system timestamp is captured in driver_htstamp. This one isn't really meant to be used when you're using the DMA hw_ptr method of calculating the audio_htstamp, though.
If you happen to have an audio device using the HDAudio driver, you can use an alternate and much more precise method of measuring the audio clock. It supplies an extra operation callback called get_time_info that is used instead of the default method of capturing the system and audio timestamps. In the HDAudio case, it takes a system timestamp for htstamp as close as possible to when it reads an internal counter driven by the same clock source as the audio clock; that counter reading forms the audio_htstamp. Afterwards, the same DMA hw_ptr bookkeeping is done, but the code that translates the pointer movement into time is skipped. The driver_htstamp is still taken right before the routine ends, though; this is "to let apps detect if the reference tstamp read by low-level hardware was provided with a delay", as the comment in the code says. This is because there's no guarantee that the get_time_info callback will take a new system timestamp; it may have previously recorded an audio timestamp along with a system timestamp as part of an interrupt handler. In that case, the timestamps you get might not match the available-frames and delay-frames counts calculated by the hw_ptr bookkeeping, but the driver_htstamp will let you know the closest system time to when those calculations were made.
In both cases, the code is designed to capture htstamp and audio_htstamp as closely together as possible, and for htstamp - trigger_htstamp to represent the amount of system time that passed during the span of audio-clock time measured by audio_htstamp. You mostly shouldn't need to use driver_htstamp, but I guess it might be used with the USB Audio driver, as I think it and HDAudio are the only ones that do anything special with these interfaces right now.
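As a concrete illustration of how you read and relate these values, here is a minimal sketch; it assumes pcm is an already-opened, running handle with timestamping enabled (e.g. via snd_pcm_sw_params_set_tstamp_mode), and the helper names are mine:

    #include <alsa/asoundlib.h>
    #include <stdio.h>

    // Convert an ALSA high-resolution timestamp (a struct timespec) to seconds.
    static double ts_to_sec(const snd_htimestamp_t &t)
    {
        return t.tv_sec + t.tv_nsec / 1e9;
    }

    // Take one status snapshot and print how the four timestamps relate.
    // 'pcm' is assumed to be an open, running stream with timestamping enabled.
    static void dump_timestamps(snd_pcm_t *pcm)
    {
        snd_pcm_status_t *status;
        snd_pcm_status_alloca(&status);
        if (snd_pcm_status(pcm, status) < 0)
            return;

        snd_htimestamp_t trigger, audio, driver, sys;
        snd_pcm_status_get_trigger_htstamp(status, &trigger);
        snd_pcm_status_get_audio_htstamp(status, &audio);
        snd_pcm_status_get_driver_htstamp(status, &driver);
        snd_pcm_status_get_htstamp(status, &sys);

        // System time elapsed since the trigger vs. time elapsed on the audio clock.
        double system_elapsed = ts_to_sec(sys) - ts_to_sec(trigger);
        double audio_elapsed  = ts_to_sec(audio);

        printf("system: %.6f s  audio: %.6f s  driver-htstamp gap: %.6f s\n",
               system_elapsed, audio_elapsed,
               ts_to_sec(driver) - ts_to_sec(sys));
    }

On a system where audio_htstamp is derived from hw_ptr movement you should see system_elapsed and audio_elapsed stay very close to each other, which is exactly the "trigger_htstamp plus audio_htstamp is close to the current time" behaviour described in the question.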
The documentation for this, although it doesn't contain all the details you might want to know, is part of the kernel documentation: http://lxr.free-electrons.com/source/Documentation/sound/alsa/timestamping.txt?v=4.9

Related

Synchronization of WASAPI Audio Devices

Is there a way with WASAPI to determine if two devices (an input and an output device) are both synced to the same underlying clock source?
In all the examples I've seen input and output devices are handled separately - typically a different thread or event handle is used for each and I've not seen any discussion about how to keep two devices in sync (or how to handle the devices going out of sync).
For my app I basically need to do real-time input-to-output processing where in each audio cycle I get a certain number of incoming samples and send the same number of output samples, i.e. I need one triggering event per audio cycle that is correct for both devices, not separate events for each device.
I also need to understand how this works in both exclusive and shared modes. For exclusive I guess this will come down to finding if devices have a common clock source. For shared mode some information on what Windows guarantees about synchronization of devices would be great.
You can use the IAudioClock API to detect drift of a given audio client relative to QPC; if two endpoints share a clock, their drift relative to QPC will be identical (that is, they will have zero drift relative to each other).
You can use the IAudioClockAdjustment API to adjust for drift that you can detect. For example, you could correct both sides for drift relative to QPC; you could correct either side for drift relative to the other; or you could split the difference and correct both sides to the mean.
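As a rough sketch of what "drift relative to QPC" looks like in code, assuming you have already obtained an IAudioClock for each endpoint via IAudioClient::GetService (the helper names are mine, not part of the API):

    #include <windows.h>
    #include <audioclient.h>

    // One cross-timestamp from an endpoint's IAudioClock: the device position
    // converted to seconds, together with the QPC time (reported by WASAPI in
    // 100-nanosecond units) at which that position was read.
    static bool SampleClock(IAudioClock *clock, double *deviceSec, double *qpcSec)
    {
        UINT64 freq = 0, pos = 0, qpc = 0;
        if (FAILED(clock->GetFrequency(&freq)) || freq == 0)
            return false;
        if (FAILED(clock->GetPosition(&pos, &qpc)))
            return false;
        *deviceSec = static_cast<double>(pos) / static_cast<double>(freq);
        *qpcSec    = static_cast<double>(qpc) / 1e7;
        return true;
    }

    // Drift of one endpoint relative to QPC over a measurement interval:
    // how much more (or less) device time elapsed than QPC time.
    static double ElapsedMinusQpc(double devStart, double qpcStart,
                                  double devNow, double qpcNow)
    {
        return (devNow - devStart) - (qpcNow - qpcStart);
    }

Calling SampleClock at the start and again later for each endpoint and comparing the ElapsedMinusQpc results tells you whether the two endpoints drift together (shared clock) or apart (independent clocks). Note that IAudioClockAdjustment::SetSampleRate is only available if the stream was initialized with the AUDCLNT_STREAMFLAGS_RATEADJUST flag.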

What exactly does ALSA's snd_pcm_delay() return?

I want to use snd_pcm_delay() to query the delay until the samples I am about to write to the ALSA buffer become audible. I expect this value to vary between individual calls, but on two systems it is constant: the function always returns a value equal to the period size on one platform, and on the other platform it is equal to the buffer size (two times the period size in my code).
Is my understanding of snd_pcm_delay() wrong? Is it a driver problem?
The delay corresponds to the number of samples still queued in the buffer (the complement of snd_pcm_avail()), plus a time that describes how long it takes to move samples from the buffer to the speakers. The latter part is driver dependent and might not be implemented.
If the device takes out samples one entire period at a time (some DMA controllers have no better granularity for reporting the current position), then the delay value will appear to stay constant for a while and then jump by an entire period, and you will see that jump only if you query before you have re-filled the buffer.
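For reference, this is roughly how you would watch the reported value; it's just a sketch assuming an open, running playback handle pcm and the sample rate it was configured with (the function name is mine):

    #include <alsa/asoundlib.h>
    #include <stdio.h>

    // Print the playback delay reported by the driver, in frames and milliseconds.
    // 'pcm' is assumed to be an open, running playback handle and 'rate' the
    // sample rate it was configured with.
    static void print_delay(snd_pcm_t *pcm, unsigned int rate)
    {
        snd_pcm_sframes_t delay = 0;
        if (snd_pcm_delay(pcm, &delay) < 0)
            return;

        // Free space in the ring buffer, in frames; buffer_size - avail is the
        // part of the delay that comes from samples still queued in the buffer.
        snd_pcm_sframes_t avail = snd_pcm_avail(pcm);

        printf("delay: %ld frames (%.1f ms), avail: %ld frames\n",
               (long)delay, 1000.0 * (double)delay / rate, (long)avail);
    }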

pcm capture using alsa

I'm new to ALSA sound programming. I'm developing an application in C to record audio into a WAV file. I did some research on the net but am still not very clear about many topics. Please help.
This is the configuration I'm setting.
access : SND_PCM_ACCESS_RW_INTERLEAVED
format: S16_LE
rate: 16000
channel: 1
I have a few doubts:
I'm quite confused about the difference between the period size and period time settings.
What is the difference between snd_pcm_hw_params_set_period_time_near() and snd_pcm_hw_params_set_period_size_near()? Which API should be called for capture? Similarly, there are snd_pcm_hw_params_set_buffer_time_near() and snd_pcm_hw_params_set_buffer_size_near(). How do I decide between these two APIs?
How do I decide the period size value? I believe the same value is used in the snd_pcm_sw_params_set_avail_min() call.
What value should be used for the number of frames to read in snd_pcm_readi()?
What is the importance of the snd_pcm_sw_params_set_avail_min() and snd_pcm_start_threshold() APIs? Is it a must to call them?
I'm referring to the arecord implementation and another capture example.
Thanks in advance.
The period time describes the same parameter as the period size. It might be useful if the rate is not yet known.
You get interrupts (i.e., the opportunity to get woken up if you're waiting for data) at the end of each period. If you know how much data you want to read each time, try to use that as period size.
Read as many frames as you want to process.
The avail_min parameter specifies how many frames must be available before an interrupt results in your application actually being woken up.
The start threshold specifies that the device starts automatically when you try to read that many frames.
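Putting those answers together, a minimal capture-side configuration might look like the sketch below. The period and buffer sizes are illustrative choices, not requirements, and the helper name is mine:

    #include <alsa/asoundlib.h>

    // Minimal capture setup: S16_LE, mono, 16 kHz, interleaved reads.
    // Returns the actual period size, or a negative value on error.
    static long setup_capture(snd_pcm_t *pcm)
    {
        unsigned int rate = 16000;
        snd_pcm_uframes_t period = 256;       // frames per wakeup (illustrative)
        snd_pcm_uframes_t buffer = 4 * 256;   // ring buffer of ~4 periods (illustrative)

        snd_pcm_hw_params_t *hw;
        snd_pcm_hw_params_alloca(&hw);
        snd_pcm_hw_params_any(pcm, hw);
        snd_pcm_hw_params_set_access(pcm, hw, SND_PCM_ACCESS_RW_INTERLEAVED);
        snd_pcm_hw_params_set_format(pcm, hw, SND_PCM_FORMAT_S16_LE);
        snd_pcm_hw_params_set_channels(pcm, hw, 1);
        snd_pcm_hw_params_set_rate_near(pcm, hw, &rate, 0);
        // Period size and period time describe the same thing; setting one is enough.
        snd_pcm_hw_params_set_period_size_near(pcm, hw, &period, 0);
        snd_pcm_hw_params_set_buffer_size_near(pcm, hw, &buffer);
        if (snd_pcm_hw_params(pcm, hw) < 0)
            return -1;

        snd_pcm_sw_params_t *sw;
        snd_pcm_sw_params_alloca(&sw);
        snd_pcm_sw_params_current(pcm, sw);
        // Wake up only when at least one full period has been captured...
        snd_pcm_sw_params_set_avail_min(pcm, sw, period);
        // ...and start the device as soon as the first read is issued.
        snd_pcm_sw_params_set_start_threshold(pcm, sw, 1);
        if (snd_pcm_sw_params(pcm, sw) < 0)
            return -1;

        return (long)period;
    }

A blocking snd_pcm_readi(pcm, buf, period) then returns roughly once per period, and that period is also the value you would pass as the number of frames to read.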

gnuradio phase drift of AM demodulation

I am beginning a project using GNUradio and an inexpensive SDR.
http://www.amazon.com/gp/product/B00SXZDUAQ?psc=1&redirect=true&ref_=oh_aui_search_detailpage
One portion of the project requires me to generate a reference audio tone and compare the phase of that tone to demodulated audio.
To simulate this portion of the system, I have generated a simple GNU Radio flowgraph.
I had some issues with the source and demodulated audio in that they would drift relative to each other. This was visible on the scope sink in the original flowgraph. To aid in troubleshooting, I sent the demodulated audio out through the sound card's second channel and monitored both audio streams, in addition to the modulated RF, on an external oscilloscope.
Initially all seems well, but the demodulated audio drifts in relation to the original source and RF.
My question is: am I doing something wrong in the flowgraph or am I expecting too much performance out of an inexpensive SDR?
Thanks in advance for any insights
You cannot expect to see zero phase drift in anything short of a fully digital simulation, or a fully analog circuit with exactly one oscillator, because no two (physical) oscillators have identical frequencies.
In your case, there are two relevant oscillators involved:
The sample clock in the RTL-SDR unit.
The sample clock in your sound card output.
Within a GNU Radio flowgraph, there is no time reference per se and everything depends on the sources and sinks which are connected to hardware.
The relevant source in your flowgraph is the RTL-SDR hardware; insofar as its oscillator is different from its nominal value (28.8 MHz, as it happens), everything it produces will be off-frequency in an absolute sense (both RF carrier frequencies and audio frequencies of demodulated output).
But you don't actually have an absolute frequency reference; you have the tone produced by your sound card. The sound card has its own oscillator, which determines the rate at which samples are converted to analog signals, and therefore the rate at which samples are consumed from the flowgraph.
Therefore, your reference signal will drift relative to your received and demodulated signal, at a rate determined by the difference in frequency error between the two oscillators.
Additionally, since your sound card will be accepting samples from the flowgraph at a slightly different real-time rate than the RTL-SDR is producing them, you will notice periodic glitches in the audio as the error accumulates and must be dealt with; they will start occurring either immediately (if the source is slower than the sink, requiring the sound card to play silence instead) or after a delay for buffers to hit their maximum size (if the source is faster than the sink, requiring the RTL-SDR to drop some samples).

Sync two soundcards

I have a program written in C++ that uses RtAudio (DirectSound) to capture and play back audio at a 48 kHz sample rate.
The input capture uses a callback option. The callback writes data to a ringbuffer.
The output is a blocking write function in a separate thread that reads from the ringbuffer.
If the input and output devices are the same, the audio loops through perfectly.
Now I want to get audio from device 1 and play it back on device 2. Each device has its own sample clock set to 48 kHz, but they are not in sync. After a couple of seconds the input and output are out of sync.
Is it possible to sync two independent audio devices?
There are two challenges you face:
getting the two devices to start at the same time.
getting the two devices to stay in sync.
Both of these tasks are difficult. In the pro audio world, #2 is accomplished with special hardware to sync the word-clocks of multiple devices. It can also be done with a high quality video signal. I believe it can also be done with firewire devices, but I'm not sure how that works. In practice, I have used devices with no sync ("wild") and gotten very reasonable sync for up to an hour or two. Depending on what you are trying to do, the sync should not drift more than a few milliseconds over the course of a few minutes. If it does, you can consider your hardware broken (of course, cheap hardware is often broken).
As for #1, I'm not sure this is possible in any reliable sense with DirectSound. To the extent that it's possible with any audio API, it is difficult at best: both cards have streams that require some time to set up, open, and start playing. In general, the solution is to use an API where this time is super low (ASIO, for example). This works reasonably well for applications like video, but I don't know if it really solves the problem in general.
If you really need to solve this problem, you could open both cards, start playing silence, and use the timing information generated by the cards to establish the delay between putting data into a card and its eventual playback (this will be different for each card, and probably different each time you run), and then use that data to calculate when to start actual playback. I don't know if RtAudio supplies the necessary timing information, but PortAudio does. This document may help.
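For what it's worth, the timing information in PortAudio arrives in the stream callback. A rough sketch of measuring a card's output delay while playing silence (the callback name, the stereo float32 format, and the double* user-data pointer are my own assumptions for illustration):

    #include <portaudio.h>

    // Output callback used only to measure timing: outputBufferDacTime is the
    // stream time at which the first sample of this buffer should hit the DAC,
    // and currentTime is the stream time "now"; their difference is roughly the
    // card's current output delay. The stream is assumed to be stereo float32.
    static int timing_callback(const void *input, void *output,
                               unsigned long frameCount,
                               const PaStreamCallbackTimeInfo *timeInfo,
                               PaStreamCallbackFlags statusFlags,
                               void *userData)
    {
        double *outputDelay = static_cast<double *>(userData);
        *outputDelay = timeInfo->outputBufferDacTime - timeInfo->currentTime;

        // Play silence while measuring.
        float *out = static_cast<float *>(output);
        for (unsigned long i = 0; i < frameCount * 2; ++i)
            out[i] = 0.0f;

        return paContinue;
    }

Doing this for each card gives you the per-card delay between putting data into the stream and its eventual playback, which is the quantity you would use to decide when to start the actual playback.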
