PCM capture using ALSA

I'm new to ALSA sound programming. I'm developing an application in C that records audio to a WAV file. I did some research online, but many topics are still unclear to me. Please help.
This is the configuration I'm setting:
format: S16_LE
rate: 16000
channel: 1
I have a few doubts:
I'm highly confused by the period size and period time settings.
What is the difference between snd_pcm_hw_params_set_period_time_near() and snd_pcm_hw_params_set_period_size_near()? Which API should be called for capture? Similarly, there are snd_pcm_hw_params_set_buffer_time_near() and snd_pcm_hw_params_set_buffer_size_near(). How do I decide between these two APIs?
How do I decide the period size value? I believe the same value is used in the snd_pcm_sw_params_set_avail_min() call.
What value should be used for the number of frames to read in snd_pcm_readi()?
What is the importance of the snd_pcm_sw_params_set_avail_min() and snd_pcm_sw_params_set_start_threshold() APIs? Is it a must to call those?
I'm referring to the arecord implementation and another example code for capture.
Thanks in advance.

The period time and the period size describe the same parameter, once as a duration and once as a frame count. Setting the time might be useful if the rate is not yet known.
You get interrupts (i.e., the opportunity to get woken up if you're waiting for data) at the end of each period. If you know how much data you want to read each time, try to use that as period size.
Read as many frames as you want to process.
The avail_min parameter specifies how many frames must be available before an interrupt results in your application actually being woken up.
The start threshold specifies that the device starts automatically when you try to read that many frames.
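For the configuration in the question (S16_LE, 16000 Hz, mono), the relation between period time and period size is plain arithmetic. The following is only a small sketch of that arithmetic, not ALSA code; the helper names and the 20 ms example period are assumptions made for illustration.

```python
# Sketch: period time vs. period size for S16_LE, 16000 Hz, mono.
# Helper names and the 20 ms period are illustrative assumptions, not ALSA API.

RATE_HZ = 16000          # frames per second
CHANNELS = 1
BYTES_PER_SAMPLE = 2     # S16_LE uses 16 bits per sample


def period_time_us_to_frames(period_time_us: int, rate_hz: int) -> int:
    """Number of frames that span the given period time."""
    return period_time_us * rate_hz // 1_000_000


def frames_to_bytes(frames: int, channels: int, bytes_per_sample: int) -> int:
    """Size in bytes of a buffer holding that many frames."""
    return frames * channels * bytes_per_sample


period_frames = period_time_us_to_frames(20_000, RATE_HZ)
period_bytes = frames_to_bytes(period_frames, CHANNELS, BYTES_PER_SAMPLE)
print(period_frames, period_bytes)
```

Under these assumptions a 20 ms period is 320 frames (640 bytes), so requesting 320 frames per snd_pcm_readi() call would consume one period per wakeup.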


What do the ALSA timestamping functions return and how do the results relate to each other?

There are several "hi-res" timestamping functions in ALSA:
I would like to understand what points in time the resulting timestamps represent.
My current understanding is that trigger_htstamp represents the time when the stream was started/stopped/paused. snd_pcm_status_get_trigger_htstamp returns a constant value, and when I add audio_htstamp to that value the result is very close to the current system time.
audio_htstamp seems to start from zero on my system, and it is incremented by a value that is equal to the period size I use. Hence on my system it is a simple frame counter. If I understand ALSA correctly, audio_htstamp can also work in a different, more accurate way depending on the system capabilities.
driver_htstamp I guess by the name is a timestamp generated by the audio driver.
Question 1: When is the timestamp driver_htstamp usually generated?
With htstamp I am really unsure where and when it is generated. I have a hunch that it may be related to DMA.
Question 2: Where is htstamp generated?
Question 3: When is htstamp generated?
Question 4: Is the assumption audio_htstamp < htstamp < driver_htstamp generally correct?
It seems so from a little test program I wrote, but I want to verify my assumption.
I cannot find this information in the ALSA documentation.
I just dug through the code for this stuff for my own purposes, so I figured I would share what I found.
The purpose of these timestamps is to allow you to determine subtle differences in the rate of different clocks; most importantly in this case the main system clock that Linux uses for general timekeeping compared with the different clock that determines the rate at which samples move in and out of the sound device. This can be very important for applications that need to keep audio from different hardware devices in sync, since the rates of different physical clocks are never exactly the same.
The technique used is sometimes called "cross-timestamping"; you capture timestamps from the clocks you want to compare as close to simultaneously as possible, and repeat this at regular intervals. There is usually some measurement error introduced, but some relatively simple filtering can get you a good characterization of the difference in the rate at which the clocks count.
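As a rough illustration of the "relatively simple filtering" mentioned above, a least-squares line fit over a series of cross-timestamp pairs gives the rate ratio between the two clocks. This is only a sketch: the function name and the synthetic data (an audio clock running 50 ppm fast with a small alternating measurement error) are assumptions, not anything taken from ALSA.

```python
# Sketch: estimate the rate ratio between two clocks from cross-timestamps.
# The sample data below is synthetic and purely illustrative.


def estimate_rate_ratio(system_s, audio_s):
    """Least-squares slope of audio time against system time."""
    n = len(system_s)
    mean_x = sum(system_s) / n
    mean_y = sum(audio_s) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(system_s, audio_s))
    den = sum((x - mean_x) ** 2 for x in system_s)
    return num / den


# Synthetic example: the audio clock runs 50 ppm fast, and each pair carries
# a small alternating measurement error of +/- 10 microseconds.
system_times = [float(i) for i in range(0, 101, 10)]  # seconds
audio_times = [
    t * (1 + 50e-6) + (1e-5 if i % 2 else -1e-5)
    for i, t in enumerate(system_times)
]

ratio = estimate_rate_ratio(system_times, audio_times)
drift_ppm = (ratio - 1.0) * 1e6
print(f"estimated drift: {drift_ppm:.1f} ppm")
```

A real application would feed in pairs read at regular intervals and would probably use a running filter rather than refitting the whole history each time.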
The core PCM driver arranges to take a system clock timestamp as closely as possible to when an audio stream starts, and then it does a cross-timestamp between the system clock and audio clock (which can be measured in different ways) whenever it is asked to check the state of the hardware pointers for the DMA engine that moves samples around.
The default method of measuring the audio clock is via DMA hardware pointer comparison. This isn't terribly precise, but over longer periods of time you can still get a good measure of the rate difference. At the start of snd_pcm_update_hw_ptr0, a system timestamp is captured; this will end up being htstamp. The DMA pointers are then checked, and if it's determined that they've moved since the last check, audio_htstamp is calculated based on the number of frames DMA has copied and the nominal frequency of the audio clock. Then, once the DMA pointer update is done and right before snd_pcm_update_hw_ptr0 returns, another system timestamp is captured in driver_htstamp. This one isn't meant to be used when you're relying on the DMA hw_ptr method of calculating audio_htstamp, though.
If you happen to have an audio device using the HDAudio driver, you can use an alternate and much more precise method of measuring the audio clock. It supplies an extra operation callback called get_time_info that is used instead of the default method of capturing the system and audio timestamps. In the HDAudio case, it takes a system timestamp for htstamp as close as possible to when it reads an internal counter driven by the same clock source as the audio clock; this forms the audio_htstamp. Afterwards, the same DMA hw_ptr bookkeeping is done, but the code that translates the pointer movement into time is skipped. The driver_htstamp is still taken right before the routine ends, though; this is "to let apps detect if the reference tstamp read by low-level hardware was provided with a delay" as the comment says in the code. This is because there's no guarantee that the get_time_info callback is going to take a new system timestamp; it may have previously recorded an audio timestamp along with a system timestamp as part of an interrupt handler. In this case, the timestamps you get might not match the available frames and delay frames counts calculated by hw_ptr bookkeeping, but the driver_htstamp will let you know the closest system time to when those calculations were made.
In both cases, the code is designed to capture htstamp and audio_htstamp as closely together as possible, and for htstamp - trigger_htstamp to represent the amount of system time that passed during the period measured by audio_htstamp of the audio clock. You mostly shouldn't need to use driver_htstamp, but I guess it might be used with the USB Audio driver, as I think it and HDAudio are the only ones that do anything special with these interfaces right now.
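To make the relation between the three values concrete, here is a small sketch of the arithmetic for the default (DMA pointer) method: audio time is derived from a frame count at the nominal rate, and is then compared against htstamp - trigger_htstamp. All numbers and helper names below are made up for illustration; the sketch does not read anything from ALSA.

```python
# Sketch: relate trigger_htstamp, htstamp and audio_htstamp numerically.
# All values and helper names are hypothetical.

NSEC_PER_SEC = 1_000_000_000


def frames_to_ns(frames: int, nominal_rate_hz: int) -> int:
    """Audio time implied by a frame count at the nominal sample rate."""
    return frames * NSEC_PER_SEC // nominal_rate_hz


def clock_ratio(trigger_ns: int, htstamp_ns: int, audio_ns: int) -> float:
    """Audio time elapsed per unit of system time elapsed since the trigger."""
    return audio_ns / (htstamp_ns - trigger_ns)


# Hypothetical reading: 4_800_240 frames transferred at a nominal 48000 Hz
# while the system clock advanced exactly 100 s since the stream was triggered.
trigger_htstamp = 5_000_000_000
htstamp = trigger_htstamp + 100 * NSEC_PER_SEC
audio_htstamp = frames_to_ns(4_800_240, 48000)

ratio = clock_ratio(trigger_htstamp, htstamp, audio_htstamp)
print(f"{(ratio - 1) * 1e6:.1f} ppm")
```

A single pair like this is sensitive to measurement error, so in practice the ratio would be tracked over many readings.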
The documentation for this, although it doesn't contain all the details you might want to know, is part of the kernel documentation: http://lxr.free-electrons.com/source/Documentation/sound/alsa/timestamping.txt?v=4.9

gnuradio phase drift of AM demodulation

I am beginning a project using GNU Radio and an inexpensive SDR.
One portion of the project requires me to generate a reference audio tone and compare the phase of that tone to demodulated audio.
To simulate this portion of the system, I have generated a simple GNU Radio flowgraph:
I had some issues with the source and demodulated audio in that they would drift relative to each other. This occurred on the scope sink in the original flowgraph. To aid in troubleshooting, I sent the demodulated audio out through the soundcard's second channel and monitored both audio streams, in addition to the modulated RF, on an external oscilloscope:
Initially all seems well, but the demodulated audio drifts in relation to the original source and RF:
My question is: am I doing something wrong in the flowgraph or am I expecting too much performance out of an inexpensive SDR?
Thanks in advance for any insights
You cannot expect to see zero phase drift in anything short of a fully digital simulation, or a fully analog circuit with exactly one oscillator, because no two (physical) oscillators have identical frequencies.
In your case, there are two relevant oscillators involved:
The sample clock in the RTL-SDR unit.
The sample clock in your sound card output.
Within a GNU Radio flowgraph, there is no time reference per se and everything depends on the sources and sinks which are connected to hardware.
The relevant source in your flowgraph is the RTL-SDR hardware; insofar as its oscillator is different from its nominal value (28.8 MHz, as it happens), everything it produces will be off-frequency in an absolute sense (both RF carrier frequencies and audio frequencies of demodulated output).
But you don't actually have an absolute frequency reference; you have the tone produced by your sound card. The sound card has its own oscillator, which determines the rate at which samples are converted to analog signals, and therefore the rate at which samples are consumed from the flowgraph.
Therefore, your reference signal will drift relative to your received and demodulated signal, at a rate determined by the difference in frequency error between the two oscillators.
Additionally, since your sound card will be accepting samples from the flowgraph at a slightly different real-time rate than the RTL-SDR is producing them, you will notice periodic glitches in the audio as the error accumulates and must be dealt with; they will start occurring either immediately (if the source is slower than the sink, requiring the sound card to play silence instead) or after a delay for buffers to hit their maximum size (if the source is faster than the sink, requiring the RTL-SDR to drop some samples).
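To get a feel for the size of the effect, the drift rate follows directly from the tone frequency and the difference between the two oscillators' frequency errors. The sketch below assumes a 1 kHz tone and errors of +30 ppm and -20 ppm; these figures are illustrative assumptions, not measurements of any particular RTL-SDR or sound card.

```python
# Sketch: how fast a tone drifts in phase when two clocks differ slightly.
# The 1 kHz tone and the ppm figures are illustrative assumptions.


def phase_drift_deg_per_s(tone_hz: float, ppm_a: float, ppm_b: float) -> float:
    """Phase drift rate in degrees/second between two copies of a tone
    whose time bases are off by ppm_a and ppm_b respectively."""
    freq_offset_hz = tone_hz * (ppm_a - ppm_b) * 1e-6
    return 360.0 * freq_offset_hz


def seconds_per_cycle_slip(tone_hz: float, ppm_a: float, ppm_b: float) -> float:
    """Time for the two tones to slip by one full cycle."""
    return 360.0 / abs(phase_drift_deg_per_s(tone_hz, ppm_a, ppm_b))


rate = phase_drift_deg_per_s(1000.0, 30.0, -20.0)
slip = seconds_per_cycle_slip(1000.0, 30.0, -20.0)
print(rate, slip)
```

With a 50 ppm difference, a 1 kHz tone slips by a full cycle roughly every 20 seconds, which is easily visible on an oscilloscope.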

How to get amplitude of an audio stream in an AudioGraph to build a SoundWave using Universal Windows?

I want to build a SoundWave by sampling an audio stream.
I read that a good method is to get the amplitude of the audio stream and represent it with a Polygon. But suppose we have an AudioGraph with just a DeviceInputNode and a FileOutputNode (a simple recorder).
How can I get the amplitude from a node of the AudioGraph?
What is the best way to periodize this sampling? Is a DispatcherTimer good enough?
Any help will be appreciated.
First, everything you care about is kind of here:
uwp AudioGraph audio processing
But since you have a different starting point, I'll explain some more core things.
An AudioGraph node is already periodized for you -- it's generally how audio works. I think Win10 defaults to periods of 10ms and/or 20ms, but this can be set (theoretically) via the AudioGraphSettings.DesiredSamplesPerQuantum setting, with the AudioGraphSettings.QuantumSizeSelectionMode = QuantumSizeSelectionMode.ClosestToDesired; I believe the success of this functionality actually depends on your audio hardware and not the OS specifically. My PC can only do 480 and 960. This number is how many samples of the audio signal to accumulate per channel (mono is one channel, stereo is two channels, etc...), and this number will also set the callback timing as a by-product.
Win10 and most devices default to a 48000 Hz sample rate, which means they are measuring/outputting data that many times per second. So with my QuantumSize of 480 samples for every frame of audio, I am getting 48000/480 or 100 frames every second, which means I'm getting them every 10 milliseconds by default. If you set your quantum to 960 samples per frame, you would get 50 frames every second, or a frame every 20 ms.
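The same arithmetic, written out as a small sketch (plain Python rather than UWP code; the function names are made up for illustration):

```python
# Sketch: callback rate implied by the sample rate and the quantum size.
# Function names are illustrative and not part of the AudioGraph API.


def quanta_per_second(sample_rate_hz: int, samples_per_quantum: int) -> float:
    """How many quanta (frames of audio) are delivered each second."""
    return sample_rate_hz / samples_per_quantum


def quantum_duration_ms(sample_rate_hz: int, samples_per_quantum: int) -> float:
    """How long one quantum lasts, in milliseconds."""
    return 1000.0 * samples_per_quantum / sample_rate_hz


for quantum in (480, 960):
    print(
        quantum,
        quanta_per_second(48000, quantum),
        quantum_duration_ms(48000, quantum),
    )
```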
To get a callback for that frame of audio every quantum, you need to register a handler for the AudioGraph.QuantumProcessed event. You can directly reference the link above for how to do that.
So by default, a frame of data is stored in an array of 480 floats in the range [-1, +1]. To get the amplitude, you just average the absolute value of this data.
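As a language-neutral sketch of that averaging step (written in Python rather than C#, with a synthetic quantum standing in for real AudioGraph data; the function name and the test tone are assumptions):

```python
# Sketch: mean absolute amplitude of one quantum of float samples.
# The quantum below is synthetic; real data would come from the audio frame.
import math


def mean_abs_amplitude(samples):
    """Average of the absolute sample values; 0.0 for an empty quantum."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


# Synthetic quantum: 480 mono samples of a 1 kHz sine at 48 kHz, peak 0.5.
quantum = [0.5 * math.sin(2 * math.pi * 1000 * n / 48000) for n in range(480)]
level = mean_abs_amplitude(quantum)
print(round(level, 3))
```

For a sine wave the result is about 2/pi times the peak value, so the number is a relative level indicator rather than the peak itself.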
This part, including handling multiple channels of audio, is explained more thoroughly in my other post.
Have fun!

DirectShow, specifically Rate Matching, time stamps and the DirectSound Audio Renderer

Can anyone give me a concise explanation of how and why DirectShow DirectSound Audio Renderer will adjust the rate when I have my custom capture filter that does not expose a clock?
I cannot make any sense of it at all. When audio starts, I assign a rtStart of zero plus the duration of the sample (numbytes / m_wfx.nAvgBytesPerSec). Then the next sample has a start time of the end of the previous sample, and so on....
Some time later, the capture filter senses Directshow is consuming samples too rapidly, and tries to set a timestamp of some time in the future, which the audio renderer completely ignores. I can, as a test, suddenly tell a sample it must not be rendered until 20 secs in the future (StreamTime() + UNITS), and again the renderer just ignores it. However, the Null Audio Renderer does what it is told, and the whole graph freezes for 20 seconds, which is the expected behaviour.
In a nutshell, then, I want the audio renderer to use either my capture clock (or its own, or the graph's, I don't care), but I do need it to obey the time stamps I'm sending to it. What I need it to do is squish or stretch samples, ever so subtly, to make up for the difference in the rates between DSound and the incoming stream (whose rate I cannot control).
MSDN explains the technology here: Live Sources. I suppose you are aware of this documentation topic.
Rate matching takes place when your source is live; otherwise the audio renderer does not need to bother, and it expects the source to keep the input queue pre-loaded with data so that data is consumed at the rate it is needed.
It seems that your filter is capturing in real time (it is a capture filter, and you mention that you don't control the rate of the data you obtain externally). So you need to make sure your capture filter is recognized as a live source, and then you choose the clock for playback and, overall, the mode of operation. I suppose you want the behavior described here for AM_PUSHSOURCECAPS_PRIVATE_CLOCK:
the source filter is using a private clock to generate time stamps. In this case, the audio renderer matches rates against the time stamps.
This is what you write about above:
you time stamp according to external source
playback is using audio device clock
audio renderer does rate matching to match the rates
To see how exactly rate matching takes place, open the audio renderer's property pages and go to the Advanced page:
Data under Slaving Info will show the rate matching details (48000/48300 matching in my example). The data is also available programmatically via IAMAudioRendererStats::GetStatParam.
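To put numbers on this, the sketch below shows back-to-back time stamps in 100 ns units (the same scheme the question describes: each sample starts where the previous one ended) and the size of the mismatch implied by the 48000/48300 figures. It is plain arithmetic in Python rather than DirectShow code; the function names and buffer sizes are assumptions for illustration.

```python
# Sketch: back-to-back time stamps in 100 ns units, plus the rate mismatch
# implied by the 48000/48300 figures. Names and sizes are illustrative.

UNITS = 10_000_000  # 100 ns ticks per second, as in DirectShow reference time


def buffer_duration(num_bytes: int, avg_bytes_per_sec: int) -> int:
    """Duration of a buffer in 100 ns ticks."""
    return num_bytes * UNITS // avg_bytes_per_sec


def stamp_buffers(buffer_sizes, avg_bytes_per_sec):
    """Back-to-back (start, stop) pairs: each start is the previous stop."""
    stamps, start = [], 0
    for size in buffer_sizes:
        stop = start + buffer_duration(size, avg_bytes_per_sec)
        stamps.append((start, stop))
        start = stop
    return stamps


# 16-bit stereo at 48000 Hz is 192000 bytes/s; 19200-byte buffers last 100 ms.
stamps = stamp_buffers([19200, 19200, 19200], 192000)
mismatch_percent = (48300 / 48000 - 1) * 100
print(stamps, round(mismatch_percent, 3))
```

Time stamps built this way advance at the nominal rate only, so they carry no information about the real capture rate; the renderer can only match rates against stamps that reflect the source's actual clock.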

Capturing sound on Linux with low latency

I want to capture audio on Linux with low latency in a program I'm writing.
I've run some experiments using the ALSA API, using snd_pcm_readi() to
capture sound, then immediately using snd_pcm_writei() to play it back.
I've tried playing with the number of frames captured, and the buffer size,
but I don't seem to be able to get the latency down to less than a second
or so.
Am I better off using PulseAudio or JACK? Can those be used to play the
captured audio?
To reduce capture latency, reduce the period size of the capture device.
To reduce playback latency, reduce the buffer size of the playback device.
Jack can play the captured audio (just connect the input ports to the output ports), but you still have to configure its periods/buffers.
Also see Relation between period size of speaker and mic and Recording from ALSA - understanding memory mapping.
I've been doing some work on low-latency audio programming.
In my experience, first, your capture buffer should be small, e.g. a period of about 10 ms (let's assume you're using a 512-frame buffer at a 48000 Hz sample rate).
Then, you should configure your output device's start_threshold to at least 2 * frame size (1 * frame size if you don't do much processing of the recorded data).
For the recording device, as CL. said, a relatively small period size is better, but not so small that it causes too many IRQs.
Also, you can switch your process to the FIFO scheduling policy (SCHED_FIFO).
Then, hopefully, you will get about 20ms total latency.
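The figures above can be checked with a simplified latency model: one capture period to fill, plus the playback start threshold. This sketch ignores hardware, driver and scheduling delays, and the function names are made up for illustration.

```python
# Sketch: rough latency budget from period size and playback start threshold.
# Simplified model; hardware and scheduling delays are ignored.


def frames_to_ms(frames: int, rate_hz: int) -> float:
    """Duration of a number of frames in milliseconds."""
    return 1000.0 * frames / rate_hz


def total_latency_ms(period_frames: int, rate_hz: int, threshold_periods: int) -> float:
    """One capture period plus a playback start threshold of N periods."""
    capture_ms = frames_to_ms(period_frames, rate_hz)
    playback_ms = frames_to_ms(threshold_periods * period_frames, rate_hz)
    return capture_ms + playback_ms


print(round(frames_to_ms(512, 48000), 2))
print(round(total_latency_ms(512, 48000, 1), 2))
print(round(total_latency_ms(512, 48000, 2), 2))
```

In this model a 512-frame period at 48000 Hz lasts a little under 11 ms, a start threshold of one period gives a total of roughly 21 ms, and a threshold of two periods gives 32 ms.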
I believe you should at first ensure that you are running a Linux kernel which actually allows you to achieve low typical latency.
There are several kernel compile-time configuration options which you might look into:
CONFIG_PREEMPT_RT_FULL (available only with RT patch)
Apart from that, there are more things you can do in order to optimize your audio latency in Linux. Some starting reference points can be found there:
