Detecting silence from float audio values

I'm reading the audio data of a file as float, and I get, for example, these values:
-4,151046E+34
-2,365558E+38
6,068741E+26
-4,141856E+34
-2,179363E+38
1,177772E-04
-1,035052E+34
-1,#QNAN
2,668123E-20
-1,0609E+37
-2,153349E+38
1,105884E-16
-4,25223E+37
-1,#QNAN
-3,718855E+22
-1,695596E+38
I would like to detect when silence starts and ends.
Do these values directly represent volume, or would a value of 0 represent silence at a given point in this screenshot, so that I need to look at many of these values together to detect silence?

Silence is a notion tied to perception, and it has an attribute of time: silence cannot happen for just an instant of time surrounded by loud audio, since it will not be perceived as silence.
Silence happens when the audio curve sits at, or does not vary much from, the zero crossing point for some perceivable period of time. You cannot have listenable audio followed by silence which lasts for just an instant of time followed by listenable audio; that is not silence. Your eardrum, or the membrane of a microphone in a silent room, does not vibrate; as the loudness of a room increases from silence, those surfaces begin to wobble. The plot you show can be thought of as visualizing this wobble. On the plot, the only silence happens during the flat-line period of time at the beginning.
To programmatically identify when silence occurs you need two parameters:
some maximum height of audio curve below which you declare silence happens
some minimum length of time the audio curve stays below that max height
You can experiment with guessing these values. Now let's identify when silence happens:
package main

import (
    "fmt"
    "math"
)

func main() {
    // somehow your audio_buffer gets populated with samples in the range -1.0 .. 1.0
    var audio_buffer []float64

    flag_in_candidate_silence := false          // current sample is quiet
    flag_currently_in_declared_silence := false // current stretch of samples is a declared silence period
    total_num_samples := len(audio_buffer)      // identify how many samples
    max_vol := 0.1                              // max amplitude magnitude that still counts as a silence candidate
    min_num_samples := 2000                     // minimum number of samples necessary to declare silence has happened
    // value used is dependent on sampling rate
    curr_num_samples_found := 0
    index_silence_starts := 0
    index_silence_ends := 0

    for curr_sample := 0; curr_sample < total_num_samples; curr_sample++ {
        // amplitude is signed, so compare its absolute value against the threshold
        curr_amplitude := math.Abs(audio_buffer[curr_sample])

        if curr_amplitude < max_vol { // current sample is a candidate for silence
            index_silence_ends = curr_sample
            if !flag_in_candidate_silence { // previous sample was not a candidate
                index_silence_starts = curr_sample
            }
            if curr_num_samples_found > min_num_samples {
                // we are inside a period of silence
                flag_currently_in_declared_silence = true
            }
            flag_in_candidate_silence = true
            curr_num_samples_found++ // increment counter of current stretch of silence candidates
        } else {
            if flag_currently_in_declared_silence {
                fmt.Println("found silence stretch of samples from", index_silence_starts, "to", index_silence_ends)
            }
            flag_in_candidate_silence = false
            flag_currently_in_declared_silence = false
            curr_num_samples_found = 0
        }
    }
    if flag_currently_in_declared_silence {
        fmt.Println("found silence stretch of samples from", index_silence_starts, "to", index_silence_ends)
    }
}
(code not tested - spouted directly from forehead)


Configure SAI peripheral on STM32H7

I'm trying to play a sound on a single speaker (mono) from a .wav file on an SD card, using an STM32H7 controller and a FreeRTOS environment.
I currently manage to generate sound, but it is very dirty and jerky.
I'd like to show the parsed header content of my wav file, but my reputation score is below 10.
The most important fields are:
format : PCM
1 Channel
Sample rate : 44100
Bit per sample : 16
I initialize the SAI2 block A this way:
void MX_SAI2_Init(void)
{
/* USER CODE BEGIN SAI2_Init 0 */
/* USER CODE END SAI2_Init 0 */
/* USER CODE BEGIN SAI2_Init 1 */
/* USER CODE END SAI2_Init 1 */
hsai_BlockA2.Instance = SAI2_Block_A;
hsai_BlockA2.Init.AudioMode = SAI_MODEMASTER_TX;
hsai_BlockA2.Init.Synchro = SAI_ASYNCHRONOUS;
hsai_BlockA2.Init.OutputDrive = SAI_OUTPUTDRIVE_DISABLE;
hsai_BlockA2.Init.NoDivider = SAI_MASTERDIVIDER_ENABLE;
hsai_BlockA2.Init.FIFOThreshold = SAI_FIFOTHRESHOLD_EMPTY;
hsai_BlockA2.Init.AudioFrequency = SAI_AUDIO_FREQUENCY_44K;
hsai_BlockA2.Init.SynchroExt = SAI_SYNCEXT_DISABLE;
hsai_BlockA2.Init.MonoStereoMode = SAI_MONOMODE;
hsai_BlockA2.Init.CompandingMode = SAI_NOCOMPANDING;
hsai_BlockA2.Init.TriState = SAI_OUTPUT_NOTRELEASED;
if (HAL_SAI_InitProtocol(&hsai_BlockA2, SAI_I2S_STANDARD, SAI_PROTOCOL_DATASIZE_16BIT, 2) != HAL_OK)
{
Error_Handler();
}
/* USER CODE BEGIN SAI2_Init 2 */
/* USER CODE END SAI2_Init 2 */
}
I think I set the clock frequency correctly, as I measure a frame sync clock of 43 kHz (the closest I can get to 44.1 kHz).
The file indicates it uses the PCM format. My init function specifies SAI_I2S_STANDARD, but only because I was curious about the result with this parameter value. I get bad results in both cases.
And here is the part where I read the file and send the data to the SAI DMA:
// Before the infinite loop I extract the overall file size in bytes.
// Infinite Loop
for(;;)
{
if(drv_sdcard_getDmaTransferComplete()==true)
{
// BufferRead[0]=0xAA;
// BufferRead[1]=0xAA;
//
// ret = HAL_SAI_Transmit_DMA(&hsai_BlockA2, (uint8_t*)BufferRead, 2);
// drv_sdcard_resetDmaTransferComplete();
if((firstBytesDiscarded == true)&& (remainingBytes>0))
{
//read the next BufferRead size audio samples
if(remainingBytes < sizeof(BufferAudio))
{
remainingBytes -= drv_sdcard_readDataNoRewind(file_audio1_index, BufferAudio, remainingBytes);
}
else
{
remainingBytes -= drv_sdcard_readDataNoRewind(file_audio1_index, BufferAudio, sizeof(BufferAudio));
}
//send them by the SAI through DMA
ret = HAL_SAI_Transmit_DMA(&hsai_BlockA2, (uint8_t*)BufferAudio, sizeof(BufferAudio));
//reset transmit flag for forbidding next transmit
drv_sdcard_resetDmaTransferComplete();
}
else
{
//discard header size first bytes
//I removed this part here because it works properly on my side
firstBytesDiscarded = true;
}
}
I have one avenue for sound quality improvement: filtering the speaker input. Yesterday I tried cutting at around 20 kHz and at 44 kHz, but that cut the signal too much... So I want to try different cutoff frequencies until I find one where the sound quality is good. It is a simple RC filter.
But to fix the jerky part, I don't know what to do. To give you an idea of how the sound comes out, I would describe it like this:
we can hear a bit of melody
then scratchy sound [krrrrrrr]
then short silence
and this looping until the end of the file.
Buffer Audio size is 16*1024 bytes.
Thank you for your help
Problems
No double-buffering. You are reading data from the SD-card into the same buffer that you are playing from. So you'll get some samples from the previous read, and some samples from the new read.
Not checking when the DMA is complete. HAL_SAI_Transmit_DMA() returns immediately, and you cannot call it again until the previous DMA has completed.
Not checking return values of HAL functions. You assign ret = HAL_SAI_Transmit_DMA() but then never check what ret is. You should check for errors and take appropriate action.
You seem to be driving things from how fast the SD-card can DMA the data. It needs to be based on how fast the SAI is consuming it, otherwise you will have glitches.
Possible solution
The STM32's DMA controller can be configured to run in circular-buffer mode. In this mode, it will DMA all the data given to it, and then start again from the beginning.
It also provides interrupts for when the DMA is half complete, and when it is fully complete.
These two things together can provide a smooth data transfer with no gaps and glitches, if used with the SAI DMA. You'd read data into the entire buffer to start with, and kick off the DMA. When you get the half-complete interrupt, read half a buffer's worth of data into the first half of the buffer. When you get a fully complete interrupt, read half a buffer's worth of data into the second half of the buffer.
This is pseudo-code-ish, but hopefully shows what I mean:
#define BUFF_LEN (16u * 1024u)   /* total buffer length, in 16-bit samples */

static uint16_t buff[BUFF_LEN];

void start_playback(void)
{
    /* fill the whole buffer, then kick off the circular DMA */
    read_from_file(buff, BUFF_LEN);
    if (HAL_SAI_Transmit_DMA(&hsai_BlockA2, (uint8_t *)buff, BUFF_LEN) != HAL_OK)
    {
        // Handle error
    }
}

void sai_dma_tx_half_complete_interrupt(void)
{
    /* first half has been consumed: refill it */
    read_from_file(buff, BUFF_LEN / 2u);
}

void sai_dma_tx_full_complete_interrupt(void)
{
    /* second half has been consumed: refill it */
    read_from_file(buff + BUFF_LEN / 2u, BUFF_LEN / 2u);
}
You'd need to detect when you have consumed the entire file, and then stop the DMA (with something like HAL_SAI_DMAStop()).
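For instance, extending the full-complete handler above (a rough sketch; bytes_remaining is a hypothetical counter that your file-reading code would decrement as data is consumed):
/* hypothetical: set to the WAV data chunk size once the header is parsed,
   decremented by your read path as samples are consumed */
volatile size_t bytes_remaining;

void sai_dma_tx_full_complete_interrupt(void)
{
    if (bytes_remaining == 0u)
    {
        /* whole file consumed: stop the circular DMA */
        if (HAL_SAI_DMAStop(&hsai_BlockA2) != HAL_OK)
        {
            // Handle error
        }
        return;
    }
    /* otherwise refill the second half as before */
    read_from_file(buff + BUFF_LEN / 2u, BUFF_LEN / 2u);
}
You'd also want to pad the final partial read with silence so the last buffer doesn't replay stale samples.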
You might want to read this similar question where I gave a similar answer. They were recording to SD-card rather than playing back, but the same principles apply. They also supplied their actual code for the solution they employed.

Combining multiple input channels to one output channel audio live

I am trying to make my own basic mixer and want to know how I can take multiple channels of input audio and output them as one mixed audio source, with controllable levels for each input channel. Right now I am trying to use pyo, but I am unable to mix the channels in real time.
Here is some pseudo code to combine multiple input channels into a single output channel, where each input channel has its own volume control in the array mix_volume:
all_chan     // two dimensional array: dimension 0 is which channel,
             // dimension 1 is the index into each audio sample buffer
mix_volume   // array holding a multiplication factor to control volume per channel,
             // each element a floating point value between 0.0 and 1.0
output_chan  // define and/or allocate your output channel buffer
num_channels = length(all_chan)    // identify how many input channels
max_index = length(all_chan[0])    // identify audio buffer size

for index := 0; index < max_index; index++ {
    curr_sample := 0.0 // output audio curve height for current audio sample
    for curr_chan := 0; curr_chan < num_channels; curr_chan++ {
        curr_sample += all_chan[curr_chan][index] * mix_volume[curr_chan]
    }
    output_chan[index] = curr_sample / num_channels // average so the mix cannot clip
}
The trick to performing the above on a live stream is to populate the all_chan audio buffers inside an event loop, copying in the audio sample values for each channel, then executing the above code from inside that event loop. Typically you will want your audio buffers to hold about 2^12 (4096) audio samples. Experiment with larger or smaller buffer sizes: too small and this event loop becomes very CPU intensive, yet too large and you incur an audible delay. Have fun.
You may want to use a compiled language like Go; YMMV.
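For illustration, here is the same loop as compilable C (a minimal sketch; NUM_CHANNELS and BUFFER_SIZE are assumptions, and the structure carries over directly to Go or another compiled language):
#include <stddef.h>

#define NUM_CHANNELS 4      /* assumed number of input channels */
#define BUFFER_SIZE  4096   /* 2^12 samples per buffer, as suggested above */

/* mix all input channels into one output buffer, scaling each
   channel by its entry in mix_volume (values 0.0 .. 1.0) */
void mix_channels(const float all_chan[NUM_CHANNELS][BUFFER_SIZE],
                  const float mix_volume[NUM_CHANNELS],
                  float output_chan[BUFFER_SIZE])
{
    for (size_t index = 0; index < BUFFER_SIZE; index++)
    {
        float curr_sample = 0.0f;
        for (size_t curr_chan = 0; curr_chan < NUM_CHANNELS; curr_chan++)
        {
            curr_sample += all_chan[curr_chan][index] * mix_volume[curr_chan];
        }
        /* average so the summed mix stays within -1.0 .. 1.0 */
        output_chan[index] = curr_sample / NUM_CHANNELS;
    }
}
You would call mix_channels() once per filled buffer from the event loop described above.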

First note played in AKSequencer is off

I am using AKSequencer to create a sequence of notes that are played by an AKMIDISampler. My problem is, at higher tempos the first note always plays with a little delay, no matter what I do.
I tried prerolling the sequence, but it doesn't help. Substituting the AKMIDISampler with an AKSampler or an AKSamplePlayer (and using a callback track to play them) hasn't helped either, though it made me think that the problem probably resides in the sequencer or in the way I create the notes.
Here's an example of what I'm doing (I tried to make it as simple as I could):
import UIKit
import AudioKit
class ViewController: UIViewController {
let sequencer = AKSequencer()
let sampler = AKMIDISampler()
let callbackInst = AKCallbackInstrument()
var metronomeTrack : AKMusicTrack?
var callbackTrack : AKMusicTrack?
let numberOfBeats = 8
let tempo = 280.0
var startTime : TimeInterval = 0
override func viewDidLoad() {
super.viewDidLoad()
print("Begin setup.")
// Load .wav sample in AKMidiSampler
do {
try sampler.loadWav("tick")
} catch {
print("sampler.loadWav() failed")
}
// Create tracks for the sequencer and set midi outputs
metronomeTrack = sequencer.newTrack("metronomeTrack")
callbackTrack = sequencer.newTrack("callbackTrack")
metronomeTrack?.setMIDIOutput(sampler.midiIn)
callbackTrack?.setMIDIOutput(callbackInst.midiIn)
// Setup and start AudioKit
AudioKit.output = sampler
do {
try AudioKit.start()
} catch {
print("AudioKit.start() failed")
}
// Set sequencer tempo
sequencer.setTempo(tempo)
// Create the notes
var midiSequenceIndex = 0
for i in 0 ..< numberOfBeats {
// Add notes to tracks
metronomeTrack?.add(noteNumber: 60, velocity: 100, position: AKDuration(beats: Double(midiSequenceIndex)), duration: AKDuration(beats: 0.5))
callbackTrack?.add(noteNumber: MIDINoteNumber(midiSequenceIndex), velocity: 100, position: AKDuration(beats: Double(midiSequenceIndex)), duration: AKDuration(beats: 0.5))
print("Adding beat number \(i+1) at position: \(midiSequenceIndex)")
midiSequenceIndex += 1
}
// Set the callback
callbackInst.callback = {status, noteNumber, velocity in
if status == .noteOn {
let currentTime = Date().timeIntervalSinceReferenceDate
let noteDelay = currentTime - ( self.startTime + ( 60.0 / self.tempo ) * Double(noteNumber) )
print("Beat number: \(noteNumber) delay: \(noteDelay)")
} else if ( noteNumber == MIDINoteNumber(midiSequenceIndex - 1) ) && ( status == .noteOff) {
print("Sequence ended.\n")
self.toggleMetronomePlayback()
} else {return}
}
// Preroll the sequencer
sequencer.preroll()
print("Setup ended.\n")
}
@IBAction func playButtonPressed(_ sender: UIButton) {
toggleMetronomePlayback()
}
func toggleMetronomePlayback() {
if sequencer.isPlaying == false {
print("Playback started.")
startTime = Date().timeIntervalSinceReferenceDate
sequencer.play()
} else {
sequencer.stop()
sequencer.rewind()
}
}
}
Could anyone help? Thank you.
As Aure commented, the start-up latency is a known problem. Even with preroll, there is still noticeable latency, especially at higher tempos.
But if you are using a looping sequence, I found that you can sometimes mitigate how noticeable the latency is by setting the 'starting point' of the sequence to a position after the final MIDI event, but within the loop length. If you can find a good position, you can get the latency effects out of the way before it loops back to your content.
Make sure to call setTime() before you need it (e.g., after stopping the sequence, not when you are ready to play), because the setTime() call itself can introduce about 200 ms of wonkiness.
Edit:
As an afterthought, you could do the same thing on a non-looping sequence by enabling looping and using an arbitrarily long sequence length. If you needed playback to stop at the end of the MIDI content, you could do this with an AKCallbackInstrument triggered by a MIDI event placed just after the final note.
After a bit of testing I actually found out that it is not the first note that plays off time; it is the subsequent notes that play in advance. Moreover, the number of notes that play exactly on time when starting the sequencer depends on the set tempo.
The funny thing is that if the tempo is below 400 bpm, one note plays on time and the others play in advance; if it is 400 <= bpm < 800, two notes play correctly and the others in advance; and so on: for every 400 bpm increment, you get one more note played correctly.
So, since the notes are played in advance and not late, the solution that solved it for me was:
1) Use a sampler that is not connected directly to a track's MIDI output, but has its .play() method called inside a callback.
2) Keep track of when the sequencer is started.
3) At every callback, calculate when the note should play relative to the start time and record the actual current time, so you can compute the offset.
4) Use the computed offset to dispatch_async your .play() call after that offset.
And that's it, I tested this on multiple devices and now all the notes play perfectly on time.
I had the same issue; preroll didn't help, but I managed to solve it with a dedicated sampler for the first notes.
I used a delay on the other sampler, about 0.06 seconds, and it works like a charm.
Kind of a silly solution, but it did the job and I could go on with the project :)
// Workaround for the AudioKit bug where the first playback is not in time
let fixDelay = AKDelay()
fixDelay.dryWetMix = 1
fixDelay.feedback = 0
fixDelay.lowPassCutoff = 22000
fixDelay.time = 0.06
fixDelay.start()
let preDelayMixer = AKMixer()
let preFirstMixer = AKMixer()
[playbackSampler,vocalSampler] >>> preDelayMixer >>> fixDelay
[firstNoteVocalSampler, firstRoundPlaybackSampler] >>> preFirstMixer
[fixDelay,preFirstMixer] >>> endMixer

Minimum value over entire simulation

In a continuous model, how do I save the minimum value of a variable during the simulation?
When a simulation has finished I want to display a variable T_min with a graphical annotation that shows me the lowest value of a temperature T during the simulation.
For example, if the simulated temperature T were a sine function, the desired result for T_min would be the running minimum of that sine curve.
In discrete code this would look something like this:
T_min := Modelica.Constants.inf "Start value";
if T < T_min then
  T_min := T;
end if;
... but I would like a continuous implementation to avoid sampling, high number of events etc.
I'm not sure if Rene's solution is optimal. It generates many state events due to the two if conditions. Embedded in the following model:
model globalMinimum2
Real T, T_min;
Boolean is_true;
initial equation
T_min = T;
equation
T = time/10*sin(time);
// if statement ensures that 'T_min' doesn't integrate downwards...
// ... whenever der(T) is negative;
if T < T_min then
der(T_min) = min(0, der(T));
is_true=true;
else
der(T_min) = 0;
is_true=false;
end if;
end globalMinimum2;
The simulation log is the following:
Integration started at T = 0 using integration method DASSL
(DAE multi-step solver (dassl/dasslrt of Petzold modified by Dynasim))
Integration terminated successfully at T = 50
WARNING: You have many state events. It might be due to chattering.
Enable logging of event in Simulation/Setup/Debug/Events during simulation
CPU-time for integration : 0.077 seconds
CPU-time for one GRID interval: 0.154 milli-seconds
Number of result points : 3801
Number of GRID points : 501
Number of (successful) steps : 2519
Number of F-evaluations : 4799
Number of H-evaluations : 18822
Number of Jacobian-evaluations: 2121
Number of (model) time events : 0
Number of (U) time events : 0
Number of state events : 1650
Number of step events : 0
Minimum integration stepsize : 1.44e-005
Maximum integration stepsize : 5.61
Maximum integration order : 3
Perhaps it is better to detect two events as given in the following example:
model unnamed_2
Real T;
Real hold;
Real T_min;
Boolean take_signal;
initial equation
hold=T;
equation
T = time/10*sin(time);
when (T < pre(hold)) then
take_signal = true;
hold = T;
elsewhen (der(T) >=0) then
take_signal = false;
hold = T;
end when;
if (take_signal) then
T_min = T;
else
T_min = hold;
end if;
end unnamed_2;
The simulation log shows that this solution is more efficient:
Log-file of program ./dymosim
(generated: Tue May 24 14:13:38 2016)
dymosim started
... "dsin.txt" loading (dymosim input file)
... "unnamed_2.mat" creating (simulation result file)
Integration started at T = 0 using integration method DASSL
(DAE multi-step solver (dassl/dasslrt of Petzold modified by Dynasim))
Integration terminated successfully at T = 50
CPU-time for integration : 0.011 seconds
CPU-time for one GRID interval: 0.022 milli-seconds
Number of result points : 549
Number of GRID points : 501
Number of (successful) steps : 398
Number of F-evaluations : 771
Number of H-evaluations : 1238
Number of Jacobian-evaluations: 373
Number of (model) time events : 0
Number of (U) time events : 0
Number of state events : 32
Number of step events : 0
Minimum integration stepsize : 4.65e-006
Maximum integration stepsize : 3.14
Maximum integration order : 1
Calling terminal section
... "dsfinal.txt" creating (final states)
It seems I was able to find an answer to my own question simply by looking at the figure above.
The code is quite simple:
model globalMinimum
Modelica.SIunits.Temperature T, T_min;
initial equation
T_min = T;
equation
// if statement ensures that 'T_min' doesn't integrate downwards...
// ... whenever der(T) is negative;
der(T_min) = if T < T_min then min(0, der(T)) else 0;
end globalMinimum;
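For reference, the final model is just the running-minimum ODE, restated here in mathematical notation:
\dot{T}_{\min}(t) =
\begin{cases}
\min\bigl(0,\, \dot{T}(t)\bigr) & \text{if } T(t) < T_{\min}(t) \\
0 & \text{otherwise}
\end{cases}
\qquad T_{\min}(t_0) = T(t_0)
so T_min can only ever move downwards, and only while T is below it.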

gstreamer read decibel from buffer

I am trying to get the dB level of incoming audio samples. On every video frame, I update the dB level and draw a bar representing a 0 - 100% value (0% being something arbitrary such as -20.0 dB, and 100% being 0 dB).
gdouble sum, rms;
sum = 0.0;
guint16 *data_16 = (guint16 *)amap.data;
for (gint i = 0; i < amap.size; i = i + 2)
{
gdouble sample = ((guint16)data_16[i]) / 32768.0;
sum += (sample * sample);
}
rms = sqrt(sum / (amap.size / 2));
dB = 10 * log10(rms);
This was adapted to C from a code sample, marked as the answer, from here. I am wondering what I am missing from this very simple equation.
Answered: jacket was correct about the code losing the sign, so everything ended up being positive. Also, 10 * log10(rms) is incorrect; it should be 20 * log10(rms), since I am converting amplitude to decibels (a measure of output power).
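For reference, here is the loop with those two fixes applied. It also iterates per 16-bit sample (amap.size is in bytes while data_16 indexes 16-bit words), assuming mono 16-bit data:
gdouble sum = 0.0, rms, dB;
gint16 *data_16 = (gint16 *) amap.data;   /* signed cast keeps the sample sign */
gsize num_samples = amap.size / 2;        /* 2 bytes per 16-bit sample */

for (gsize i = 0; i < num_samples; i++)
{
    gdouble sample = data_16[i] / 32768.0;   /* normalize to -1.0 .. 1.0 */
    sum += sample * sample;
}
rms = sqrt (sum / num_samples);
dB = 20 * log10 (rms);   /* amplitude ratio -> decibels */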
The level element is best for this task (as #ensonic already mentioned); it's intended for exactly what you need.
So basically you add an element called "level" to your pipeline, then enable its message emission.
The level element then emits messages containing RMS, peak, and decay values. RMS is what you need.
You can set up a callback function connected to such message events:
audio_level = gst_element_factory_make ("level", "audiolevel");
g_object_set(audio_level, "message", TRUE, NULL);
...
g_signal_connect (bus, "message::element", G_CALLBACK (callback_function), this);
The bus variable is of type GstBus; I hope you know how to work with buses.
Then, in the callback function, check for the element name and get the RMS as described here.
There is also a normalization formula using pow() to convert the dB value to the 0.0 -> 1.0 range, which you can then convert to the percentage you described in your question.
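A sketch of such a callback (the structure name "level" and the "rms" field come from the level element's documented message format; note the bus needs gst_bus_add_signal_watch() for the "message::element" signal to fire):
/* call gst_bus_add_signal_watch (bus); before g_signal_connect so the bus emits signals */
static void callback_function (GstBus *bus, GstMessage *message, gpointer user_data)
{
    const GstStructure *s = gst_message_get_structure (message);

    /* the level element posts element messages whose structure is named "level" */
    if (s != NULL && gst_structure_has_name (s, "level"))
    {
        /* "rms" holds one dB value per channel; take channel 0 here */
        const GValue *array_val = gst_structure_get_value (s, "rms");
        GValueArray *rms_arr = (GValueArray *) g_value_get_boxed (array_val);
        gdouble rms_db = g_value_get_double (g_value_array_get_nth (rms_arr, 0));

        /* undo the dB conversion: 0 dB -> 1.0, -20 dB -> 0.1 */
        gdouble normalized = pow (10.0, rms_db / 20.0);
        g_print ("RMS: %.2f dB -> %.1f %%\n", rms_db, 100.0 * normalized);
    }
}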
