AVCaptureSession audio samples captured at different frequency than AVAudioSession's sample rate - audio

I'm using an AVFoundation capture session to output audio buffers through AVCaptureAudioDataOutput. The capture session uses the default application audio session (i.e. captureSession.usesApplicationAudioSession = true). I don't alter the audio session in any way.
The strange behavior is that the capture session returns audio buffers captured at a different frequency than the default audio session's sample rate.
Specifically:
print(AVAudioSession.sharedInstance().sampleRate) // 48000
but
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    if connection.audioChannels.first != nil {
        print(sampleBuffer.presentationTimeStamp) // CMTime(value: 2199812320, timescale: 44100, flags: __C.CMTimeFlags(rawValue: 3), epoch: 0)
        delegate?.captureOutput(sampleBuffer: sampleBuffer, mediaType: .audio)
    }
}
My expected behavior is that the sample buffer's timescale would also be 48000.
For a little extra info: if I do change the default audio session, for example by setting the preferred sample rate to 48000, the sample buffer's timescale changes to 48000 as expected. Is this a bug, or am I misunderstanding something?

You need to set the capture session's automaticallyConfiguresApplicationAudioSession to false and do your own audio session configuration before starting the capture session.
Like this:
captureSession.automaticallyConfiguresApplicationAudioSession = false
let audioSession = AVAudioSession.sharedInstance()
try! audioSession.setCategory(.playAndRecord) // or just .record
try! audioSession.setPreferredSampleRate(48_000) // request the desired sample rate
try! audioSession.setActive(true) // worked without this, but feels wrong
// ...then start the capture session

Related

Is there a way to check the volume level of all processes with pipewire/pulseaudio?

I'm trying to find a way to check whether I have any desktop audio AND which processes are producing sound.
After some searching I found a way to list all the sink inputs in pipewire/pulseaudio using pactl list sink-inputs, but I have no idea whether a given input is actually making sound or not.
example output:
Sink Input #512
Driver: protocol-native.c
Owner Module: 9
Client: 795
Sink: 1
Sample Specification: float32le 2ch 48000Hz
Channel Map: front-left,front-right
Format: pcm, format.sample_format = "\"float32le\"" format.rate = "48000" format.channels = "2" format.channel_map = "\"front-left,front-right\""
Corked: yes
Mute: no
Volume: front-left: 43565 / 66% / -10.64 dB, front-right: 43565 / 66% / -10.64 dB
balance 0.00
Buffer Latency: 165979 usec
Sink Latency: 75770 usec
Resample method: speex-float-1
Properties:
media.name = "Polish cow (English Lyrics Full Version) - YouTube"
application.name = "Firefox"
native-protocol.peer = "UNIX socket client"
native-protocol.version = "35"
application.process.id = "612271"
application.process.user = "user"
application.process.host = "host"
application.process.binary = "firefox"
application.language = "en_US.UTF-8"
window.x11.display = ":0"
application.process.machine_id = "93e71eeba04e43789f0972b7ea0e4b39"
application.process.session_id = "2"
application.icon_name = "firefox"
module-stream-restore.id = "sink-input-by-application-name:Firefox"
The obvious thing would be to look at the Mute and Volume lines, but that is not reliable at all: currently the YouTube video is paused, yet Mute still shows no and Volume is no different from when the video is actually playing.
I need the solution to be scriptable, since I'll be muting certain things from a bash script when another process is making sound, and unmuting them again when there is no sound. If it is not possible on pipewire/pulseaudio but is possible with another sound server, please do tell me.
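One scriptable starting point (a sketch, not a verified solution): besides Mute and Volume, the pactl output above also has a Corked: field, which is yes while the paused stream isn't actually playing. A bash/awk pass over pactl list sink-inputs could key off that:

#!/usr/bin/env bash
# Sketch: list sink inputs that are currently audible, i.e. not corked
# (not paused by the client) and not muted, as "<index> <application.name>".
# Note: only the first word of application.name is printed by this simple parse.
pactl list sink-inputs | awk '
    $1 == "Sink" && $2 == "Input" { idx = substr($3, 2) }
    $1 == "Corked:"               { corked = $2 }
    $1 == "Mute:"                 { muted  = $2 }
    $1 == "application.name"      {
        gsub(/"/, "", $3)
        if (corked == "no" && muted == "no") print idx, $3
    }'

An empty result then means no un-corked, un-muted stream exists, which is straightforward to test from a bash if.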

Sound activated recording in Julia

I'm recording audio with Julia and want to be able to trigger a 5 second recording after the audio signal exceeds a certain volume. This is my record script so far:
using PortAudio, SampledSignals, LibSndFile, FileIO, Dates
stream = PortAudioStream("HDA Intel PCH: ALC285 Analog (hw:0,0)")
buf = read(stream, 5s)
close(stream)
save(string("recording_", Dates.format(now(), "yyyymmdd_HHMMSS"), ".wav"), buf, Fs = 48000)
I'm new to Julia and signal processing in general. How can I tell this only to start recording once the audio exceeds a specified volume threshold?
You need to test the sound you capture for average amplitude and act on that. Save if loud enough, otherwise rinse and repeat.
using PortAudio, SampledSignals, LibSndFile, FileIO, LinearAlgebra, Dates

const hassound = 10 # RMS threshold -- choose this to fit your input level

# power (root-mean-square of the buffer) over threshold
suprathreshold(buf, thresh = hassound) = norm(buf) / sqrt(length(buf)) > thresh

stream = PortAudioStream("HDA Intel PCH: ALC285 Analog (hw:0,0)")
while true
    buf = read(stream, 5s)  # keep the stream open between iterations
    if suprathreshold(buf)
        # timestamped filename so successive clips don't overwrite each other
        save(string("recording_", Dates.format(now(), "yyyymmdd_HHMMSS"), ".wav"), buf, Fs = 48000)
    end
end

Reading console output from mplayer to parse track's position/length

When you run mplayer, it displays the playing track's position and length (among other information) through what I'd assume is stdout.
Here's a sample output from mplayer:
MPlayer2 2.0-728-g2c378c7-4+b1 (C) 2000-2012 MPlayer Team
Cannot open file '/home/pi/.mplayer/input.conf': No such file or directory
Failed to open /home/pi/.mplayer/input.conf.
Cannot open file '/etc/mplayer/input.conf': No such file or directory
Failed to open /etc/mplayer/input.conf.
Playing Bomba Estéreo - La Boquilla [Dixone Remix].mp3.
Detected file format: MP2/3 (MPEG audio layer 2/3) (libavformat)
[mp3 # 0x75bc15b8]max_analyze_duration 5000000 reached
[mp3 # 0x75bc15b8]Estimating duration from bitrate, this may be inaccurate
[lavf] stream 0: audio (mp3), -aid 0
Clip info:
album_artist: Bomba Estéreo
genre: Latin
title: La Boquilla [Dixone Remix]
artist: Bomba Estéreo
TBPM: 109
TKEY: 11A
album: Unknown
date: 2011
Load subtitles in .
Selected audio codec: MPEG 1.0/2.0/2.5 layers I, II, III [mpg123]
AUDIO: 44100 Hz, 2 ch, s16le, 320.0 kbit/22.68% (ratio: 40000->176400)
AO: [pulse] 44100Hz 2ch s16le (2 bytes per sample)
Video: no video
Starting playback...
A: 47.5 (47.4) of 229.3 (03:49.3) 4.1%
The last line (A: 47.5 (47.4) of 229.3 (03:49.3) 4.1%) is what I'm trying to read but, for some reason, it's never received by the Process.OutputDataReceived event handler.
Am I missing something? Is mplayer using some non-standard way of outputting the "A:" line to the console?
Here's the code in case it helps:
Public Overrides Sub Play()
    player = New Process()
    player.EnableRaisingEvents = True
    With player.StartInfo
        .FileName = "mplayer"
        .Arguments = String.Format("-ss {1} -endpos {2} -volume {3} -nolirc -vc null -vo null ""{0}""",
                                   tmpFileName,
                                   mTrack.StartTime,
                                   mTrack.EndTime,
                                   100)
        .CreateNoWindow = False
        .UseShellExecute = False
        .RedirectStandardOutput = True
        .RedirectStandardError = True
        .RedirectStandardInput = True
    End With
    AddHandler player.OutputDataReceived, AddressOf DataReceived
    AddHandler player.ErrorDataReceived, AddressOf DataReceived
    AddHandler player.Exited, Sub() KillPlayer()
    player.Start()
    player.BeginOutputReadLine()
    player.BeginErrorReadLine()
    waitForPlayer.WaitOne()
    KillPlayer()
End Sub

Private Sub DataReceived(sender As Object, e As DataReceivedEventArgs)
    If e.Data = Nothing Then Exit Sub
    If e.Data.Contains("A: ") Then
        ' Parse the data
    End If
End Sub
Apparently, the only solution is to run mplayer in "slave" mode, as explained here: http://www.mplayerhq.hu/DOCS/tech/slave.txt
In this mode we can send commands to mplayer (via stdin) and the response (if any) will be sent via stdout.
Here's a very simple implementation that displays mplayer's current position (in seconds):
using System;
using System.Threading;
using System.Diagnostics;
using System.Collections.Generic;

namespace TestMplayer {
    class MainClass {
        private static Process player;

        public static void Main(string[] args) {
            String fileName = "/home/pi/Documents/Projects/Raspberry/RPiPlayer/RPiPlayer/bin/Electronica/Skrillex - Make It Bun Dem (Damian Marley) [Butch Clancy Remix].mp3";
            player = new Process();
            player.EnableRaisingEvents = true;
            player.StartInfo.FileName = "mplayer";
            player.StartInfo.Arguments = String.Format("-slave -nolirc -vc null -vo null \"{0}\"", fileName);
            player.StartInfo.CreateNoWindow = false;
            player.StartInfo.UseShellExecute = false;
            player.StartInfo.RedirectStandardOutput = true;
            player.StartInfo.RedirectStandardError = true;
            player.StartInfo.RedirectStandardInput = true;
            player.OutputDataReceived += DataReceived;
            player.Start();
            player.BeginOutputReadLine();
            player.BeginErrorReadLine();
            Thread getPosThread = new Thread(GetPosLoop);
            getPosThread.Start();
        }

        private static void DataReceived(object o, DataReceivedEventArgs e) {
            Console.Clear();
            Console.WriteLine(e.Data);
        }

        private static void GetPosLoop() {
            do {
                Thread.Sleep(250);
                player.StandardInput.Write("get_time_pos" + Environment.NewLine);
            } while (!player.HasExited);
        }
    }
}
I found the same problem with another application that works in a similar way (dbPowerAmp). In my case, the problem was that the process writes its stdout buffer using Unicode encoding, so I had to set StandardOutputEncoding and StandardErrorEncoding to Unicode before I was able to start reading.
Your problem seems to be the same: if "A" cannot be found even though the output you published clearly contains it, then the characters probably differ when read with the encoding you are currently using.
So, try setting the proper encoding when reading the process output; try setting both of these to Unicode:
ProcessStartInfo.StandardOutputEncoding
ProcessStartInfo.StandardErrorEncoding
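For example, in the With player.StartInfo block from the question (a sketch; Encoding.Unicode is UTF-16, so use Encoding.UTF8 instead if that is what mplayer actually emits):

' Add before player.Start(), inside the existing With player.StartInfo block
.StandardOutputEncoding = System.Text.Encoding.Unicode
.StandardErrorEncoding = System.Text.Encoding.Unicode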
Using "read" instead of "readline", and treating the input as binary, will probably fix your problem.
First off, yes, mplayer slave mode is probably what you want. However, if you're determined to parse the console output, it is possible.
Slave mode exists for a reason, and if you're half serious about using mplayer from within your program, it's worth a little time to figure out how to properly use it. That said, I'm sure there are situations where the wrapper is the appropriate approach. Maybe you want to pretend that mplayer is running normally, and control it from the console, but secretly monitor file position to resume it later? The wrapper might be easier than translating all of mplayer's keyboard commands into slave-mode commands.
Your problem is likely that you're trying to use "readline" from within python on an endless line. That line of output contains \r instead of \n as the line separator, so readline will treat it as a single endless line. sed also fails this way, but other commands (such as grep) treat \r as \n under some circumstances.
Handling of \r is inconsistent and can't be relied on. For instance, my version of grep treats \r as \n when matching if output is a console, and uses \n to separate the output. But if output is a pipe, it treats it as any other character.
For instance:
mplayer TMBG-Older.mp3 2>/dev/null | tr '\r' '\n' | grep "^A: " | sed 's/^A: *\([0-9.]*\) .*/\1/' | tail -n 1
I'm using "tr" here to force it to '\n', so other commands in the pipe can deal with it in a consistent manner.
This pipeline of commands outputs a single line, containing ONLY the ending position in seconds, with decimal point. But if you remove the "tr" command from this pipe, bad things happen. On my system, it shows only "0.0" as the position, as "sed" doesn't deal well with the '\r' line separators, and ALL the position updates are treated as the same line.
I'm fairly sure python doesn't handle \r well either, and that's likely your problem. If so, using "read" instead of "readline" and treating it like binary is probably the correct solution.
There are other problems with this approach, though. Buffering is a big one: ^C causes this command to output nothing, and mplayer must quit gracefully to show anything at all, since pipelines buffer things and those buffers get discarded on SIGINT.
If you really wanted to get fancy, you could probably cat several input sources together, tee the output several ways, and REALLY write a wrapper around mplayer. But that wrapper would be fragile and complicated, and might break every time mplayer is updated, a user does something unexpected, the name of the file being played contains something weird, or a SIGSTOP or SIGINT arrives; and probably other things I haven't thought of.
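To make the read-instead-of-readline idea concrete in the asker's .NET setting, here is a rough sketch of my own (not from the original answers). It assumes RedirectStandardOutput = true and that you do not also call BeginOutputReadLine on the same stream:

// Read raw chunks from mplayer's stdout and split on '\r' as well as '\n',
// since the "A: ..." status line is rewritten in place with carriage returns.
var reader = player.StandardOutput;          // requires RedirectStandardOutput = true
var buffer = new char[4096];
var line = new System.Text.StringBuilder();
int n;
while ((n = reader.Read(buffer, 0, buffer.Length)) > 0)
{
    for (int i = 0; i < n; i++)
    {
        if (buffer[i] == '\r' || buffer[i] == '\n')
        {
            if (line.ToString().StartsWith("A:"))
                Console.WriteLine(line);     // current position/length status line
            line.Clear();
        }
        else
        {
            line.Append(buffer[i]);
        }
    }
}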

Linux ALSA Driver using channel count 3

I am running my ALSA driver on Ubuntu 14.04, 64-bit, with the 3.16.0-30-generic kernel.
The hardware is proprietary, hence I can't give many details.
The following is the existing driver implementation:
The driver is given the sample format, sample rate, and channel_count as input via module parameters. (Due to requirements, the inputs need to be provided via module parameters.)
Initial snd_pcm_hardware structure for playback path.
#define DEFAULT_PERIOD_SIZE (4096)
#define DEFAULT_NO_OF_PERIODS (1024)
static struct snd_pcm_hardware xxx_playback =
{
.info = SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_MMAP_VALID |
SNDRV_PCM_INFO_SYNC_START,
.formats = SNDRV_PCM_FMTBIT_S16_LE,
.rates = (SNDRV_PCM_RATE_8000 | \
SNDRV_PCM_RATE_16000 | \
SNDRV_PCM_RATE_48000 | \
SNDRV_PCM_RATE_96000),
.rate_min = 8000,
.rate_max = 96000,
.channels_min = 1,
.channels_max = 1,
.buffer_bytes_max = (DEFAULT_PERIOD_SIZE * DEFAULT_NO_OF_PERIODS),
.period_bytes_min = DEFAULT_PERIOD_SIZE,
.period_bytes_max = DEFAULT_PERIOD_SIZE,
.periods_min = DEFAULT_NO_OF_PERIODS,
.periods_max = DEFAULT_NO_OF_PERIODS,
};
Similar values are used for the capture-side snd_pcm_hardware structure.
Please note that the values below are replaced in the playback open entry point, based on the current audio test configuration (the user provides the audio format, rate, and channel count via module parameters, and these are filled back into the snd_pcm_hardware structure):
xxx_playback.formats = user_format_input
xxx_playback.rates = xxx_playback.rate_min, xxx_playback.rate_max = user_sample_rate_input
xxx_playback.channels_min = xxx_playback.channels_max = user_channel_input
Values are similarly re-filled for the capture snd_pcm_hardware structure in the capture open entry point.
The hardware clocks are configured based on channel_count, format, and sample_rate, and the driver registers successfully with the ALSA layer.
aplay/arecord work fine for channel_count = 1, 2, or 4.
During aplay/arecord, when "runtime->channels" is checked in the driver, it reflects the configured channel_count, which looks correct to me.
The recorded data matches what was played, since it is a loopback test.
But when I use channel_count = 3, both aplay and arecord report
"Broken configuration for this PCM: no configurations available" for a wave file with channel count 3.
ex: Playing WAVE './xxx.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Channels 3
ALSA lib pcm_params.c:2162:(snd1_pcm_hw_refine_slave) Slave PCM not usable
aplay: set_params:1204: Broken configuration for this PCM: no configurations available
With the following changes I was able to move ahead a bit:
.........................
Method 1:
The driver is given channel_count = 3 via the module parameter.
Modified the driver to fill the snd_pcm_hardware structure with playback->channels_min = 2 and playback->channels_max = 3; similar values for the capture path.
aplay/arecord report 'Channels count non available', even though the wave file in use has 3 channels.
ex: aplay -D hw:CARD=xxx,DEV=0 ./xxx.wav Playing WAVE './xxx.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Channels 3
aplay: set_params:1239: Channels count non available
Tried aplay/arecord with plughw, and aplay/arecord moved ahead
arecord -D plughw:CARD=xxx,DEV=0 -d 3 -f S16_LE -r 48000 -c 3 ./xxx_rec0.wav
aplay -D plughw:CARD=xxx,DEV=0 ./xxx.wav
Recording WAVE './xxx_rec0.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Channels 3
Playing WAVE './xxx.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Channels 3
End of Test
During aplay/arecord, when "runtime->channels" is checked in the driver, it returns 2, even though the played wave file has channel count 3.
When the data in the recorded file is checked, it is all silence.
.........................
Method 2:
The driver is given channel_count = 3 via the module parameter.
Modified the driver to fill the snd_pcm_hardware structure with playback->channels_min = 3 and playback->channels_max = 4; similar values for the capture path.
aplay/arecord report 'Channels count non available', even though the wave file in use has 3 channels.
Tried aplay/arecord with plughw, and aplay/arecord moved ahead.
During aplay/arecord, when "runtime->channels" is checked in the driver, it returns 4, even though the played wave file has channel count 3.
When the data in the recorded file is checked, it is all silence.
.........................
So from the above observations, runtime->channels is either 2 or 4; 3 channels are never used by the ALSA stack even though requested. When plughw is used, ALSA converts the data to run with 2 or 4 channels.
Can anyone help with why I am unable to use channel count 3?
I will provide more information if needed.
Thanks in advance.
A period (and the entire buffer) must contain an integral number of frames, i.e., you cannot have partial frames.
With three channels, one frame has six bytes. The fixed period size (4096) is not divisible by six without remainder.
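As a small standalone illustration (plain C, not driver code), rounding the default period size down to a whole number of frames gives a usable value:

/* For 3-channel S16_LE a frame is 3 * 2 = 6 bytes, so the 4096-byte
 * DEFAULT_PERIOD_SIZE rounds down to 4092. */
#include <stdio.h>

int main(void)
{
    unsigned int channels = 3;
    unsigned int bytes_per_sample = 2;                       /* S16_LE */
    unsigned int frame_bytes = channels * bytes_per_sample;  /* 6 */
    unsigned int period_bytes = 4096;                        /* DEFAULT_PERIOD_SIZE */

    period_bytes -= period_bytes % frame_bytes;              /* 4096 -> 4092 */
    printf("usable period size: %u bytes (%u frames)\n",
           period_bytes, period_bytes / frame_bytes);
    return 0;
}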
Thanks CL.
I used a period size of 4092 for this particular test case with channel count 3, and was able to do the loopback successfully (without using plughw).
One last question: when I used plughw earlier and runtime->channels was either 2 or 4, why was the recorded data all silence?

Is there any cross-browser solution for playing FLAC? (Or is it possible in theory to make one?)

I'm not interested in Silverlight. Flash/JavaScript/HTML5 solutions are acceptable.
If you do not know of such a solution, could you please say whether it is possible to make one?
When I had to play FLAC in-browser, my starting point was also the Aurora framework.
However, the Aurora player is geared around using ScriptProcessorNode to decode chunks of audio on the fly. This didn't pan out for many reasons.
Seeking Flac in Aurora was never implemented.
Stuttering and unacceptable performance in Firefox, even on a mid-range 2014 desktop.
Not feasible to offload decoding to a WebWorker.
Doesn't inter-operate with audio formats the browser does support.
I didn't want to be responsible for sample-rate conversion, seeking, and the other low-level audio tasks that Aurora necessarily takes on.
Decoding offline: Flac to Wave
My solution was to decode the FLAC to raw 16-bit PCM audio, using a stripped-down Aurora.js Asset class plus its dependencies.
Look in the source for Asset.get( 'format', callback ), Asset.fromFile, and Asset.prototype.decodeToBuffer.
Next, take the audio data, along with the extracted values for sample rate and channel count, and build a WAVE file. This can be played using an HTML5 audio element, sent through an audio graph using createMediaElementSource, or used for absolutely anything you can do with natively supported audio formats.
Note: Replace clz function in decoder.js with the native Math.clz32 to boost performance, and polyfill clz32 for old browsers.
Disadvantage
The decoding time. Around 5 seconds at ~100% CPU for an "average" 4min song.
Advantages
A Blob (as opposed to an ArrayBuffer) isn't constrained by RAM, and the browser can swap it to disk. The original FLAC data can likely be discarded too.
You get seeking for free.
You get sample-rate re-sampling for free.
CPU cost is paid upfront in a WebWorker.
Should browsers EVER gain native FLAC support, it's very easy to rip out. It doesn't create a strong dependency on Aurora.
Here's the function to build the WAVE header, and turn the raw PCM data into something the browser can natively play.
function createWave( audioData, sampleRate, channelCount )
{
    const audioFormat   = 1,                                   // 2-byte field; PCM = 1
          subChunk1Size = 16,                                  // 4-byte field; 16 for PCM
          bitsPerSample = 16,                                  // 2-byte field
          blockAlign    = channelCount * (bitsPerSample >> 3), // 2-byte field; bytes per frame
          byteRate      = blockAlign * sampleRate,             // 4-byte field
          subChunk2Size = blockAlign * audioData.size,         // 4-byte field
          chunkSize     = 36 + subChunk2Size,                  // 4-byte field
          // Total header size: 44 bytes
          header        = new DataView( new ArrayBuffer(44) );

    header.setUint32(  0, 0x52494646 );          // chunkId = "RIFF"
    header.setUint32(  4, chunkSize, true );
    header.setUint32(  8, 0x57415645 );          // format = "WAVE"
    header.setUint32( 12, 0x666d7420 );          // subChunk1Id = "fmt "
    header.setUint32( 16, subChunk1Size, true );
    header.setUint16( 20, audioFormat, true );
    header.setUint16( 22, channelCount, true );
    header.setUint32( 24, sampleRate, true );
    header.setUint32( 28, byteRate, true );
    header.setUint16( 32, blockAlign, true );
    header.setUint16( 34, bitsPerSample, true );
    header.setUint32( 36, 0x64617461 );          // subChunk2Id = "data"
    header.setUint32( 40, subChunk2Size, true );

    return URL.createObjectURL( new Blob( [header, audioData], {type: 'audio/wav'} ) );
}
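For completeness, a quick usage sketch; pcmData, rate, and channels here are placeholder names for whatever the decode step above produced:

// Hypothetical names: pcmData is the decoded 16-bit PCM, rate and channels
// are the values extracted from the FLAC stream.
const url   = createWave( pcmData, rate, channels );
const audio = new Audio( url );   // plain HTML5 <audio> playback, seeking included
audio.play();

// ...or route the same element through a Web Audio graph:
const ctx = new AudioContext();
ctx.createMediaElementSource( audio ).connect( ctx.destination );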
A simple Google search led me to these sites:
Aurora and FLAC.js — audio codecs using the Web Audio API
Introducing FLAC.js: A Pure JavaScript FLAC Decoder
Believe it or not, it wasn't so hard.
Almost forgot:
Check HTML5Test to compare browsers' performance/compatibility with the <audio> tag and its siblings.
