MediaDevices.getUserMedia(): How can I set audio constraints (sampling rate / bit depth)?

Using the browser Web API, I'd like to set MediaDevices.getUserMedia constraint attributes suitable for recording speech (voice messages), e.g. these parameters:
mono
16 bit
16 kHz
Here is my code:
const mediaStreamConstraints = {
  audio: {
    channelCount: 1,
    sampleRate: 16000,
    sampleSize: 16,
    volume: 1
  },
  video: false
}

navigator.mediaDevices.getUserMedia(mediaStreamConstraints)
  .catch( err => serverlog(`ERROR mediaDevices.getUserMedia: ${err}`) )
  .then( stream => {
    // audio recorded as Blob
    // and the binary data are sent via socketio to a nodejs server
    // that stores the blob as a file (e.g. audio/inp/audiofile.webm)
  } )
The recorded clip is grabbed and stored (using the MediaRecorder API) and eventually sent to a Node.js server, where the blob is saved as a file and processed (the application is a voicebot).
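For reference, the recording path sketched in the comments above looks roughly like this (a simplified sketch, not my exact code; socket and the 'audio-message' event name are placeholders for an existing socket.io connection):
// sketch: record the stream, collect chunks into a Blob,
// then send the Blob to the Node.js server over socket.io
const mediaRecorder = new MediaRecorder(stream)
const chunks = []
mediaRecorder.ondataavailable = event => chunks.push(event.data)
mediaRecorder.onstop = () => {
  const blob = new Blob(chunks, { type: 'audio/webm' })
  socket.emit('audio-message', blob) // saved server-side (e.g. audio/inp/audiofile.webm)
}
mediaRecorder.start()
// ... mediaRecorder.stop() is called when the voice message ends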
Something goes wrong, though: the saved WebM file doesn't have the requested parameters:
$ mediainfo audio/inp/audiofile.webm
General
Complete name : audio/inp/audiofile.webm
Format : WebM
Format version : Version 4 / Version 2
File size : 2.04 KiB
Writing application : Chrome
Writing library : Chrome
IsTruncated : Yes
Audio
ID : 1
Format : Opus
Codec ID : A_OPUS
Channel(s) : 1 channel
Channel positions : Front: C
Sampling rate : 48.0 kHz
Bit depth : 32 bits
Compression mode : Lossy
Language : English
Default : Yes
Forced : No
E.g.
Sampling rate : 48.0 kHz
Bit depth : 32 bits
But the constraints should have produced different values:
Sampling rate : 16 kHz
Bit depth : 16 bits
Also, the blob, played with a new Audio(audioUrl(blob)).play(), doesn't play. Weird. But everything works if the constraints are just:
const mediaStreamConstraints = { audio: true }
I checked the browser console and didn't see any error from the navigator.mediaDevices.getUserMedia(mediaStreamConstraints) API call.
BTW, I followed the guidelines here:
https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia
https://developer.mozilla.org/en-US/docs/Web/API/MediaTrackConstraints
Note that my user agent is: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 (I'm using the latest Brave browser version).
It seems to me that any audio constraint setting not allowed by the browser:
breaks the audio blob,
without raising an exception (I catch errors on both navigator.mediaDevices.getUserMedia() and new MediaRecorder(...)). Isn't the latter at least a bug?
My questions are:
Is there any way to get the requested sampling rate / bit depth?
Or is the audio format "hardcoded" / decided by the browser implementation?
BTW, the reason for these audio parameters is that I want to minimize the audio blob size, to reduce bandwidth in the WebSocket communication between the browser client and the server, optimizing the audio blob exchange for speech (voice messages).

Check the capabilities of your browser first:
let stream = await navigator.mediaDevices.getUserMedia({audio: true});
let track = stream.getAudioTracks()[0];
console.log(track.getCapabilities());
demo output:
autoGainControl: (2) [true, false]
channelCount: {max: 2, min: 1}
deviceId: "default"
echoCancellation: (2) [true, false]
groupId: "1e76386ad54f9ad3548f6f6c14c08e7eff6753f9362d93d8620cc48f546604f5"
latency: {max: 0.01, min: 0.01}
noiseSuppression: (2) [true, false]
sampleRate: {max: 48000, min: 48000}
sampleSize: {max: 16, min: 16}
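You can also compare those capabilities with what the track actually ended up using, via getSettings() on the same track:
let settings = track.getSettings();
console.log(settings.channelCount, settings.sampleRate, settings.sampleSize);
// with the capabilities above, sampleRate can only ever be 48000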

Try setting your audio constraints on the audio media track within your stream before you instantiate MediaRecorder.
Something like this, not debugged:
const constraints = {
  audio: {
    channelCount: 1,
    sampleRate: 16000,
    sampleSize: 16,
    volume: 1
  }
}

navigator.mediaDevices.getUserMedia({ audio: true })
  .catch( err => serverlog(`ERROR mediaDevices.getUserMedia: ${err}`) )
  .then( stream => {
    const audioTracks = stream.getAudioTracks()
    if (audioTracks.length !== 1) throw new Error('too many tracks???')
    const audioTrack = audioTracks[0]
    // applyConstraints() expects the track-level constraints, i.e. the audio object
    audioTrack.applyConstraints(constraints.audio)
      .then(() => {
        const mediaRecorder = new MediaRecorder(stream)
        /* etc etc etc */
      })
      .catch(console.error) /* you might get a constraint failure here. */
  })
All that being said, the Opus audio codec does a good job compressing voice to a reasonable size. Just because it says 48 kHz / 32 bits doesn't mean it uses that much bandwidth; the audio signal is compressed.
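If the real goal is bandwidth, the more direct knob is MediaRecorder's audioBitsPerSecond option rather than the capture sample rate; a minimal sketch, assuming audio/webm;codecs=opus is supported (check MediaRecorder.isTypeSupported first):
const mediaRecorder = new MediaRecorder(stream, {
  mimeType: 'audio/webm;codecs=opus',
  audioBitsPerSecond: 24000 // ~24 kbit/s is usually plenty for speech
});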
And, try it on the most recent releases of Google Chrome and/or Firefox. This media stuff is in active development.

The answer by O.Jones is on the right track, but it can be more concise: set the constraints directly when requesting the userMedia stream.
const constraints = {
  audio: {
    channelCount: 1,
    sampleRate: 16000,
    sampleSize: 16,
    volume: 1
  }
}

navigator.mediaDevices.getUserMedia(constraints)
  .then( stream => {
    // Do something with the stream...
  })
  .catch( err => serverlog(`ERROR mediaDevices.getUserMedia: ${err}`) )

As mentioned in the comments, downsampling from getUserMedia with the exact keyword in the constraints object fails with an OverconstrainedError:
navigator.mediaDevices.getUserMedia({
  audio: {
    sampleRate: { exact: 16000 }
  },
  video: false
})
  .then(stream => {
    // Do something with the stream...
  })
  .catch(error => { console.log('Error :', error) })
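If you want to know which constraint caused the rejection, the OverconstrainedError object carries it in its constraint property:
navigator.mediaDevices.getUserMedia({ audio: { sampleRate: { exact: 16000 } } })
  .catch(error => {
    if (error.name === 'OverconstrainedError') {
      console.log('Unsatisfiable constraint:', error.constraint) // e.g. 'sampleRate'
    }
  })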
And without the exact keyword, the obtained stream doesn't fulfill the request:
navigator.mediaDevices.getUserMedia({
  audio: {
    sampleRate: 16000
  },
  video: false
})
  .then(stream => {
    console.log('Sample rate :', stream.getAudioTracks()[0].getSettings().sampleRate)
    // Shows 'Sample rate : 48000'
  })
  .catch(error => { console.log('Error :', error) })
You may want to try downsampling the audio stream from the microphone using the Web Audio API, creating a new MediaStream to record via AudioContext.createMediaStreamDestination, documented here.
Here is a code snippet for Chrome and Safari (Firefox throws an exception, see here):
const audioContext = new (window.AudioContext || window.webkitAudioContext)({
  sampleRate: 16000
});
const mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false })
const mediaStreamSource = audioContext.createMediaStreamSource(mediaStream)
const mediaStreamDestination = audioContext.createMediaStreamDestination()
mediaStreamDestination.channelCount = 1
mediaStreamSource.connect(mediaStreamDestination)
const mediaRecorder = new MediaRecorder(mediaStreamDestination.stream)
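From there you can record from mediaStreamDestination.stream as usual; a short continuation (the 1000 ms timeslice is just an example):
// sanity check: the context, and hence the destination stream, runs at 16 kHz
console.log(audioContext.sampleRate) // 16000
mediaRecorder.ondataavailable = event => {
  // event.data is a WebM/Opus chunk; forward it to the server as before
}
mediaRecorder.start(1000) // emit a chunk roughly every second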

Related

FFMPEG Encoding a video from a Readable stream

I'm facing an issue with the seeked event in Chrome. The issue seems to be due to how the video being seeked is encoded.
The problem seems to occur most frequently when using ytdl-core and piping a Readable stream into an FFMPEG child process.
let videoStream: Readable = ytdl.downloadFromInfo(info, {
  ...options,
  quality: "highestvideo"
});
With ytdl-core, in order to get the highest quality you must combine the audio and video. So here is how I am doing it:
const ytmux = (link, options: any = {}) => {
  const result = new stream.PassThrough({
    highWaterMark: options.highWaterMark || 1024 * 512
  });

  ytdl.getInfo(link, options).then((info: videoInfo) => {
    let audioStream: Readable = ytdl.downloadFromInfo(info, {
      ...options,
      quality: "highestaudio"
    });
    let videoStream: Readable = ytdl.downloadFromInfo(info, {
      ...options,
      quality: "highestvideo"
    });

    // create the ffmpeg process for muxing
    let ffmpegProcess: any = cp.spawn(
      ffmpegPath.path,
      [
        // suppress non-crucial messages
        "-loglevel", "8",
        "-hide_banner",
        // input audio and video by pipe
        "-i", "pipe:3",
        "-i", "pipe:4",
        // map audio and video correspondingly
        // no need to change the codec
        // output mp4 and pipe
        "-c:v", "libx264",
        "-x264opts", "fast_pskip=0:psy=0:deblock=-3,-3",
        "-preset", "veryslow",
        "-crf", "18",
        "-c", "copy",
        "-pix_fmt", "yuv420p",
        "-movflags", "frag_keyframe+empty_moov",
        "-g", "300",
        "-f", "mp4",
        "-map", "0:v",
        "-map", "1:a",
        "pipe:5"
      ],
      {
        // no popup window for Windows users
        windowsHide: true,
        stdio: [
          // silence stdin/out, forward stderr,
          "inherit", "inherit", "inherit",
          // and pipe audio, video, output
          "pipe", "pipe", "pipe"
        ]
      }
    );

    audioStream.pipe(ffmpegProcess.stdio[4]);
    videoStream.pipe(ffmpegProcess.stdio[3]);
    ffmpegProcess.stdio[5].pipe(result);
  });

  return result;
};
I am playing around with tons of different arguments. The resulting video gets uploaded to a Google Cloud Storage bucket. Then, when seeking in Chrome, I get issues with certain frames: they are not being seeked.
When I pass it through FFMPEG locally and re-encode it, then upload it, I notice there are no issues.
Here is an image comparing the two results when running ffmpeg -i FILE (the one on the left works fine and the differences are minor)
I tried adjusting the arguments in the muxer code and am continuing to compare with the re-encoded video. I have no idea why this is happening; it seems to be something to do with the frames.

Can I send an arbitrary chunk of a WebM (starting at a byte offset) to a mediaSource buffer to be played?

I'm trying to send only a specific truncated portion of a WebM file, starting from an arbitrary keyframe, from a Node.js server to be played back by a client using MediaSource buffering, but I'm not sure if this is possible or how to go about doing it.
So far this is what I'm trying:
find the byte offsets and sizes of init segment + keyframe clusters using mse_json_manifest from https://github.com/acolwell/mse-tools
concat streams for the init segment and a randomly chosen media segment
send the streams through either an HTTP request or a socket event to the client
It looks like sending the init segment always works, as the client's HTML5 video player shows the original file's duration, but it doesn't buffer the concatenated media segments sent afterwards.
Here's the relevant server code:
const merge = (...streams: ReadStream[]) => {
  let pass = new PassThrough();
  let waiting = streams.length;
  for (let stream of streams) {
    pass = stream.pipe(pass, { end: false });
    stream.once("end", () => --waiting === 0 && pass.emit("end"));
  }
  return pass;
}

io.on("connection", (socket) => {
  const streams = [
    fs.createReadStream(file, {
      start: audioJson.init.offset,
      end: audioJson.init.size,
    }),
    fs.createReadStream(file, {
      start: audioJson.media[150].offset,
    })
  ];
  merge(...streams).on("data", (data) => socket.emit("audio-data", data));
});
The client:
const streamVideo = document.getElementById("webmStream");
const mediaSource = new MediaSource();
const streamSource = URL.createObjectURL(mediaSource);
streamVideo.src = streamSource;

const audioMime = `audio/webm; codecs="opus"`;
const videoMime = `video/webm; codecs="vp9"`;

mediaSource.addEventListener("sourceopen", () => {
  const audioBuffer = mediaSource.addSourceBuffer(audioMime);
  const audioChunks = [];

  function appendOrQueueChunk(chunk) {
    if (!audioBuffer.updating && !audioChunks.length) {
      audioBuffer.appendBuffer(chunk);
    } else {
      audioChunks.push(chunk);
    }
  }

  socket.on("audio-data", appendOrQueueChunk);

  audioBuffer.addEventListener("updateend", () => {
    if (audioChunks.length) audioBuffer.appendBuffer(audioChunks.shift());
  });
});
And a snippet of the JSON:
{
  "type": "audio/webm;codecs=\"opus\"",
  "duration": 93100.000000,
  "init": { "offset": 0, "size": 526 },
  "media": [
    { "offset": 526, "size": 10941, "timecode": 0.000000 },
    { "offset": 11467, "size": 10382, "timecode": 0.260000 },
    { "offset": 21849, "size": 10301, "timecode": 0.520000 },
    { "offset": 32150, "size": 10495, "timecode": 0.780000 },
    ...
The socket streaming works fine as long as I just emit socket events directly from an fs.ReadStream of the entire WebM file, so maybe it has something to do with sending the streams in sequence, but I feel completely out of my depth and think I'm missing something conceptually.
You don't even need MediaSource for this. The regular video element can stream from your Node.js server via a simple HTTP request. No need for Socket.IO and what not either.
<video src="https://nodejs-stream-server.example.com/something"></video>
I don't know the library you're using, so I'll tell you how I've done this exact task in the past in more generic terms, and maybe you can adapt it or re-implement.
Firstly, when the request for the media stream comes in to your Node.js server, you must send some initialization data. It sounds like you're already doing this successfully. This initialization data is basically everything in the stream up to the first Cluster element.
So, when your encoder starts, be sure to buffer the data up to then so you have it ready to send to new clients.
Next, you can start at an arbitrary Cluster element as long as that Cluster begins with a keyframe (for video). If this isn't working now, I suspect your Clusters aren't starting with keyframes, or there is something otherwise strange about them. In your JSON, you show an audio stream... was that intentional?
I'd recommend reading up on EBML, which is essentially the base container format for Matroska/WebM. Matroska is just a schema of sorts for the EBML document. WebM is just Matroska, but specified to a core set of codecs.
So, yes, in summary I think you have the concept, but it can be simplified.
Some other things you might find helpful:
https://stackoverflow.com/a/45172617/362536
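To make that concrete, here is a minimal sketch of the plain-HTTP approach, reusing the byte offsets from the JSON manifest in the question; app (an Express instance), file, and audioJson are assumed to exist, and note that fs.createReadStream treats end as inclusive:
app.get('/something', (req, res) => {
  res.setHeader('Content-Type', 'audio/webm');
  // 1. send the init segment (everything up to the first Cluster)
  const init = fs.createReadStream(file, {
    start: audioJson.init.offset,
    end: audioJson.init.offset + audioJson.init.size - 1, // end is inclusive
  });
  init.pipe(res, { end: false });
  // 2. then start at an arbitrary Cluster (for video it must begin with a keyframe)
  init.on('end', () => {
    fs.createReadStream(file, { start: audioJson.media[150].offset }).pipe(res);
  });
});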

RecordRTC.js : Duplicate Frame and Frame Drop when timeSlice is 100 and send data to WebSocket

RecordRTC v5.6.1
We are using the configuration below and sending real-time audio data over a WebSocket, but we are getting duplicate frames, and some frames are dropped in the final output.
var options = {
  mimeType: 'audio/wav',
  type: 'audio',
  checkForInactiveTracks: true,
  numberOfAudioChannels: 1,
  timeSlice: 100,
  noWorker: true,
  recorderType: StereoAudioRecorder,
  ondataavailable: function (e) {
    socketSend(e);
  }
};

recorder = new RecordRTC(stream, options);
If we set desiredSampRate to 16000, the issue reproduces even more frequently.
Please find attached a sample audio output showing the frame drop and duplicate frame issues:
Frame_Drop_Duplicate.zip
Can you please suggest how we can capture 16 kHz mono data and send it over the WebSocket without frame drops and duplicate frames?

Recording .wav with nodejs causes .wav file to be stretched

When I record using wav's FileWriter
(https://www.npmjs.com/package/wav#filewriterpath-options), my .wav file is stretched: when I record 5 s of audio I get a .wav that is ~10 s or more. Does anyone have any idea why this might happen?
var mic = require('mic');
var FileWriter = require('wav').FileWriter;

var micInstance = mic({
  rate: '48000',
  channels: '2'
});

var micInputStream = micInstance.getAudioStream();

var outputFileStream = new FileWriter('./test.wav', {
  sampleRate: 48000,
  channels: 2
});

micInputStream.pipe(outputFileStream);
micInstance.start();

setTimeout(function() {
  micInstance.stop();
}, 5000);
Conclusion: after doing more research, it appears to just be a problem with mic, given that many other packages that do the exact same thing work appropriately. Note that I used Windows 10 / SoX 14.4.1. For those wondering, I ended up using npmjs.com/package/node-audiorecorder instead.

How to capture the first 10 seconds of an mp3 being streamed over HTTP

disclaimer: newbie to nodeJS and audio parsing
I'm trying to proxy a digital radio stream through an Express app with the help of node-icecast, which works great. I am getting the radio's MP3 stream and, via node-lame, decoding the MP3 to PCM and sending it to the speakers. All of this works straight from the GitHub project's readme example:
var lame = require('lame');
var icecast = require('icecast');
var Speaker = require('speaker');

// URL to a known Icecast stream
var url = 'http://firewall.pulsradio.com';

// connect to the remote stream
icecast.get(url, function (res) {

  // log the HTTP response headers
  console.error(res.headers);

  // log any "metadata" events that happen
  res.on('metadata', function (metadata) {
    var parsed = icecast.parse(metadata);
    console.error(parsed);
  });

  // Let's play the music (assuming MP3 data).
  // lame decodes and Speaker sends to speakers!
  res.pipe(new lame.Decoder())
     .pipe(new Speaker());
});
I'm now trying to set up a service to identify the music using the Doreso API. The problem is that I'm working with a stream and don't have a file (and I don't know enough yet about readable and writable streams, and I'm a slow learner). I have been looking for a while at how to write the stream (ideally to memory) until I have about 10 seconds' worth. Then I would pass that portion of audio to my API; however, I don't know if that's possible or where to start with slicing 10 seconds off a stream. I thought of passing the stream to ffmpeg, since it has a -t option for duration and that could perhaps limit it, but I haven't got that to work yet.
Any suggestions to cut a stream down to 10 seconds would be awesome. Thanks!
Updated: Changed my question as I originally thought I was getting PCM and converting to mp3 ;-) I had it backwards. Now I just want to slice off part of the stream while the stream still feeds the speaker.
It's not that easy... but I managed it this weekend. I would be happy if you could point out how to improve this code further. I don't really like the approach of simulating the "end" of a stream. Is there something like "detaching" or "rewiring" parts of a pipe wiring of streams in Node?
First, you should create your very own Writable Stream class which itself creates a lame encoding instance. This writable stream will receive the decoded PCM data.
It works like this:
var stream = require('stream');
var util = require('util');
var fs = require('fs');
var lame = require('lame');
var streamifier = require('streamifier');
var WritableStreamBuffer = require("stream-buffers").WritableStreamBuffer;

var SliceStream = function(lameConfig) {
  stream.Writable.call(this);
  this.encoder = new lame.Encoder(lameConfig);
  // we need a stream buffer to buffer the PCM data
  this.buffer = new WritableStreamBuffer({
    initialSize: (1000 * 1024),   // start at 1 MiB.
    incrementAmount: (150 * 1024) // grow by 150 KiB each time the buffer overflows.
  });
};
util.inherits(SliceStream, stream.Writable);

// some attributes, initialization
SliceStream.prototype.writable = true;
SliceStream.prototype.encoder = null;
SliceStream.prototype.buffer = null;

// will be called each time the decoded stream emits "data"
// together with a chunk of binary data as a Buffer
SliceStream.prototype.write = function(buf) {
  //console.log('bytes recv: ', buf.length);
  this.buffer.write(buf);
  //console.log('buffer size: ', this.buffer.size());
};

// this method is invoked when the setTimeout function
// emits the simulated "end" event. Let's encode to MP3 again...
SliceStream.prototype.end = function(buf) {
  if (arguments.length) {
    this.buffer.write(buf);
  }
  this.writable = false;
  //console.log('buffer size: ' + this.buffer.size());

  // fetch the binary data from the buffer
  var PCMBuffer = this.buffer.getContents();

  // create a stream out of the binary buffer data
  streamifier.createReadStream(PCMBuffer).pipe(
    // and pipe it right into the MP3 encoder...
    this.encoder
  );

  // but don't forget to pipe the encoder's output
  // into a writable file stream
  this.encoder.pipe(
    fs.createWriteStream('./fooBar.mp3')
  );
};
Now you can pipe the decoded stream into an instance of your SliceStream class, like this (in addition to the other pipes):
icecast.get(streamUrl, function(res) {
  var lameEncoderConfig = {
    // input
    channels: 2,        // 2 channels (left and right)
    bitDepth: 16,       // 16-bit samples
    sampleRate: 44100,  // 44,100 Hz sample rate
    // output
    bitRate: 320,
    outSampleRate: 44100,
    mode: lame.STEREO   // STEREO (default), JOINTSTEREO, DUALCHANNEL or MONO
  };

  var decodedStream = res.pipe(new lame.Decoder());

  // pipe decoded PCM stream into a SliceStream instance
  decodedStream.pipe(new SliceStream(lameEncoderConfig));

  // now play it...
  decodedStream.pipe(new Speaker());

  setTimeout(function() {
    // after 10 seconds, emulate an end of the stream.
    res.emit('end');
  }, 10 * 1000 /* milliseconds */);
});
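One alternative to simulating the end event might be readable.unpipe(), which detaches a destination from an existing pipe; I haven't tried it here, so treat this as a sketch only:
var slicer = new SliceStream(lameEncoderConfig);
decodedStream.pipe(slicer);

setTimeout(function() {
  decodedStream.unpipe(slicer); // stop feeding the slicer...
  slicer.end();                 // ...and trigger the MP3 encoding manually
}, 10 * 1000);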
Can I suggest using removeListener after 10 seconds? That will prevent future events from being sent through the listener.
var request = require('request'),
    fs = require('fs'),
    masterStream = request('-- mp3 stream --')

var writeStream = fs.createWriteStream('recording.mp3'),
    handler = function(bit){
      writeStream.write(bit);
    }

masterStream.on('data', handler);

setTimeout(function(){
  masterStream.removeListener('data', handler);
  writeStream.end();
}, 1000 * 10);
