FFMPEG Encoding a video from a Readable stream - node.js

I'm facing an issue with the seeked event in Chrome, which seems to be caused by how the video being seeked is encoded.
The problem occurs most frequently when using ytdl-core and piping a Readable stream into an FFmpeg child process.
let videoStream: Readable = ytdl.downloadFromInfo(info, {
  ...options,
  quality: "highestvideo"
});
With ytdl-core, in order to get the highest quality you must combine the audio and video streams yourself. Here is how I am doing it:
const ytmux = (link, options: any = {}) => {
  const result = new stream.PassThrough({
    highWaterMark: options.highWaterMark || 1024 * 512
  });

  ytdl.getInfo(link, options).then((info: videoInfo) => {
    let audioStream: Readable = ytdl.downloadFromInfo(info, {
      ...options,
      quality: "highestaudio"
    });
    let videoStream: Readable = ytdl.downloadFromInfo(info, {
      ...options,
      quality: "highestvideo"
    });

    // create the ffmpeg process for muxing
    let ffmpegProcess: any = cp.spawn(
      ffmpegPath.path,
      [
        // suppress non-crucial messages
        "-loglevel", "8",
        "-hide_banner",
        // video comes in on pipe 3, audio on pipe 4
        "-i", "pipe:3",
        "-i", "pipe:4",
        // codec and quality settings
        "-c:v", "libx264",
        "-x264opts", "fast_pskip=0:psy=0:deblock=-3,-3",
        "-preset", "veryslow",
        "-crf", "18",
        "-c", "copy",
        "-pix_fmt", "yuv420p",
        // fragmented MP4 output with a keyframe every 300 frames
        "-movflags", "frag_keyframe+empty_moov",
        "-g", "300",
        "-f", "mp4",
        // take the video from the first input and the audio from the second
        "-map", "0:v",
        "-map", "1:a",
        // write the muxed output to pipe 5
        "pipe:5"
      ],
      {
        // no popup window for Windows users
        windowsHide: true,
        stdio: [
          // keep the standard stdin/stdout/stderr attached to the parent
          "inherit", "inherit", "inherit",
          // and open pipes 3, 4 and 5 for video in, audio in and muxed output
          "pipe", "pipe", "pipe"
        ]
      }
    );

    audioStream.pipe(ffmpegProcess.stdio[4]);
    videoStream.pipe(ffmpegProcess.stdio[3]);
    ffmpegProcess.stdio[5].pipe(result);
  });

  return result;
};
I am playing around with tons of different arguments. The resulting video gets uploaded to a Google Cloud Storage bucket, and when seeking in Chrome I run into issues with certain frames: seeking to them simply does not work.
When I re-encode the file locally with FFmpeg and then upload it, there are no issues.
Here is an image comparing the two results when running ffmpeg -i FILE (the one on the left works fine, and the differences are minor).
I have tried adjusting the arguments in the muxer code and keep comparing the output against the re-encoded video, but I have no idea why this is happening; it seems to be something to do with the frames.
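One way to dig into "something to do with the frames" is to compare where the keyframes actually land in the broken upload versus the re-encoded file, since ffmpeg -i only prints a stream summary. Below is a rough sketch of such a check run from Node; it is only an illustration, assumes ffprobe is on the PATH, and the frame entry names can differ slightly between ffprobe versions.

const cp = require("child_process");

// List the type and timestamp of every video frame so the two files can be diffed.
function listFrames(file) {
  return new Promise((resolve, reject) => {
    const probe = cp.spawn("ffprobe", [
      "-hide_banner",
      "-select_streams", "v:0",
      "-show_frames",
      "-show_entries", "frame=pict_type,pts_time",
      "-of", "csv=p=0",
      file
    ]);
    let out = "";
    probe.stdout.on("data", chunk => (out += chunk));
    probe.on("close", code =>
      code === 0 ? resolve(out) : reject(new Error("ffprobe exited with " + code))
    );
  });
}

// e.g. listFrames("muxed.mp4").then(console.log);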

Related

Fluent FFMPEG trigger callback on JPEG output each frame

I'm trying to use Fluent FFMPEG in NodeJS to output a cropped frame from a video. I want to trigger an OCR call to Tesseract on every frame that is created. Is there a way in Fluent FFMPEG to listen to each file being created?
Ideally I would like to output each file to a buffer to skip saving it to disk and speed up the Tesseract calls. Any help would be much appreciated!
Here's the code to generate the still frames:
console.time("Process time");

const ffmpeg = require('fluent-ffmpeg');

ffmpeg('test.mp4')
  .duration(1)
  .videoFilters([
    {
      filter: 'crop',
      options: '1540:1000:250:0'
    }
  ])
  .outputOptions('-q:v 2')
  .output('images/outimage_%03d.jpeg')
  .on('end', function() {
    console.log('Finished processing');
    console.timeEnd("Process time");
  })
  .run();
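One possible way to get each frame as a Buffer instead of a file is sketched below: have ffmpeg write the JPEGs back-to-back to stdout with the image2pipe muxer and split the byte stream on the JPEG end-of-image marker. This is only a sketch; the naive marker scan can misfire on unusual JPEGs, and the Tesseract call in the comment is just a placeholder.

const ffmpeg = require('fluent-ffmpeg');

// JPEG start-of-image and end-of-image markers
const SOI = Buffer.from([0xff, 0xd8]);
const EOI = Buffer.from([0xff, 0xd9]);

let pending = Buffer.alloc(0);

// Cut complete JPEGs out of the incoming byte stream and hand each one to onFrame
function onJpegData(chunk, onFrame) {
  pending = Buffer.concat([pending, chunk]);
  let end;
  while ((end = pending.indexOf(EOI)) !== -1) {
    const frame = pending.slice(0, end + EOI.length);
    pending = pending.slice(end + EOI.length);
    if (frame.indexOf(SOI) === 0) onFrame(frame); // one complete JPEG as a Buffer
  }
}

ffmpeg('test.mp4')
  .duration(1)
  .videoFilters([{ filter: 'crop', options: '1540:1000:250:0' }])
  .outputOptions('-q:v 2')
  .videoCodec('mjpeg')
  .format('image2pipe')   // JPEGs written back-to-back to stdout instead of to files
  .pipe()                 // returns a stream of ffmpeg's output
  .on('data', chunk =>
    onJpegData(chunk, frame => {
      // frame is a Buffer holding one cropped JPEG; run OCR on it here,
      // e.g. Tesseract.recognize(frame, 'eng') with tesseract.js
    })
  );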

Process attach multiple stdins

I have one process which can receive two stdins:
this.child = child_process.spawn(
  'command',
  ['-', 'pipe:3'],
  { stdio: [null, null, null, /* ??? unknown */] }
);

this.child.stdin.write(data);
this.child.stdio[3].write(anotherData); // This is the unknown part.
Is it possible to create two stdins, without creating another child process?
Problem explanation: my problem is actually with ffmpeg. I'm spawning an ffmpeg instance with an audio input and a video input, and each needs its own input pipe, e.g. pipe:0 (stdin) for the video and pipe:3 (another stdin-like pipe) for the audio data, because pipe:1 (stdout) and pipe:2 (stderr) are already taken.
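For reference, Node's child_process can open extra pipes beyond stdin/stdout/stderr: each additional 'pipe' entry in the stdio array shows up as child.stdio[n] in the parent and as file descriptor n in the child, which ffmpeg can read as pipe:n. A minimal sketch along those lines (the ffmpeg arguments are placeholders, not a complete command):

const cp = require('child_process');

// 'pipe' entries past index 2 create extra pipes: fd 3 appears as child.stdio[3]
// in Node and as pipe:3 inside ffmpeg.
const child = cp.spawn('ffmpeg', [
  '-i', 'pipe:0',          // first input: read from stdin
  '-i', 'pipe:3',          // second input: read from the extra pipe on fd 3
  /* ...output options... */
  '-f', 'matroska', 'pipe:1'
], {
  stdio: ['pipe', 'pipe', 'inherit', 'pipe']
});

child.stdin.write(data);            // goes to pipe:0
child.stdio[3].write(anotherData);  // goes to pipe:3
// child.stdout carries the muxed output (pipe:1)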

Interpolate silence in Discord.js stream

I'm making a Discord bot with Discord.js v14 that records users' audio as individual files and one collective file. As Discord.js streams do not interpolate silence, my question is how to interpolate silence into the streams.
My code is based on the Discord.js recording example.
In essence, a privileged user enters a voice channel (or stage), runs /record, and all the users in that channel are recorded up until the point that they run /leave.
I've tried using Node packages like combined-stream, audio-mixer, multistream and multipipe, but I'm not familiar enough with Node streams to use the pros of each to fill in the gaps the cons add to the problem. I'm not entirely sure how to go about interpolating silence either, whether through a Transform (which likely requires the stream to be continuous, or for the receiver stream to be applied onto silence) or through a sort of "multi-stream" that swaps between piping the stream and a silence buffer. I also have yet to overlay the audio files (e.g. with ffmpeg).
Would it even be possible for a Readable to await an audio chunk and, if none is given within a certain timeframe, push a chunk of silence instead? My attempt at doing so is below (again, based on the Discord.js recorder example):
// CREDIT TO: https://stackoverflow.com/a/69328242/8387760
const { Readable } = require('stream');
const { EndBehaviorType } = require('@discordjs/voice');

const SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);

async function createListeningStream(connection, userId) {
  // Creating manually terminated stream
  let receiverStream = connection.receiver.subscribe(userId, {
    end: {
      behavior: EndBehaviorType.Manual
    },
  });

  // Interpolating silence
  // TODO Increases file length over tenfold by stretching audio?
  let userStream = new Readable({
    read() {
      receiverStream.on('data', chunk => {
        if (chunk) {
          this.push(chunk);
        }
        else {
          // Never occurs
          this.push(SILENCE);
        }
      });
    }
  });

  /* Piping userStream to file at 48kHz sample rate */
}
As an unnecessary bonus, it would help if it were possible to check whether a user ever spoke or not to eliminate creating empty recordings.
Thanks in advance.
Related:
Record all users in a voice channel in discord js v12
Adding silent frames to a node js stream when no data is received
After a lot of reading about Node streams, the solution I procured was unexpectedly simple.
Create a boolean variable recording that is true when the recording should continue and false when it should stop
Create a buffer to handle backpressure (i.e., when data comes in faster than it is read out)
let buffer = [];
Create a readable stream that the received user audio stream is fed into
// New audio stream (with silence)
let userStream = new Readable({
  // ...
});

// User audio stream (without silence)
let receiverStream = connection.receiver.subscribe(userId, {
  end: {
    behavior: EndBehaviorType.Manual,
  },
});
receiverStream.on('data', chunk => buffer.push(chunk));
In that stream's read method, handle recording with a 20 ms timer, matching the Opus packet interval of the 48 kHz user audio stream
read() {
  if (recording) {
    let delay = new NanoTimer();
    delay.setTimeout(() => {
      if (buffer.length > 0) {
        this.push(buffer.shift());
      }
      else {
        this.push(SILENCE);
      }
    }, '', '20m');
  }
  // ...
}
In the same method, also handle ending the stream
  // ...
  else if (buffer.length > 0) {
    // Stream is ending: sending buffered audio ASAP
    this.push(buffer.shift());
  }
  else {
    // Ending stream
    this.push(null);
  }
If we put it all together:
const NanoTimer = require('nanotimer'); // node
/* import NanoTimer from 'nanotimer'; */ // es6
const { Readable } = require('stream');
const { EndBehaviorType } = require('@discordjs/voice');

const SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);
let recording = true; // the boolean from step 1; set to false elsewhere (e.g. on /leave)

async function createListeningStream(connection, userId) {
  // Accumulates very, very slowly, but only when the user is speaking: stays small otherwise
  let buffer = [];

  // Interpolating silence into the user audio stream
  let userStream = new Readable({
    read() {
      if (recording) {
        // Pushing audio at the same rate as the receiver
        // (could probably be replaced with a standard, less precise timer)
        let delay = new NanoTimer();
        delay.setTimeout(() => {
          if (buffer.length > 0) {
            this.push(buffer.shift());
          }
          else {
            this.push(SILENCE);
          }
          // delay.clearTimeout();
        }, '', '20m'); // '20m' = 20 ms, the Opus packet interval of Discord's 48 kHz audio
      }
      else if (buffer.length > 0) {
        // Sending buffered audio ASAP
        this.push(buffer.shift());
      }
      else {
        // Ending stream
        this.push(null);
      }
    }
  });

  // Redirecting user audio to userStream to have silence interpolated
  let receiverStream = connection.receiver.subscribe(userId, {
    end: {
      behavior: EndBehaviorType.Manual, // Manually closed elsewhere
    },
    // mode: 'pcm',
  });
  receiverStream.on('data', chunk => buffer.push(chunk));

  // pipeline(userStream, ...), etc.
}
From here, you can pipe that stream into a fileWriteStream, etc. for individual purposes. Note that it's a good idea to also close the receiverStream whenever recording = false with something like:
connection.receiver.subscriptions.delete(userId);
The userStream should also be closed if it isn't already being closed for you, e.g. by being the first argument of the pipeline method.
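For instance, a minimal way to wire that up (a sketch only; the path is arbitrary, and the raw Opus packets still need to be wrapped in a container such as Ogg, as in the Discord.js recording example, before the file is playable):

const fs = require('fs');
const { pipeline } = require('stream');

// userStream is the silence-interpolated Readable created above
pipeline(
  userStream,
  fs.createWriteStream(`./recordings/${userId}.raw`),  // arbitrary path/extension
  err => {
    if (err) console.error(`Recording for ${userId} failed:`, err);
    else console.log(`Recording for ${userId} finished.`);
  }
);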
As a side note, although this is outside the scope of my original question, there are many other modifications you can make. For instance, you can prepend silence to the audio before piping the receiverStream's data into the userStream, e.g. to make multiple audio streams the same length:
// let startTime = ...
let creationTime;
// Each SILENCE frame covers 20 ms, so pad one frame per 20 ms elapsed
for (let t = startTime; t < (creationTime = Date.now()); t += 20) {
  buffer.push(SILENCE);
}
Happy coding!

Can I send an arbitrary chunk of a WebM (starting at a byte offset) to a mediaSource buffer to be played?

I'm trying to send only a specific truncated portion of a WebM file, starting from an arbitrary keyframe, from a Node.js server to be played back by a client using MediaSource buffering, but I'm not sure if this is possible or how to go about doing it.
So far this is what I'm trying:
find the byte offsets and sizes of init segment + keyframe clusters using mse_json_manifest from https://github.com/acolwell/mse-tools
concat streams for the init segment and a randomly chosen media segment
send the streams through either an HTTP request or a socket event to the client
It looks like sending the init segment always works, as the HTML5 video player on the client shows the original file's duration, but it doesn't buffer the concatenated media segments sent afterwards.
Here's the relevant server code:
const merge = (...streams: ReadStream[]) => {
  let pass = new PassThrough();
  let waiting = streams.length;
  for (let stream of streams) {
    pass = stream.pipe(pass, { end: false });
    stream.once("end", () => --waiting === 0 && pass.emit("end"));
  }
  return pass;
};

io.on("connection", (socket) => {
  const streams = [
    fs.createReadStream(file, {
      start: audioJson.init.offset,
      end: audioJson.init.size,
    }),
    fs.createReadStream(file, {
      start: audioJson.media[150].offset,
    })
  ];
  merge(...streams).on("data", (data) => socket.emit("audio-data", data));
});
The client:
const streamVideo = document.getElementById("webmStream");
const mediaSource = new MediaSource();
const streamSource = URL.createObjectURL(mediaSource);
streamVideo.src = streamSource;

const audioMime = `audio/webm; codecs="opus"`;
const videoMime = `video/webm; codecs="vp9"`;

mediaSource.addEventListener("sourceopen", () => {
  const audioBuffer = mediaSource.addSourceBuffer(audioMime);
  const audioChunks = [];

  function appendOrQueueChunk(chunk) {
    if (!audioBuffer.updating && !audioChunks.length) {
      audioBuffer.appendBuffer(chunk);
    } else {
      audioChunks.push(chunk);
    }
  }

  socket.on("audio-data", appendOrQueueChunk);

  audioBuffer.addEventListener("updateend", () => {
    if (audioChunks.length) audioBuffer.appendBuffer(audioChunks.shift());
  });
});
And a snippet of the JSON:
{
  "type": "audio/webm;codecs=\"opus\"",
  "duration": 93100.000000,
  "init": { "offset": 0, "size": 526 },
  "media": [
    { "offset": 526, "size": 10941, "timecode": 0.000000 },
    { "offset": 11467, "size": 10382, "timecode": 0.260000 },
    { "offset": 21849, "size": 10301, "timecode": 0.520000 },
    { "offset": 32150, "size": 10495, "timecode": 0.780000 },
    ...
The socket streaming works fine as long as I just emit socket events directly from an fs.ReadStream of the entire WebM file, so maybe it has something to do with sending the streams in sequence, but I feel completely out of my depth and think I'm missing something conceptually.
You don't even need MediaSource for this. The regular video element can stream from your Node.js server via a simple HTTP request. No need for Socket.IO and what not either.
<video src="https://nodejs-stream-server.example.com/something"></video>
I don't know the library you're using, so I'll tell you how I've done this exact task in the past in more generic terms, and maybe you can adapt it or re-implement.
Firstly, when the request for the media stream comes in to your Node.js server, you must send some initialization data. It sounds like you're already doing this successfully. This initialization data is basically everything in the stream up to the first Cluster element.
So, when your encoder starts, be sure to buffer the data up to then so you have it ready to send to new clients.
Next, you can start at an arbitrary Cluster element as long as that Cluster begins with a keyframe (for video). If this isn't working now, I suspect your Clusters aren't starting with keyframes, or there is something otherwise strange about them. In your JSON, you show an audio stream... was that intentional?
I'd recommend reading up on EBML, which is essentially the base container format for Matroska/WebM. Matroska is just a schema of sorts for the EBML document. WebM is just Matroska, but specified to a core set of codecs.
So, yes, in summary I think you have the concept, but it can be simplified.
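As a rough illustration of that simplified shape, reusing the file and audioJson manifest from the question (assumptions: a plain http server, a hard-coded segment index, no range or error handling; note that fs.createReadStream's end option is an inclusive offset, so it is computed as offset + size - 1):

const fs = require("fs");
const http = require("http");

http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "audio/webm" });

  // 1) the init segment: everything before the first Cluster
  const init = fs.createReadStream(file, {
    start: audioJson.init.offset,
    end: audioJson.init.offset + audioJson.init.size - 1, // 'end' is inclusive
  });

  // 2) then an arbitrary Cluster, which must begin with a keyframe for video
  init.on("end", () => {
    fs.createReadStream(file, { start: audioJson.media[150].offset }).pipe(res);
  });

  init.pipe(res, { end: false });
}).listen(3000);

The client then just points a video (or audio) element's src at that URL, as described above, with no MediaSource or Socket.IO involved.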
Some other things you might find helpful:
https://stackoverflow.com/a/45172617/362536

NodeJs: How to pipe two streams into one spawned process stdin (i.e. ffmpeg) resulting in a single output

In order to convert PCM audio to MP3 I'm using the following:
const { spawn } = require('child_process');

function spawnFfmpeg() {
  var args = [
    '-f', 's16le',
    '-ar', '48000',
    '-ac', '1',
    '-i', 'pipe:0',
    '-acodec', 'libmp3lame',
    '-f', 'mp3',
    'pipe:1'
  ];

  var ffmpeg = spawn('ffmpeg', args);

  console.log('Spawning ffmpeg ' + args.join(' '));

  ffmpeg.on('exit', function (code) {
    console.log('FFMPEG child process exited with code ' + code);
  });

  ffmpeg.stderr.on('data', function (data) {
    console.log('Incoming data: ' + data);
  });

  return ffmpeg;
}
Then I pipe everything together:
writeStream = fs.createWriteStream( "live.mp3" );
var ffmpeg = spawnFfmpeg();
stream.pipe(ffmpeg.stdin);
ffmpeg.stdout.pipe(/* destination */);
The thing is... Now I want to merge (overlay) two streams into one. I already found how to do it with ffmpeg: How to overlay two audio files using ffmpeg
But the ffmpeg command expects two inputs, and so far I'm only able to pipe one input stream into the pipe:0 argument. How do I pipe two streams into the spawned command? Would something like ffmpeg -i pipe:0 -i pipe:0... work? How would I pipe the two incoming streams of PCM data (since the command expects two inputs)?
You could use named pipes for this, but that isn't going to work on all platforms.
I would instead do the mixing in Node.js. Since your audio is in normal PCM samples, that makes this easy. To mix, you simply add them together.
The first thing I would do is convert your PCM samples to a common format... 32-bit float. Next, you'll have to decide how you want to handle cases where both channels are running at the same time and both are carrying loud sounds such that the signal will "clip" by exceeding 1.0 or -1.0. One option is to simply cut each channel's sample value in half before adding them together.
Another option, depending on your desired output, is to let it exceed the normal range and pass it to FFmpeg. FFmpeg can take in 32-bit float samples. There, you can apply proper compression/limiting to bring the signal back under clipping before encoding to MP3.
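A minimal sketch of that mixing step (an illustration of the approach described above, not code from the answer), assuming both inputs are 16-bit signed little-endian mono PCM at the same sample rate, halving each channel to avoid clipping and producing 32-bit float samples for ffmpeg:

// Mix two Buffers of 16-bit signed LE PCM into one Buffer of 32-bit float PCM.
// Each input is scaled by 0.5 so the sum can never exceed [-1.0, 1.0].
function mixChunks(chunkA, chunkB) {
  const samples = Math.min(chunkA.length, chunkB.length) / 2; // 2 bytes per int16 sample
  const mixed = Buffer.alloc(samples * 4);                    // 4 bytes per float sample
  for (let i = 0; i < samples; i++) {
    const a = chunkA.readInt16LE(i * 2) / 32768;              // int16 -> float in [-1, 1)
    const b = chunkB.readInt16LE(i * 2) / 32768;
    mixed.writeFloatLE(a * 0.5 + b * 0.5, i * 4);
  }
  return mixed;
}

// The spawned ffmpeg would then read the mix with '-f f32le -ar 48000 -ac 1 -i pipe:0'.
// Note: a real implementation also has to line the two streams up chunk by chunk,
// buffering whichever side is ahead; that bookkeeping is omitted here.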
