Recently I've been messing around with FFmpeg and streams in Node.js. My ultimate goal is to serve a transcoded video stream, from any input file type, via HTTP, generated in real time, in segments, as it's needed.
I'm currently attempting to handle this using HLS. I pre-generate a dummy m3u8 manifest using the known duration of the input video; it contains a bunch of URLs that point to individual constant-duration segments. Then, once the client player starts requesting those URLs, I use the requested path to determine which time range of video the client needs, transcode that range, and stream the segment back to them.
Now for the problem: this approach mostly works, but has a small audio bug. With most test input files, my code produces a video that, while playable, seems to have a very small (< 0.25 second) audio skip at the start of each segment.
I think this may be an issue with time-based splitting in FFmpeg, where the audio stream possibly cannot be sliced accurately at the exact frame where the video is cut. So far, I've been unable to figure out a solution to this problem.
If anybody has any direction they can steer me in, or even a pre-existing library/server that solves this use case, I'd appreciate the guidance. My knowledge of video encoding is fairly limited.
I'll include my relevant current code below, so others can see where I'm stuck. You should be able to run this as a Node.js Express server, then point any HLS player at localhost:8080/master to load the manifest and begin playback. See the transcode.get('/segment/:seg.ts', ...) route at the end for the relevant transcoding bit.
'use strict';
const express = require('express');
const ffmpeg = require('fluent-ffmpeg');
let PORT = 8080;
let HOST = 'localhost';
const transcode = express();
/*
* This file demonstrates an Express-based server, which transcodes & streams a video file.
* All transcoding is handled in memory, in chunks, as needed by the player.
*
* It works by generating a fake manifest file for an HLS stream, at the endpoint "/m3u8".
* This manifest contains links to each "segment" video clip, which browser-side HLS players will load as-needed.
*
* The "/segment/:seg.ts" endpoint is the request destination for each clip,
* and uses FFMpeg to generate each segment on-the-fly, based off which segment is requested.
*/
const pathToMovie = 'C:\\input-file.mp4'; // The input file to stream as HLS.
const segmentDur = 5; // Controls the duration (in seconds) that the file will be chopped into.
const getMetadata = async(file) => {
return new Promise( resolve => {
ffmpeg.ffprobe(file, function(err, metadata) {
console.log(metadata);
resolve(metadata);
});
});
};
// Generate a "master" m3u8 file, which the player should point to:
transcode.get('/master', async(req, res) => {
res.set({"Content-Disposition":"attachment; filename=\"m3u8.m3u8\""});
res.send(`#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=150000
/m3u8?num=1
#EXT-X-STREAM-INF:BANDWIDTH=240000
/m3u8?num=2`)
});
// Generate an m3u8 file to emulate a premade video manifest. Guesses segments based off duration.
transcode.get('/m3u8', async(req, res) => {
let met = await getMetadata(pathToMovie);
let duration = met.format.duration;
let out = '#EXTM3U\n' +
'#EXT-X-VERSION:3\n' +
`#EXT-X-TARGETDURATION:${segmentDur}\n` +
'#EXT-X-MEDIA-SEQUENCE:0\n' +
'#EXT-X-PLAYLIST-TYPE:VOD\n';
let splits = Math.ceil(duration / segmentDur); // number of segments needed to cover the full duration
for(let i=0; i< splits; i++){
out += `#EXTINF:${segmentDur},\n/segment/${i}.ts\n`;
}
out+='#EXT-X-ENDLIST\n';
res.set({"Content-Disposition":"attachment; filename=\"m3u8.m3u8\""});
res.send(out);
});
// Transcode the input video file into segments, using the given segment number as time offset:
transcode.get('/segment/:seg.ts', async(req, res) => {
const segment = req.params.seg;
const time = segment * segmentDur;
let proc = new ffmpeg({source: pathToMovie})
.seekInput(time)
.duration(segmentDur)
.outputOptions('-preset faster')
.outputOptions('-g 50')
.outputOptions('-profile:v main')
.withAudioCodec('aac')
.outputOptions('-ar 48000')
.withAudioBitrate('155k')
.withVideoBitrate('1000k')
.outputOptions('-c:v h264')
.outputOptions(`-output_ts_offset ${time}`)
.format('mpegts')
.on('error', function(err, st, ste) {
console.log('an error happened:', err, st, ste);
}).on('progress', function(progress) {
console.log(progress);
})
.pipe(res, {end: true});
});
transcode.listen(PORT, HOST);
console.log(`Running on http://${HOST}:${PORT}`);
I had the same problem as you, and I managed to fix it, as I mentioned in the comment, by starting the complete HLS transcoding instead of manually producing the segment requested by the client. I'm going to simplify what I've done and also share the link to my GitHub repo where I've implemented this. I did the same as you for generating the m3u8 manifest:
const segmentDur = 4; // Segment duration in seconds
const splits = Math.ceil(duration / segmentDur); // duration = duration of the video in seconds
let out = '#EXTM3U\n' +
'#EXT-X-VERSION:3\n' +
`#EXT-X-TARGETDURATION:${segmentDur}\n` +
'#EXT-X-MEDIA-SEQUENCE:0\n' +
'#EXT-X-PLAYLIST-TYPE:VOD\n';
for (let i = 0; i < splits; i++) {
out += `#EXTINF:${segmentDur}, nodesc\n/api/video/${id}/hls/${quality}/segments/${i}.ts?segments=${splits}&group=${group}&audioStream=${audioStream}&type=${type}\n`;
}
out += '#EXT-X-ENDLIST\n';
res.send(out);
resolve();
This works fine as long as you transcode the video (i.e. use, for example, libx264 as the video encoder in the ffmpeg command later on). If you use the copy video codec, the segments won't match the segment duration, in my testing. Now you have a choice here: either start the ffmpeg transcoding at this point, when the m3u8 manifest is requested, or wait until the first segment is requested. I went with the second option, since I want to support starting the transcoding based on which segment is requested.
Now comes the tricky part: when the client requests a segment (api/video/${id}/hls/<quality>/segments/<segment_number>.ts in my case), you first have to check whether a transcoding is already active. If one is, you have to check whether the requested segment has been processed yet. If it has, we can simply send it back to the client. If it hasn't (for example because of a user seek), we can either wait for it (if the latest processed segment is close to the requested one) or stop the previous transcoding and restart it at the newly requested segment.
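To make that concrete, a stripped-down sketch of such a segment endpoint could look roughly like this. Note that getActiveTranscoding, startTranscoding and the job object are placeholders for the bookkeeping I describe further down, not my actual implementation, and it assumes the hls muxer writes numbered .ts files into an absolute job.outputDir:
const path = require('path');
const WAIT_THRESHOLD = 10; // restart instead of waiting if the request is more than 10 segments ahead
app.get('/api/video/:id/hls/:quality/segments/:segment.ts', async (req, res) => {
    const requested = parseInt(req.params.segment, 10);
    let job = getActiveTranscoding(req.params.id, req.params.quality);
    // No transcoding running yet: start one at the requested segment.
    if (!job) {
        job = await startTranscoding(req.params.id, req.params.quality, requested);
    }
    // The request is far ahead of (or before) the running transcoding: restart it there.
    if (requested < job.startSegment || requested > job.latestSegment + WAIT_THRESHOLD) {
        job.stop();
        job = await startTranscoding(req.params.id, req.params.quality, requested);
    }
    // Otherwise wait until ffmpeg has written the requested segment to disk.
    while (job.latestSegment < requested && !job.finished) {
        await new Promise(resolve => setTimeout(resolve, 250));
    }
    // Assumes segments are written as <number>.ts into an absolute output directory.
    res.sendFile(path.join(job.outputDir, `${requested}.ts`));
});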
I'm going to try to keep this answer as simple as I can. The ffmpeg command I use to achieve the HLS transcoding looks like this:
this.ffmpegProc = ffmpeg(this.filePath)
.withVideoCodec(this.getVideoCodec())
.withAudioCodec(audioCodec)
.inputOptions(inputOptions)
.outputOptions(outputOptions)
.on('end', () => {
this.finished = true;
})
.on('progress', progress => {
const seconds = this.addSeekTimeToSeconds(this.timestampToSeconds(progress.timemark));
const latestSegment = Math.max(Math.floor(seconds / Transcoding.SEGMENT_DURATION) - 1, 0); // - 1 because the first segment is 0
this.latestSegment = latestSegment;
})
.on('start', (commandLine) => {
logger.DEBUG(`[HLS] Spawned Ffmpeg (startSegment: ${this.startSegment}) with command: ${commandLine}`);
resolve();
})
.on('error', (err, stdout, stderr) => {
if (err.message != 'Output stream closed' && err.message != 'ffmpeg was killed with signal SIGKILL') {
logger.ERROR(`Cannot process video: ${err.message}`);
logger.ERROR(`ffmpeg stderr: ${stderr}`);
}
})
.output(this.output)
this.ffmpegProc.run();
Where output options are:
return [
'-copyts', // Fixes timestamp issues (Keep timestamps as original file)
'-pix_fmt yuv420p',
'-map 0',
'-map -v',
'-map 0:V',
'-g 52',
`-crf ${this.CRF_SETTING}`,
'-sn',
'-deadline realtime',
'-preset:v ultrafast',
'-f hls',
`-hls_time ${Transcoding.SEGMENT_DURATION}`,
'-force_key_frames expr:gte(t,n_forced*2)',
'-hls_playlist_type vod',
`-start_number ${this.startSegment}`,
'-strict -2',
'-level 4.1', // Fixes chromecast issues
'-ac 2', // Set two audio channels. Fixes audio issues for chromecast
'-b:v 1024k',
'-b:a 192k',
];
And input options:
let inputOptions = [
'-copyts', // Fixes timestamp issues (Keep timestamps as original file)
'-threads 8',
`-ss ${this.startSegment * Transcoding.SEGMENT_DURATION}`
];
A parameter worth noting is -start_number in the output options; this basically tells ffmpeg which number to use for the first segment. If the client requests, for example, segment 500, we want to keep it simple and start the numbering at 500 if we have to restart the transcoding. Then we have the standard HLS settings (hls_time, hls_playlist_type and f). In the input options I use -ss to seek to the requested segment: since we told the client in the generated m3u8 manifest that each segment is 4 seconds long, we can just seek to 4 * requestedSegment.
You can see that in the 'progress' event from ffmpeg I calculate the latest processed segment by looking at the timemark: convert the timemark to seconds, add the seek time applied to the transcoding, then divide by the segment duration (which I've set to 4) to get approximately which segment was just finished.
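For reference, simplified versions of those two helpers could look something like this (this is just the idea; the actual implementations are instance methods on the transcoding class in my repo):
// ffmpeg reports progress.timemark as "HH:MM:SS.ms", relative to the start of this transcoding run.
function timestampToSeconds(timemark) {
    const [hours, minutes, seconds] = timemark.split(':').map(parseFloat);
    return hours * 3600 + minutes * 60 + seconds;
}
// The timemark starts at 0 even when the input was seeked with -ss,
// so add the seek offset back on to get the absolute position in the video.
function addSeekTimeToSeconds(seconds, startSegment, segmentDuration) {
    return seconds + startSegment * segmentDuration;
}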
Now there is a lot more to keep track of than just this: you have to save the ffmpeg processes you've started so you can check whether a segment is finished and whether a transcoding is active when a segment is requested. You also have to stop an already running transcoding if the user requests a segment far in the future, so you can restart it with the correct seek time.
The downside of this approach is that the file is actually transcoded and saved to your file system while the transcoding is running, so you need to remove the files when the user stops requesting segments.
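One simple way to handle that cleanup is a timer that is reset on every segment request and, when it fires, stops the ffmpeg process and removes the temporary segment directory. A sketch, again built on the hypothetical job object from the earlier sketch:
const fs = require('fs');
const CLEANUP_AFTER_MS = 2 * 60 * 1000; // stop and clean up after 2 minutes without segment requests
function touchJob(job) {
    clearTimeout(job.cleanupTimer);
    job.cleanupTimer = setTimeout(() => {
        job.stop(); // kill the ffmpeg process
        fs.rm(job.outputDir, { recursive: true, force: true }, err => {
            if (err) console.error('Failed to remove segment directory:', err);
        });
    }, CLEANUP_AFTER_MS);
}
// call touchJob(job) at the top of the segment endpoint on every request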
I've implemented this so it handles the things I've mentioned (long seeks, different resolution requests, waiting until a segment is finished, etc.). If you want to have a look at it, it's located here: Github Dose; the most interesting files are the transcoding class, the hlsManager class and the endpoint for the segments. I tried explaining this as well as I can, so I hope you can use it as some sort of base or idea of how to move forward.
I am starting with the sample LibVLCSharp.WPF.Sample, and while it plays my VOB, I cannot change the audio track.
I call SetAudioTrack and it always returns false (and the track doesn't change). The full VLC player lets me change the audio track on the same file just fine.
var media = new Media(_libVLC, clip.GetEvoPath(playlist), FromType.FromPath);
var status = await media.Parse();
_mediaPlayer.Playing += (sender, e) =>
{
bool ok = _mediaPlayer.SetAudioTrack(3);
};
_mediaPlayer.Play(media);
When should I call this to get it to work? Ideally I would like to set the audio track before I start playback, but that doesn't work either.
I'm developing a karaoke application.
I'm trying to provide a fun feature:
can I use AudioKit to offline-render an audio file with a time-based, dynamically changing tempo?
Click the image below to see what I mean.
image example
Here is some of my code.
// I want to change the tempo for bgm audio file dynamically
self.timePitch = AKTimePitch(self.bgmPlayer)
// here I set the initialized rate value to time Pitch
self.timePitch.rate = 1.0
// support iOS10+
self.out = AKOfflineRenderNode()
self.timePitch.connect(to: self.out)
// make the renderer as AudioKit.out
AudioKit.output = self.out
do {
try AudioKit.start()
} catch {
debugPrint(error.localizedDescription)
}
let url = URL(fileURLWithPath: NSTemporaryDirectory() + "output.caf")
// get total duration
let duration = self.duration()
DispatchQueue.global(qos: .background).async {
do {
let avAudioTime = AVAudioTime(sampleTime: 0, atRate:self.out.avAudioNode.inputFormat(forBus: 0).sampleRate)
// start play BGM
self.bgmPlayer.play(at: avAudioTime)
// and render it to an offline file
try self.out?.renderToURL(url, duration: duration)
// **********
// Question:
// Can I change the tempo value when rendering?
// **********
// stop when finished
self.bgmPlayer.stop()
} catch {
debugPrint(error)
}
}
It really depends on how the dynamic tempo is realized: you can send the audio through time/pitch shifting and render the result.
Disclaimer: newbie to Node.js and audio parsing.
I'm trying to proxy a digital radio stream through an ExpressJS app with the help of node-icecast, which works great. I am getting the radio's mp3 stream and, via node-lame, decoding the mp3 to PCM and then sending it to the speakers. All of this just works straight from the GitHub project's readme example:
var lame = require('lame');
var icecast = require('icecast');
var Speaker = require('speaker');
// URL to a known Icecast stream
var url = 'http://firewall.pulsradio.com';
// connect to the remote stream
icecast.get(url, function (res) {
// log the HTTP response headers
console.error(res.headers);
// log any "metadata" events that happen
res.on('metadata', function (metadata) {
var parsed = icecast.parse(metadata);
console.error(parsed);
});
// Let's play the music (assuming MP3 data).
// lame decodes and Speaker sends to speakers!
res.pipe(new lame.Decoder())
.pipe(new Speaker());
});
I'm now trying to set up a service to identify the music using the Doreso API. The problem is that I'm working with a stream and don't have a file (and I don't know enough yet about readable and writable streams, so I'm learning slowly). I've been looking for a while at writing the stream (ideally to memory) until I have about 10 seconds' worth, then passing that portion of audio to my API; however, I don't know if that's possible, or where to start with slicing 10 seconds off a stream. I thought of passing the stream to ffmpeg, since it has a -t option for duration and perhaps that could limit it, but I haven't got that to work yet.
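For reference, this is roughly what I was imagining for the ffmpeg route, using fluent-ffmpeg (untested, and I'm not sure tapping the icecast response like this is even the right approach; the sample.mp3 filename is just an example):
var ffmpeg = require('fluent-ffmpeg');
icecast.get(url, function (res) {
    // grab roughly the first 10 seconds of the mp3 stream and save it to a file,
    // then hand that file off to the identification API
    // (res can still be piped to lame/Speaker elsewhere; this just taps the same stream)
    ffmpeg(res)
        .format('mp3')
        .duration(10) // maps to ffmpeg's -t option
        .on('end', function () {
            // send ./sample.mp3 to the Doreso API here
        })
        .on('error', function (err) {
            console.error(err);
        })
        .save('./sample.mp3');
});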
Any suggestions to cut a stream down to 10 seconds would be awesome. Thanks!
Updated: Changed my question as I originally thought I was getting PCM and converting to mp3 ;-) I had it backwards. Now I just want to slice off part of the stream while the stream still feeds the speaker.
It's not that easy... but I've managed it this weekend. I would be happy if you could point out how to improve this code further. I don't really like the approach of simulating the "end" of a stream. Is there something like "detaching" or "rewiring" parts of a stream pipeline in Node?
First, you should create your own writable stream class which itself creates a lame encoder instance. This writable stream will receive the decoded PCM data.
It works like this:
var stream = require('stream');
var util = require('util');
var fs = require('fs');
var lame = require('lame');
var streamifier = require('streamifier');
var WritableStreamBuffer = require("stream-buffers").WritableStreamBuffer;
var SliceStream = function(lameConfig) {
stream.Writable.call(this);
this.encoder = new lame.Encoder(lameConfig);
// we need a stream buffer to buffer the PCM data
this.buffer = new WritableStreamBuffer({
initialSize: (1000 * 1024), // start as 1 MiB.
incrementAmount: (150 * 1024) // grow by 150 KiB each time buffer overflows.
});
};
util.inherits(SliceStream, stream.Writable);
// some attributes, initialization
SliceStream.prototype.writable = true;
SliceStream.prototype.encoder = null;
SliceStream.prototype.buffer = null;
// will be called each time the decoded stream emits "data",
// together with a chunk of binary data as a Buffer
SliceStream.prototype.write = function(buf) {
//console.log('bytes recv: ', buf.length);
this.buffer.write(buf);
//console.log('buffer size: ', this.buffer.size());
return true; // signal that more data can be written right away
};
// this method will be invoked when the setTimeout function
// emits the simulated "end" event. Let's encode to MP3 again...
SliceStream.prototype.end = function(buf) {
if (arguments.length) {
this.buffer.write(buf);
}
this.writable = false;
//console.log('buffer size: ' + this.buffer.size());
// fetch binary data from buffer
var PCMBuffer = this.buffer.getContents();
// create a stream out of the binary buffer data
streamifier.createReadStream(PCMBuffer).pipe(
// and pipe it right into the MP3 encoder...
this.encoder
);
// but don't forget to pipe the encoder's output
// into a writable file stream
this.encoder.pipe(
fs.createWriteStream('./fooBar.mp3')
);
};
Now you can pipe the decoded stream into an instance of your SliceStream class, like this (in addition to the other pipes):
icecast.get(streamUrl, function(res) {
var lameEncoderConfig = {
// input
channels: 2, // 2 channels (left and right)
bitDepth: 16, // 16-bit samples
sampleRate: 44100, // 44,100 Hz sample rate
// output
bitRate: 320,
outSampleRate: 44100,
mode: lame.STEREO // STEREO (default), JOINTSTEREO, DUALCHANNEL or MONO
};
var decodedStream = res.pipe(new lame.Decoder());
// pipe decoded PCM stream into a SliceStream instance
decodedStream.pipe(new SliceStream(lameEncoderConfig));
// now play it...
decodedStream.pipe(new Speaker());
setTimeout(function() {
// after 10 seconds, emulate an end of the stream.
res.emit('end');
}, 10 * 1000 /*milliseconds*/)
});
Can I suggest using removeListener after 10 seconds? That will prevent future events from being sent through the listener.
var request = require('request'),
fs = require('fs'),
masterStream = request('-- mp3 stream --')
var writeStream = fs.createWriteStream('recording.mp3'),
handler = function(bit){
writeStream.write(bit);
}
masterStream.on('data', handler);
setTimeout(function(){
masterStream.removeListener('data', handler);
writeStream.end();
}, 1000 * 10);
I'm streaming recorded PCM audio from a browser with the Web Audio API.
I'm streaming it with BinaryJS (a websocket connection) to a Node.js server, and I'm trying to play that stream on the server using the speaker npm module.
This is my client. The audio buffers are at first non-interleaved IEEE 32-bit linear PCM with a nominal range between -1 and +1. I take one of the two PCM channels to start off and stream it below.
var client = new BinaryClient('ws://localhost:9000');
var Stream = client.send();
recorder.onaudioprocess = function(AudioBuffer){
var leftChannel = AudioBuffer.inputBuffer.getChannelData (0);
Stream.write(leftChannel);
}
Now I receive the data as a buffer and try writing it to a speaker object from the npm package.
var Speaker = require('speaker');
var speaker = new Speaker({
channels: 1, // 1 channel
bitDepth: 32, // 32-bit samples
sampleRate: 48000, // 48,000 Hz sample rate
signed:true
});
server.on('connection', function(client){
client.on('stream', function(stream, meta){
stream.on('data', function(data){
speaker.write(leftchannel);
});
});
});
The result is a high-pitched screech on my laptop's speakers, which is clearly not what's being recorded. It's not feedback either. I can confirm that the recording buffers on the client are valid, since I tried writing them to a WAV file and it played back fine.
The docs for speaker and the docs for the AudioBuffer in question
I've been stumped on this for days. Can someone figure out what is wrong or perhaps offer a different approach?
Update with solution
First off, I was using the websocket API incorrectly. I updated above to use it correctly.
I needed to convert the audio buffers to an array buffer of integers. I chose to use Int16Array. Since the given audio buffer has a range between -1 and 1, it was as simple as multiplying by the range of the new Int16Array (-32768 to 32767).
recorder.onaudioprocess = function(AudioBuffer){
var left = AudioBuffer.inputBuffer.getChannelData (0);
var l = left.length;
var buf = new Int16Array(l)
while (l--) {
buf[l] = left[l]*0x7FFF; // scale [-1, 1] floats to 16-bit signed integers
}
Stream.write(buf.buffer);
}
It looks like you're sending your stream through as the meta object.
According to the docs, BinaryClient.send takes a data object (the stream) and a meta object, in that order. The callback for the stream event receives the stream (as a BinaryStream object, not a Buffer) in the first parameter and the meta object in the second.
You're passing send() the string 'channel' as the stream and the Float32Array from getChannelData() as the meta object. Perhaps if you were to swap those two parameters (or just use client.send(leftChannel)) and then change the server code to pass stream to speaker.write instead of leftchannel (which should probably be renamed to meta, or dropped if you don't need it), it might work.
Note that since Float32Array isn't a stream or buffer object, BinaryJS might try to send it in one chunk. You may want to send leftChannel.buffer (the ArrayBuffer behind that object) instead.
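Roughly, something like this (an untested sketch adapted from your snippets):
// client: send the channel data itself (as an ArrayBuffer), not as the meta argument
recorder.onaudioprocess = function (e) {
    var leftChannel = e.inputBuffer.getChannelData(0);
    client.send(leftChannel.buffer);
};
// server: the first callback argument is the incoming BinaryStream; write its data, not the meta
server.on('connection', function (connection) {
    connection.on('stream', function (stream, meta) {
        stream.on('data', function (data) {
            speaker.write(data);
        });
    });
});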
Let me know if this works for you; I'm not able to test your exact setup right now.