I've been trying for more than a day to fit an external audio track into a video. The video is almost an hour long, and the audio is just 2-3 minutes shorter than that. The video has its own audio, but I don't want it, since the external audio I got has better quality and more channels.
The video runs at 25 fps, while the external audio comes from a 23.976 fps version.
I've tried ffmpeg, but to no avail. The audio stretching also has to be precise so the sound matches people's lips. Is there a way to tell ffmpeg "see this video's duration? Stretch the audio to exactly that duration, to the second (or millisecond), and embed it"?
I've tried MP4Box, Avidemux, HandBrake and now FFmpeg (pretty complicated), and my head is exploding :P Can anyone help me with this? I can't believe it can't be done.
Thanks in advance
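One way to express exactly this with ffprobe/ffmpeg, sketched in Python (untested; `video.mp4`, `audio.wav`, and the function names are placeholders of mine): probe both durations, derive an `atempo` factor from their ratio, and mux with the video stream copied. Note that older FFmpeg builds limit `atempo` to 0.5-2.0, which is fine for a 25 vs 23.976 fps mismatch (about 4%).

```python
import subprocess

def probe_duration(path):
    """Container duration in seconds, via ffprobe (assumes ffprobe is on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True).stdout
    return float(out)

def stretch_cmd(video, audio, out, video_dur, audio_dur):
    """Build an ffmpeg command that stretches `audio` to the video's duration."""
    tempo = audio_dur / video_dur  # atempo > 1 shortens the audio, < 1 lengthens it
    return ["ffmpeg", "-i", video, "-i", audio,
            "-map", "0:v", "-map", "1:a",        # video from input 0, audio from input 1
            "-c:v", "copy",                       # don't re-encode the picture
            "-filter:a", f"atempo={tempo:.6f}",   # time-stretch, pitch preserved
            out]
```

Usage would be something like `subprocess.run(stretch_cmd("video.mp4", "audio.wav", "out.mp4", probe_duration("video.mp4"), probe_duration("audio.wav")), check=True)`.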
I'm making an mp3 from a flac file with ffmpeg. This is usually humdrum for me.
Tonight, for some reason, the converted audio is distorting when I use the same commands I've always used. After troubleshooting, it appears the problem is the "-out_sample_rate" flag.
My command:
ffmpeg -i input.flac -write_id3v1 1 -id3v2_version 3 -dither_method modified_e_weighted -out_sample_rate 44.1k -b:a 320k output.mp3
The audio in the mp3 is then incredibly distorted by a jacked gain resulting in digital clipping.
I've tried updating ffmpeg, and the problem remains. I've tried converting various sample rates (44.1k, 48k, and 96k source files) to both 44.1k and 48k mp3s; the problem remains whenever there's a sample-rate conversion.
I'm on macOS, and I installed ffmpeg via homebrew.
Any ideas?
Are you sure the distortion comes from resampling?
Even the poorest resampling algorithm doesn't distort. The typical artifacts of poor resampling are harsh high frequencies due to aliasing, and quantisation noise.
FFmpeg's resampler isn't the very best, but it isn't bad either, and it's enough for average use. It shouldn't lead to distortion.
How much headroom does the source file have?
If there isn't enough, the resampling or the MP3 conversion may lead to clipping. The MP3 encoder removes frequencies from the signal (even at 320 kbps), so the waveform will change.
So reimport the encoded MP3 into an audio editor and look for clipping.
If you're not sure which step the distortion comes from, split the command in two and check which step leads to the clipping:
ffmpeg -i input.flac -write_id3v1 1 -id3v2_version 3 -dither_method modified_e_weighted -out_sample_rate 44.1k intermediate.flac
ffmpeg -i intermediate.flac -b:a 320k output.mp3
There should be at least 1 dB of headroom left before the signal gets converted to MP3. If not, lower the gain first.
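In FFmpeg terms, you'd measure the peak with `-af volumedetect` (look at `max_volume` in the log) and lower the gain with something like `-af volume=-1.5dB`. The arithmetic behind "leave 1 dB of headroom", as a small illustrative sketch (the function names are mine, and samples are assumed normalized to ±1.0):

```python
import math

def headroom_db(samples):
    """Headroom in dB between the loudest sample and digital full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return -20.0 * math.log10(peak)

def gain_to_leave_headroom(samples, target_db=1.0):
    """Linear gain (<= 1.0) that leaves at least `target_db` dB of headroom."""
    shortfall = target_db - headroom_db(samples)
    return 10.0 ** (-shortfall / 20.0) if shortfall > 0 else 1.0
```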
If resampling to intermediate.flac leads to a significant gain in amplitude, the original input.flac is poorly mastered. If so (and quality is really important), do the sample-rate conversion in an audio editor (e.g. Audacity, which does a better resampling job than FFmpeg) and apply a limiter between the resampling and dithering steps to tame the few strong peaks nicely.
If that doesn't help: what exactly does input.flac contain? Music? Noise? Speech? And is it self-made or taken from somewhere else?
I want to make a sound that is too high to be detected by the human ear. From my understanding, humans can hear sounds between 20hz and 44000hz.
With sox, I am making a sound that is 50000hz. The problem is I can still hear it. The command I am using is this:
sox -n -r 50000 output.wav rate -L -s 50050 synth 3 sine
Either I have super good hearing or I am doing something wrong. How can I make this sound undetectable with SoX or FFmpeg?
Human hearing is generally considered to range between 20Hz and 20kHz, although most people don't hear much above 16kHz. Digital signals can only represent frequencies up to half their sampling rate, known as the Nyquist frequency, so in order to accurately reproduce audio for the human ear, a sampling rate of at least 40kHz is needed. In practice, a sampling rate of 44.1kHz or 48kHz is almost always used, leaving plenty of space for an inaudible sound somewhere in the 20-22kHz range.
For example, this command generates a WAV file with a sampling rate of 48kHz containing a sine wave at 22kHz that is completely inaudible to me:
sox -n -r 48000 output.wav synth 3 sine 22000
I think part of your problem was that you were using the wrong syntax to specify the pitch to sox. This question has some good information about using SoX to generate simple tones.
I want to identify areas in a .mp4 (H264 + AAC) video that are silent and unchanged frames and cut them out.
Of course there would be some fine-tuning regarding thresholds and algorithms to measure unchanged frames.
My problem is more general, regarding how I would go about automating this?
Is it possible to solve this with ffmpeg? (preferably with C or python)
How can I programmatically analyse the audio?
How can I programmatically analyse video frames?
For audio silence see this.
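FFmpeg's silencedetect audio filter reports silent intervals on stderr, e.g. `ffmpeg -i test.mp4 -af silencedetect=noise=-35dB:d=2 -f null -` (the noise threshold and minimum duration here are illustrative). A small sketch for pulling those intervals out of the log:

```python
import re

def parse_silences(ffmpeg_log):
    """Pair up silence_start/silence_end times from silencedetect's log output."""
    starts = [float(x) for x in re.findall(r"silence_start: ([0-9.]+)", ffmpeg_log)]
    ends = [float(x) for x in re.findall(r"silence_end: ([0-9.]+)", ffmpeg_log)]
    return list(zip(starts, ends))
```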
For still video scenes ffmpeg might not be the ideal tool.
You could use scene change detection with a low threshold to find the specific frames, then extract those frames and compare them with something like imagemagick's compare function:
ffprobe -show_frames -print_format compact -f lavfi "movie=test.mp4,select=gt(scene\,.1)"
compare -metric RMSE frame1.png frame0.png
I don't expect this to work very well.
Your best bet is to use something like OpenCV to find differences between frames.
OpenCV Simple Motion Detection
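If you want to prototype without OpenCV first, the core of "unchanged frames" detection is just a mean per-pixel difference against a threshold. A toy sketch over grayscale frames stored as flat lists of pixel values (real code would decode frames with OpenCV or PyAV and use numpy; the threshold is something you'd tune):

```python
def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two equal-size grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def unchanged_pairs(frames, threshold=2.0):
    """Indices i where frame i+1 differs from frame i by less than `threshold`."""
    return [i for i in range(len(frames) - 1)
            if mean_abs_diff(frames[i], frames[i + 1]) < threshold]
```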
As a collector, I have thousands of audio files downloaded from podcasting services. All the feeds start with the same 15-second introduction. That's very annoying for me, so I tried to crop all of them.
But they aren't all regular. The voiced introductions are exactly the same, but some of them...
... start at 00:00, at 00:05, or at some offset we don't know
... don't have the introduction at all
So I can't determine at which second to crop.
The question: how can I crop all the audio files according to a specific audio clip?
In other words, "detect the same part and remove it"?
As I understand it you already have a way to crop the files at a specific point. So the problem boils down to working out where the intro ends in each clip. Here's how I would do it:
First, manually isolate the intro audio in a separate file/buffer.
For each clip, you need to work out where in the clip the intro audio occurs. Do this by computing a cross-correlation between the intro audio and the main clip. The correct offset will be the one with the highest correlation coefficient. (You could also look for the minimum in a mean-difference, which is equivalent.)
Once you know where the intro audio is, you can calculate your crop position.
There are a few obvious optimisations:
Only search for the intro audio in the first (say) 30 seconds of each clip.
Don't search for the whole intro audio, just the last 1/2 second.
If you're not 100% sure that the audio is there, you might want to set a threshold for acceptance.
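The correlation search in step 2 can be sketched in a few lines of pure Python (for real audio you'd decode both files to PCM at the same sample rate first, use numpy/scipy, e.g. `scipy.signal.correlate`, for speed, and normalize each window if loudness varies; `best_offset` is my name, not a library call):

```python
def best_offset(intro, clip):
    """Sample offset in `clip` where `intro` correlates most strongly."""
    best, best_score = 0, float("-inf")
    for off in range(len(clip) - len(intro) + 1):
        # Raw dot product of the intro against the clip window at this offset.
        score = sum(a * b for a, b in zip(intro, clip[off:off + len(intro)]))
        if score > best_score:
            best, best_score = off, score
    return best
```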
When merging audio and video with ffmpeg, the quality of the resulting video goes down. How do we improve the quality (especially the audio quality)?
The audio quality gets degraded to quite an extent, with random screeches in between.
I've tried converting the audio to mp3 and the video to mp4 before merging them, and I've tried various audio/video parameters like bitrate, sample rate, qscale, etc., but I'm still unsuccessful.
Any help would be greatly appreciated!
The -acodec copy option should just copy the audio stream and not re-encode it; for the video stream, use -vcodec copy. You can also use the -sameq option for the video stream (note that -sameq means "same quantizer", not "same quality", and it has been removed in recent FFmpeg versions).
See this answer for a little more detail.
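Put together, the stream-copy merge suggested above would look something like this (sketched in Python; the file names are placeholders):

```python
def mux_cmd(video, audio, out):
    """Build an ffmpeg command that pairs the video with the external audio,
    stream-copying both so no quality is lost in the merge."""
    return ["ffmpeg", "-i", video, "-i", audio,
            "-map", "0:v:0", "-map", "1:a:0",  # first video stream, first audio stream
            "-c:v", "copy", "-c:a", "copy",    # no re-encoding at all
            out]
```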
I tried using -sameq; yeah, it gives good video quality, but what I noticed was that the file size increases.
Using -q:v 1 applies the best possible quality. The number ranges from 1 to 31; 1 is the best quality.