I want to identify areas in a .mp4 (H264 + AAC) video that are silent and unchanged frames and cut them out.
Of course there would be some fine-tuning regarding thresholds and algorithms to measure unchanged frames.
My problem is more general, regarding how I would go about automating this?
Is it possible to solve this with ffmpeg? (preferably with C or python)
How can I programatically analyse the audio?
How can I programatically analyse video frames?
For audio silence see this.
For still video scenes ffmpeg might not be the ideal tool.
You could use scene change detection with a low threshold to find the specific frames, then extract those frames and compare them with something like imagemagick's compare function:
ffprobe -show_frames -print_format compact -f lavfi "movie=test.mp4,select=gt(scene\,.1)"
compare -metric RMSE frame1.png frame0.png
I don't expect this to work very well.
Your best bet is to use something like OpenCV to find differences between frames.
OpenCV Simple Motion Detection
Related
I'm making an mp3 from a flac file with ffmpeg. This is usually hum-dum for me.
Tonight, for some reason, the converted audio is distorting when I use the same commands I've always used. After troubleshooting, it appears the problem is the "-out_sample_rate" flag.
My command:
ffmpeg -i input.flac -write_id3v1 1 -id3v2_version 3 -dither_method modified_e_weighted -out_sample_rate 44.1k -b:a 320k output.mp3
The audio in the mp3 is then incredibly distorted by a jacked gain resulting in digital clipping.
I've tried updating ffmpeg, and then problem remains. I've tried converting various sample rates (44.1k source files, 48k source files, 96k source files) to both 44.1k and 48k mp3s, problem remains whenever there's a conversion.
I'm on macOS, and I installed ffmpeg via homebrew.
Any ideas?
Are you sure the distortion comes from resampling?
Even the poorest resampling algorithm doesn't distort. More typical artifacts from poor resampling are harsh high frequencies due to aliasing and quantisation noise.
FFmpeg's resampler isn't the very best but it isn't bad at all. It shouldn't lead to distortion at all. Enough for average use.
How much headroom does the source file have?
If not enough, the resampling or the MP3 conversion may lead to clipping. The MP3 encoder removes frequencies from the signal (even at 320kbps) so the waveform will alter.
So reimport the encoded MP3 into an audio editor and look for clipping.
If not sure, which step the distortion comes from, split the command up and look, which step leads to clipping:
ffmpeg -i input.flac -write_id3v1 1 -id3v2_version 3 -dither_method modified_e_weighted -out_sample_rate 44.1k intermediate.flac
ffmpeg -i intermediate.flac -b:a 320k output.mp3
There should be a headroom of at least 1dB left before it gets converted to MP3. If not, lower the gain before.
If the resampling of the intermediate.flac leads to a significant gain in amplitude, the original input.flac is poorly mastered. If so (and the quality is really important), do the SR conversion in an audio editor (i.e. Audacity, it does a better resampling job than FFMpeg) and apply a limiter between the resampling and dithering step to lower the few strong peaks nicely.
If that doesn't help: What exactly does input.flac contain? Music? Noise? Speech? and is it selfmade or taken from something else?
I want to make a sound that is too high to be detected by the human ear. From my understanding, humans can hear sounds between 20hz and 44000hz.
With sox, I am making a sound that is 50000hz. The problem is I can still hear it. The command I am using is this:
sox -n -r 50000 output.wav rate -L -s 50050 synth 3 sine
Either I have super good hearing or I am doing something wrong. How can I make this sound undetectable with SOX of FFMPEG?
Human hearing is generally considered to range between 20Hz and 20kHz, although most people don't hear much above 16kHz. Digital signals can only represent frequencies up to half of their sampling rate, known as the Nyquist frequency, and so, in order to accurately reproduce audio for the human ear, a sampling rate of at least 40kHz is needed. In practice, a sampling rate of 44.1kHz or 48kHz is almost always used, leaving plenty of space for an inaudable sound somewhere in the 20-22kHz range.
For example, this command generates a WAV file with a sampling rate of 48kHz containing a sine wave at 22kHz that is completely inaudible to me:
sox -n -r 48000 output.wav synth 3 sine 22000
I think part of your problem was that you were using the wrong syntax to specify the pitch to sox. This question has some good information about using SoX to generate simple tones.
Overview:
I have about 1000 MP3 files that I need to perform noise removal on.
I have used Audacity in the past for individual noise removal operations but Audacity will not cut it for this job.
Audacity is unable to perform bulk operations and I don't have the time to perform this manually on 1000s of MP3 files.
A little about the noise:
The noise is similar to white noise but it differs slightly in every MP3 file, so a different noise profile will need to be built for each MP3.
The noise comes from a fan in the background (if you were wondering).
Question:
What is the best way to automate nose removal from the MP3 files?
You could try using Sox. It's a command line application so is scriptable. See here for further info.
When merging an audio and video with ffmpeg, the quality of the resultant video goes down. How to we improve the quality(esp. the audio quality)?
The audio quality gets degraded to quite an extent with random screeches in between.
I've tried converting the audio to mp3 and video to mp4 before merging them, tried various audio-video parameters like bitrate, sample-rate, qscale etc but still unsuccessful.
Any help would be greatly appreciated!
The -acodec copy command line option should just copy the audio stream and not re-encode it. For the video stream -vcodec copy. Also, you can used the -sameq option for the video stream.
See this answer for a little more detail.
i try using the -sameq, yeah it gives a good quality of the video but what I notice was the file size increases.
Using -q:v 1 applies the best possible resolution. Number between 1 and 35. One is the best quality.
I have one doubt regarding audio processing for noise reduction. Is there any free ware and share ware DLLs available for noise reduction in .wav audio files or any sample codes using c#, vb.net or vb?
SoX is a cross-platform command-line application that does audio manipulation. A quick check of the man page reveals that it can do noise reduction (see noiseprof and noisered).
You can use trim in combination with noiseprof to choose a small clip of a larger audio file to use as the noise.
there are different meanings of "noise reduction". a simple implementation is a noise gate, which mutes the audio when the amplitude goes below a threshold. this is good enough for many applications. but a more sophist aced approach wouldcbe to do some frequency-domain analysis, w which is much less trivial. depending on your needs, the first approach is simple enough to roll your own. ymmv.