How to adjust the volume of specific frequency bands in audio files using ffmpeg? - audio

I want to increase or decrease the volume of specific frequency bands with ffmpeg.
I think the bandreject and bandpass filters can do something similar.
But is there a way to reject, say, 80% of the energy in a specific band?
Thanks in advance.

Use the equalizer filter.
Example to attenuate 10 dB at 1000 Hz with a bandwidth of 200 Hz and attenuate 5 dB at 8000 Hz with a bandwidth of 1000 Hz:
ffmpeg -i input.mp3 -af equalizer=frequency=1000:width=200:width_type=h:gain=-10,equalizer=frequency=8000:width=1000:width_type=h:gain=-5 output.wav
Or you can do it in one filter instance using the anequalizer filter.
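For the "reject 80% of the energy" part of the question, a minimal sketch of the arithmetic, assuming "energy" means signal power: the gain in dB is 10 * log10 of the fraction that remains. The helper names energy_fraction_to_db and equalizer_filter are hypothetical; only the equalizer options already shown above are used. Note that equalizer is a peaking filter, so this gain applies at the center frequency rather than uniformly across the band.

```python
import math


def energy_fraction_to_db(remaining_fraction):
    """Convert the fraction of energy (power) that should remain into a gain in dB."""
    if remaining_fraction <= 0.0:
        raise ValueError("remaining_fraction must be positive")
    return 10.0 * math.log10(remaining_fraction)


def equalizer_filter(frequency_hz, width_hz, gain_db):
    """Build an ffmpeg equalizer filter string (hypothetical helper)."""
    return (
        f"equalizer=frequency={frequency_hz}:width={width_hz}"
        f":width_type=h:gain={gain_db:.2f}"
    )


# Rejecting 80% of the energy leaves 20% of it.
gain = energy_fraction_to_db(1.0 - 0.8)
print(equalizer_filter(1000, 200, gain))
```

The printed string can be passed to -af in the same way as the example above.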

Related

What is the maximum number of channels in an audio file that we can create with the FFmpeg amerge filter?

We have a requirement to merge multiple single-channel audio files into a single multi-channel audio file.
Each channel represents one speaker in the audio file.
I tried the amerge filter and it worked for up to 8 files. When I try it with 10 audio files I get a blank audio file, and as far as I can tell the ffmpeg amerge command does not report any error either.
Can I create a multi-channel audio file with N channels from N files, where N may be 100 or more? Is that possible?
I am new to audio APIs, so any guidance is appreciated.
On the maximum number of channels when merging single-channel files into one multi-channel file:
The maximum number of inputs is 64. According to ffmpeg -h filter=amerge:
inputs <int> ..F.A...... specify the number of inputs (from 1 to 64) (default 2)
You can also look at the source code in libavfilter/af_amerge.c and refer to SWR_CH_MAX.
On whether this is possible for N files, where N may be 100 or more:
Chain multiple amerge filters with a maximum of 64 inputs per filter, or use the amix filter, which accepts up to 32767 inputs. Note that amix mixes its inputs together rather than keeping each one on its own channel.
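As an illustration of the single-filter case, here is a minimal sketch that builds an ffmpeg argument list for one amerge instance. The function name build_amerge_command and the [merged] label are hypothetical, and the input files are assumed to be mono with a single audio stream each. It only covers up to 64 inputs and does not attempt the chained variant.

```python
def build_amerge_command(inputs, output, max_inputs=64):
    """Return an ffmpeg argument list merging the given files with one amerge filter."""
    count = len(inputs)
    if not 1 <= count <= max_inputs:
        raise ValueError(f"amerge accepts 1 to {max_inputs} inputs, got {count}")

    cmd = ["ffmpeg"]
    for path in inputs:
        cmd += ["-i", path]

    # One input pad per file, e.g. [0:a][1:a]..., followed by the filter itself.
    pads = "".join(f"[{i}:a]" for i in range(count))
    graph = f"{pads}amerge=inputs={count}[merged]"

    cmd += ["-filter_complex", graph, "-map", "[merged]", output]
    return cmd


print(build_amerge_command(["speaker1.wav", "speaker2.wav"], "merged.wav"))
```

The resulting list can be handed to subprocess.run; whether the chosen output container accepts the resulting channel count is a separate question that this sketch does not check.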

Librosa.resample() resamples to a lower rate than needed

I am doing some audio pre-processing to train an ML model.
All the audio files of the dataset are:
RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz.
I am using the following snippet of code to resample the dataset to 8000 Hz:
samples, sample_rate = librosa.load(filename, sr=16000)
samples = librosa.resample(samples, orig_sr=sample_rate, target_sr=8000)
then I use the following snippet to reshape the new samples:
samples.reshape(1,8000,1)
but I keep getting the following error: ValueError: cannot reshape array of size 4000 into shape (1,8000,1). The size differs from one file to another, but it is always less than 8000 (the desired sample rate in Hz).
I double-checked the original sample rate and it was 16000 Hz. I also tried loading the files with a sample rate of 8000, but had no luck.
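One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: the length of the resampled array is the clip duration times the sample rate, so 4000 samples at 8000 Hz corresponds to a 0.5 second clip, not to a wrong rate. If the model needs exactly 8000 samples per example, a sketch like the following pads short clips with zeros and truncates long ones. The helper name fix_length is hypothetical and only NumPy is used.

```python
import numpy as np


def fix_length(samples, target_len=8000):
    """Zero-pad or truncate a 1-D signal so it has exactly target_len samples."""
    samples = np.asarray(samples, dtype=np.float32)
    if len(samples) >= target_len:
        return samples[:target_len]
    # Pad only at the end so the audio stays aligned to the start of the clip.
    return np.pad(samples, (0, target_len - len(samples)))


# A stand-in for a 0.5 s clip at 8000 Hz (4000 samples).
half_second = np.ones(4000, dtype=np.float32)
batch = fix_length(half_second).reshape(1, 8000, 1)
print(batch.shape)
```

Whether zero-padding is appropriate depends on the model; it is one common choice, not the only one.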

How to detect two identical audio/video files with different volume level?

I'm working on a program that compares two video files and shows the differences.
I compare the audio tracks of the files using SoX and FFmpeg:
invert one of the files (sox)
merge the other file with the inverted version of the first (sox)
detect silence (ffmpeg)
But if the two files differ only in volume level, the entire audio track is detected as non-silent.
How can I tell that two files have the same audio track but at different volume levels?
I tried changing the volume level with sox: sox -v 1.1 input.wav output.wav
Then I compared the statistics (-n stat).
That works fine. Dividing each parameter of audio2 by the same parameter of audio1 gives:
Samples read 1.00;
Length (seconds) 1.00;
Scaled by 1.00;
Maximum amplitude 1.10;
Minimum amplitude 1.10;
Midline amplitude 1.10;
Mean norm 1.10;
Mean amplitude 1.00;
RMS amplitude 1.10;
Maximum delta 1.10;
Mean delta 1.10;
RMS delta 1.10;
Rough frequency 1.00;
Volume adjustment 1/1.10;
But when I used ffmpeg to change the volume of the video: ffmpeg -i input.mp4 -vcodec copy -af "volume=10dB" output.mp4 (or volume=volume=0.5) and then compared the sox audio statistics, I could not find any pattern:
Samples read 1.00
Length (seconds) 1.00
Scaled by 1.00
Maximum amplitude 0.71
Minimum amplitude 0.64
Midline amplitude -2401.73
Mean norm 0.34
Mean amplitude 0.50
RMS amplitude 0.36
Maximum delta 0.37
Mean delta 0.34
RMS delta 0.36
Rough frequency 0.99
Volume adjustment 0.71
I will be grateful for any ideas and help.
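One way to make the comparison independent of volume, sketched here under stated assumptions: both tracks have already been decoded to PCM arrays at the same sample rate and are sample-aligned. The function gain_and_residual is a hypothetical helper that fits the least-squares gain between the two signals and reports how much energy is left unexplained after scaling. A residual near 0 suggests the same audio at a different level.

```python
import numpy as np


def gain_and_residual(a, b):
    """Fit b ~ gain * a by least squares; return (gain, relative residual energy)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    n = min(len(a), len(b))  # compare only the overlapping part
    a, b = a[:n], b[:n]

    energy_a = np.dot(a, a)
    energy_b = np.dot(b, b)
    if energy_a == 0.0 or energy_b == 0.0:
        raise ValueError("both signals must contain non-zero samples")

    gain = np.dot(a, b) / energy_a
    residual = b - gain * a
    return gain, np.dot(residual, residual) / energy_b


t = np.linspace(0.0, 1.0, 8000, endpoint=False)
original = np.sin(2 * np.pi * 440 * t)
quieter = 0.5 * original
print(gain_and_residual(original, quieter))
```

Because applying an audio filter in ffmpeg re-encodes the audio, a lossy codec will likely leave a small non-zero residual and may shift the samples slightly, so in practice a threshold and possibly an alignment step would be needed.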

How to calculate the total conversion duration before converting with FFmpeg in Node.js

I would like to convert a video with FFmpeg in Node.js.
How can I calculate the total conversion duration before starting the conversion?
Example: how long does a 1 GB AVI movie take to convert to MKV?
You can't know in advance the exact amount of time needed for executing the conversion.
If you know the total number of frames of the target file, you can estimate it once the conversion is running with this formula:
T_full_conversion_time = T_elapsed * T_total_frame_count / T_converted_frames
Subtracting T_elapsed from T_full_conversion_time gives an estimate of the remaining time.
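The formula can be sketched as a small function. This is written in Python for illustration even though the question targets Node.js; the function name and the way the frame counts are obtained are assumptions, and the estimate presumes a roughly constant conversion speed.

```python
def estimate_conversion_time(elapsed_s, converted_frames, total_frames):
    """Return (estimated total time, estimated remaining time) in seconds."""
    if converted_frames <= 0:
        raise ValueError("at least one frame must be converted before estimating")
    full_time = elapsed_s * total_frames / converted_frames
    return full_time, full_time - elapsed_s


# Example: 1500 of 6000 frames done after 30 seconds.
full, remaining = estimate_conversion_time(30.0, 1500, 6000)
print(full, remaining)
```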

splitting a flac image into tracks

This is a follow-up question to Flac samples calculation.
Do I apply the offset generated by that formula from the beginning of the file, or from the point after the metadata where the stream starts (here)?
My goal is to programmatically divide the file myself, largely as a learning exercise. My plan is to write out a FLAC header and metadata blocks based on values read from the image, followed by the actual track data that I extract from the master image using my cue sheet.
Currently in my code I can parse each metadata block and end up where the frames start.
Suppose you are trying to decode starting at M:S.F = 3:45.30. There are 75 frames (CDDA sectors) per second, and obviously there are 60 seconds per minute. To convert M:S.F from your cue sheet into a sample offset value, I would first calculate the number of CDDA sectors to the desired starting point: (((60 * 3) + 45) * 75) + 30 = 16,905. Since there are 75 sectors per second, assuming the audio is sampled at 44,100 Hz there are 44,100 / 75 = 588 audio samples per sector. So the desired audio sample offset where you will start decoding is 588 * 16,905 = 9,940,140.
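The arithmetic above can be written as a short function. The name cue_to_sample_offset is hypothetical, and the sketch assumes a sample rate that divides evenly by 75, as 44,100 Hz does.

```python
CDDA_SECTORS_PER_SECOND = 75


def cue_to_sample_offset(minutes, seconds, frames, sample_rate=44100):
    """Convert a cue sheet M:S.F position into an offset in PCM samples."""
    if sample_rate % CDDA_SECTORS_PER_SECOND != 0:
        raise ValueError("sample rate must be a multiple of 75")
    sectors = (minutes * 60 + seconds) * CDDA_SECTORS_PER_SECOND + frames
    samples_per_sector = sample_rate // CDDA_SECTORS_PER_SECOND
    return sectors * samples_per_sector


# 3:45.30 -> 16,905 sectors -> 9,940,140 samples
print(cue_to_sample_offset(3, 45, 30))
```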
The offset just calculated is an offset into the decompressed PCM samples, not into the compressed FLAC stream (nor in bytes). So for each FLAC frame, calculate the number of samples it contains and keep a running tally of your position. Skip FLAC frames until you find the one containing your starting audio sample. At this point you can start decoding the audio, throwing away any samples in the FLAC frame that you don't need.
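The running tally described above might look like the following sketch. It assumes the block size (number of samples per channel) of each FLAC frame has already been read from the frame headers into a list; block_sizes and locate_sample are hypothetical names, and no actual FLAC parsing is shown.

```python
def locate_sample(block_sizes, target_sample):
    """Find the frame containing target_sample.

    Returns (frame index, number of leading samples to discard in that frame).
    """
    position = 0  # sample offset of the start of the current frame
    for index, block_size in enumerate(block_sizes):
        if target_sample < position + block_size:
            return index, target_sample - position
        position += block_size
    raise ValueError("target sample is past the end of the stream")


# With a fixed block size of 4096 samples per frame:
print(locate_sample([4096] * 3000, 9940140))
```

Frames before the returned index are skipped without decoding, and the second value is the number of decoded samples to throw away at the start of the located frame.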
FLAC also supports a SEEKTABLE block, the use of which would greatly speed up (and alter) the process I just described. If you haven't already you can look at the implementation of the reference decoder.
