I am concatenating multiple (max 25) audio files using SoX with
sox first.mp3 second.mp3 third.mp3 result.mp3
which does what it is supposed to: it concatenates the given files into one file. Unfortunately, there is a small time gap between those files in result.mp3. Is there a way to remove this gap?
Before concatenating, I create first.mp3, second.mp3 and so on by mixing multiple audio files (same length, format and rate):
sox -m drums.mp3 bass.mp3 guitar.mp3 first.mp3
How can I check and ensure that no time gap is added to any of those files (both mixed and concatenated)?
I need to achieve a seamless playback of all the concatenated files (when playing them in browser one after another it works ok).
Thank you for any help.
EDIT:
The exact example (without real file-names) of a command I am running is now:
sox "|sox -m file1.mp3 file2.mp3 file3.mp3 file4.mp3 -p" "|sox -m file1.mp3 file6.mp3 file7.mp3 -p" "|sox -m file5.mp3 file6.mp3 file4.mp3 -p" "|sox -m file0.mp3 file2.mp3 file9.mp3 -p" "|sox -m file1.mp3 file15.mp3 file4.mp3 -p" result.mp3
This mixes the files and pipes them directly into the concatenation command. The resulting MP3 (result.mp3) has an ever-so-slight delay between the concatenated files. Any ideas are really appreciated.
The best (though least helpful) way to do this is not to use MP3 files as your source files. WAV, FLAC or M4A files don't have this problem.
MP3s aren't made up of fixed-rate samples, so cropping out a section of an arbitrary length will not work as you expect. Unless the encoder was smart (like lame), there will often be a gap at the start or end of the MP3 file's audio. I did a test with a sample 0.98s long (which is precisely 73½ CDDA frames, and many MP3 encoders use frames for minimum sample lengths). I then encoded the sample with three different MP3 encoders (lame, sox, and the ancient shine), then decoded those files with three decoders (lame, sox, and madplay). Here's how the sample lengths compare to the original:
Enc.→Dec.        Length   Samples   CDDA Frames
--------------   ------   -------   -----------
shine→lame       0.95"    42095     71.5901
shine→madplay    0.97"    42624     72.4898
shine→sox        0.97"    42624     72.4898
lame→lame        0.98"    43218     73.5000
*Original        0.98"    43218     73.5000
sox→sox          0.99"    43776     74.4490
sox→lame         1.01"    44399     75.5085
lame→madplay     1.02"    44928     76.4082
lame→sox         1.02"    44928     76.4082
sox→madplay      1.02"    44928     76.4082
Only the file encoded and decoded by lame ended up the same length (mostly because lame inserts a length tag to correct for these too-short samples, and knows how to decode it). Everything encoded by sox ended up with a tiny gap, no matter what decoder I used. So joining the files will result in tiny clicks.
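One pattern in the table is worth spelling out. The sketch below assumes the frame size of MPEG-1 Layer III, 1152 samples per frame (a property of the format, not something measured in the test above), and checks the sample counts from the rows decoded with sox or madplay against it. The function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: MPEG-1 Layer III stores 1152 PCM samples per frame.
FRAME=1152

# Round a sample count up to the next whole number of MP3 frames.
pad_to_frame() {
    echo $(( ( ($1 + FRAME - 1) / FRAME ) * FRAME ))
}

# Report how many whole frames a sample count holds, plus the remainder.
frames_of() {
    echo "$(( $1 / FRAME )) frames, remainder $(( $1 % FRAME ))"
}

pad_to_frame 43218   # original length in samples
frames_of 42624      # shine→sox
frames_of 43776      # sox→sox
frames_of 44928      # lame→sox
```

The original 43218 samples do not fill a whole number of frames, so a decoder that does not know the true length can only hand back whole frames, which is consistent with the 37, 38 and 39 frame lengths seen in the table.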
Your browser is likely mixing and overlapping the source files very slightly so you don't hear the clicks. Gapless playback is hard to do correctly.
This is my guess for your issue:
SoX does not add a time gap during concatenation;
however, it does add a time gap in other operations, for instance if you do a conversion before the concatenation.
To find out what happens, I suggest checking the durations of all your files at each step (you can use soxi, for instance).
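A minimal sketch of that check, assuming soxi is on the PATH and using the file names from the question. soxi -s prints a file's length in samples; for MP3 input the figure may be an estimate, so treat small differences with care. The helper names are hypothetical.

```shell
#!/bin/sh
# Sum integer sample counts read from stdin, one per line.
sum_counts() {
    total=0
    while read -r n; do
        total=$((total + n))
    done
    printf '%s\n' "$total"
}

# Print the sample count of each file given as an argument (needs soxi).
sample_counts() {
    for f in "$@"; do
        soxi -s "$f"
    done
}

# Usage (not run here, since it needs the real files):
#   expected=$(sample_counts first.mp3 second.mp3 third.mp3 | sum_counts)
#   actual=$(soxi -s result.mp3)
#   echo "expected=$expected actual=$actual diff=$((actual - expected))"
```

A non-zero diff tells you how many samples were gained or lost overall; running sample_counts on the intermediate files shows at which step the length changed.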
If that doesn't explain it (the time gap is added during concatenation), let me make another guess:
SoX adds a time gap because the samples at the beginning or end of your files are not close to zero.
To solve this, you could apply a very short fade-in and fade-out to your files.
Moreover, to force SoX to output files with a well-defined length, you could use the trim effect like this (effects go after the output file name):
sox filein.mp3 fileout.mp3 trim 0 duration
First, check whether the start and end of your files contain silence. I don't know whether SoX can do this, but you need to measure the energy (RMS, dB) of the signal at the start and end and cut off any leading and trailing silence. Then, to join the audio files without gaps, apply a window function to each signal so that it acts as a fade-in/fade-out, and crossfade the end of one file with the beginning of the next.
SoX provides a splice effect for crossfading:
splice [-h|-t|-q] { position[,excess[,leeway]] }
Splice together audio sections. This effect provides two things over simple audio concatenation: a (usually short) cross-fade is applied at the join, and a wave similarity comparison is made to help determine the best place at which to make the join.
Related
My machine is running Ubuntu 20 LTS. I want to manipulate live input audio in real time. I have achieved pitch shifting using sox with this command:
sox -t pulseaudio default -t pulseaudio null pitch +1000
and then routing the audio from "Monitor of Nullsink".
What I actually want to do is silence random parts of the input audio, within a range; that is, randomly mute 1-2 s of the input audio.
The final goal of this project is to write a script that manipulates my voice and makes it seem like my network is bad.
There is no restriction on the method: we may use any language, write an extension, or directly manipulate the input audio with sox, ffmpeg, etc. Anything goes.
I found the solution by using trim in sox. The project can be found at
https://github.com/TathagataRoy1278/Bad_Internet_Audio_Modulator
I want to split an audio file into several equal-length segments using FFmpeg. I want to specify the general segment duration (no overlap), and I want FFmpeg to render as many segments as it takes to go over the whole audio file (in other words, the number of segments to be rendered is unspecified).
Also, since I am not very experienced with FFmpeg (I only use it to make simple file conversions with few arguments), I would like a description of the code you should use to do this, rather than just a piece of code that I won't necessarily understand, if possible.
Thank you in advance.
P.S. Here's the context for why I'm trying to do this:
I would like to sample a song into single-bar loops automatically, instead of having to chop them manually using a DAW. All I want to do is align the first beat of the song to the beat grid in my DAW, and then export that audio file and use it to generate one-bar loops in FFmpeg.
In the future, I will try to do something like a batch command in which one can specify the tempo and key signature, and it will generate the loops using FFmpeg automatically (as long as the loop is aligned to the beat grid, as I've mentioned earlier). 😀
You can use the segment muxer. Basic example:
ffmpeg -i input.wav -f segment -segment_time 2 output_%03d.wav
-f segment indicates that the segment muxer should be used for the output.
-segment_time 2 makes each segment 2 seconds long.
output_%03d.wav is the output file name pattern, which will result in output_000.wav, output_001.wav, output_002.wav, and so on.
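For the one-bar loops mentioned in the question, the value for -segment_time can be derived from the tempo: one bar lasts beats-per-bar × 60 / BPM seconds. The helper below is a sketch of that arithmetic only; the function name, the default of 4 beats per bar and the loop_%03d.wav pattern are assumptions made for illustration.

```shell
#!/bin/sh
# Length of one bar in seconds: beats per bar * 60 / tempo in BPM.
# $1 = tempo in BPM, $2 = beats per bar (assumed to be 4 if omitted).
bar_seconds() {
    bpm=$1
    beats=${2:-4}
    LC_ALL=C awk -v bpm="$bpm" -v beats="$beats" \
        'BEGIN { printf "%.6f\n", beats * 60 / bpm }'
}

# Usage (requires ffmpeg and a real input file, so it is not run here):
#   ffmpeg -i input.wav -f segment -segment_time "$(bar_seconds 120 4)" loop_%03d.wav

bar_seconds 120 4   # 4/4 at 120 BPM
bar_seconds 90 3    # 3/4 at 90 BPM
```

At 120 BPM in 4/4 a bar is exactly 2 seconds, which is the value used in the basic example above.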
Using FFmpeg, I am trying to combine many audio files into one long one, with a crossfade between each of them. To keep the numbers simple, let's say I have 10 input files, each 5 minutes, and I want a 10 second crossfade between each. (Resulting duration would be 48:30.) Assume all input files have the same codec/bitrate.
I was pleasantly surprised to find how simple it was to crossfade two files:
ffmpeg -i 0.mp3 -i 1.mp3 -vn -filter_complex acrossfade=d=10:c1=tri:c2=tri out.mp3
But the acrossfade filter does not allow 3+ inputs. So my naive solution is to repeatedly run ffmpeg, crossfading the previous intermediate output with the next input file. It's not ideal. It leads me to two questions:
1. Does acrossfade losslessly copy the streams? (Except where they're actively crossfading, of course.) Or do the entire input streams get reencoded?
If the input streams are entirely reencoded, then my naive approach is very bad. In the example above (calling acrossfade 9 times), the first 4:50 of the first file would be reencoded 9 times! If I'm combining 50 files, the first file gets reencoded 49 times!
2. To avoid multiple runs and the reencoding issue, can I achieve the many-crossfade behavior in a single ffmpeg call?
I imagine I would need some long filtergraph, but I haven't figured it out yet. Does anyone have an example of crossfading just 3 input files? From that I could automate the filtergraphs for longer chains.
Thanks for any tips!
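For the single-call case, acrossfade filters can be chained inside one -filter_complex, with each labelled output fed into the next crossfade. Filters operate on decoded audio, so the result is encoded once at the end rather than once per crossfade. The helper below is only a sketch that builds the filtergraph string for N inputs; the function name and the [a1], [a2], ... labels are made up for illustration.

```shell
#!/bin/sh
# Build a chained acrossfade filtergraph.
# $1 = number of inputs (at least 2), $2 = crossfade duration in seconds.
build_acrossfade_graph() {
    n=$1
    d=$2
    graph=""
    prev="[0]"
    i=1
    while [ "$i" -lt "$n" ]; do
        if [ "$i" -eq $((n - 1)) ]; then
            out=""                # last crossfade: leave the output unlabelled
        else
            out="[a$i]"           # intermediate result, consumed by the next step
        fi
        graph="${graph}${prev}[$i]acrossfade=d=${d}:c1=tri:c2=tri${out}"
        if [ -n "$out" ]; then
            graph="${graph};"
        fi
        prev="$out"
        i=$((i + 1))
    done
    printf '%s\n' "$graph"
}

# Usage with three inputs (requires ffmpeg and the real files, so not run here):
#   ffmpeg -i 0.mp3 -i 1.mp3 -i 2.mp3 -vn \
#       -filter_complex "$(build_acrossfade_graph 3 10)" out.mp3

build_acrossfade_graph 3 10
```

For three inputs this prints [0][1]acrossfade=d=10:c1=tri:c2=tri[a1];[a1][2]acrossfade=d=10:c1=tri:c2=tri, and longer chains follow the same pattern.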
Say I have a bunch of mp3 files. How would I use a command-line audio tool to reduce the volume to zero on one side (the right) while leaving the left side untouched? I would then like to save the result to a new mp3 file. This needs to be done entirely on the command line.
As another approach: is it possible to use a command-line audio tool to convert a stereo mp3 file to mono, then merge this mono file with a "silent" track of the same length, creating a left-headphone track with sound and a right-headphone track with silence?
In this SO question, there seems to be a number of approaches to a rather eccentric end goal. In the first possible solution, I just want to decrease the volume of the right side. In the second possible solution, I want to combine a few more common steps to achieve the same end result.
The problems here are that:
I can't find a good command-line tool for modifying audio files, even to do the second approach which should be a more common request.
I'm expecting that I'll first need to convert the mp3 file to wav, using a similar or second tool
This query is eccentric so there aren't many links about it on the web.
Thanks for any help. Audacity would be my go-to normally, but it appears to be GUI only.
SoX lets you do this very easily.
The first case, muted right channel:
sox test.mp3 test-rmuted.mp3 remix 1 0
The second case, summed mono on left channel:
sox test.mp3 test-lmono.mp3 remix 1,2 0
To batch process you could just do a simple for loop.
Muted right channel:
for f in *.mp3
do
    # Strip the extension to build the output name
    basename="${f%.*}"
    echo "$basename"
    # Decode and remix with SoX, then encode with LAME
    sox "$f" -t wav - remix 1 0 | \
        lame --preset standard - "00-${basename}-rmute".mp3
done
Summed mono on left channel only:
for f in *.mp3
do
    # Strip the extension to build the output name
    basename="${f%.*}"
    echo "$basename"
    # Decode and remix with SoX, then encode with LAME
    sox "$f" -t wav - remix 1,2 0 | \
        lame --preset standard - "00-${basename}-lmono".mp3
done
You can forgo LAME and do the encoding with SoX as in the first two examples, but I find this method simpler and more flexible.
As suggested in a comment, you should be able to use FFmpeg to process your audio files. Dropping one channel completely would produce a different result than converting to mono first. However, I think either could be achieved with the pan filter in FFmpeg.
https://trac.ffmpeg.org/wiki/AudioChannelManipulation
https://ffmpeg.org/ffmpeg-filters.html#pan
Attenuation of one channel:
1. Decode the mp3 file to wav
2. Create a new stereo wav file using the pan filter 100% to one channel
3. Encode the resulting wav file to mp3
Mixing both channels evenly in one channel, then attenuating the other channel:
1. Decode the mp3 file to wav
2. Create a new wav file using the pan filter with one channel 50% from left and 50% from right, and the other channel with 0 gain
3. Encode the resulting wav file to mp3
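The pan step in both procedures can be sketched as follows. FFmpeg decodes the mp3 input and encodes the mp3 output itself, so the three steps can likely be collapsed into one command. The function name and the mode names are hypothetical; the filter strings use the pan syntax from the documentation linked above, with the right channel given an explicit gain of 0.

```shell
#!/bin/sh
# Print a pan filter string for the requested layout.
#   mute-right: left channel kept as is, right channel silent
#   mono-left:  left = 50% left + 50% right, right channel silent
pan_filter() {
    case "$1" in
        mute-right) echo 'pan=stereo|c0=c0|c1=0*c0' ;;
        mono-left)  echo 'pan=stereo|c0=0.5*c0+0.5*c1|c1=0*c0' ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Usage (requires ffmpeg and a real input file, so it is not run here):
#   ffmpeg -i in.mp3 -af "$(pan_filter mute-right)" out-rmuted.mp3
#   ffmpeg -i in.mp3 -af "$(pan_filter mono-left)"  out-lmono.mp3

pan_filter mute-right
pan_filter mono-left
```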
I use the following code to trim, pipe and concatenate my audio files.
sox "|sox audio.wav -p trim 0.000 =15.000" "|sox audio.wav -p trim 15.000" concatenated.wav
One would expect concatenated.wav to sound identical to audio.wav.
However, when both files are played simultaneously, there is a distinct audio shift in concatenated.wav.
Normally this error is acceptable, as it is in the millisecond range. However, as the number of pipes increases (say, more than 100), the amount of audio shift increases substantially.
What is the correct method to trim, pipe and concatenate audio files using SoX to prevent this error?
Edit 1: Samples were used instead of milliseconds. The same problem remained.
The following code was used:
sox "|sox audio.wav -p trim 0s =661500s" "|sox audio.wav -p trim 661500s" concatenated.wav
The WAV file's sample rate is 44100 Hz. The sample size is 16 bit.
SoX 14.4.2 was used.
The problem is that sox may lose a few samples at the cut point of the trim command.
I had a similar problem and solved it by cutting not by milliseconds, but by samples, which of course depend on the sample rate.
If your cutpoints are multiples of the used sample rate, you will no longer lose samples and the combined parts will have the exact same length as the original.
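A sketch of the arithmetic behind cutting by samples, assuming a cut position given in whole milliseconds and the 44100 Hz rate from the question. The function name is hypothetical, and the sox command is only printed, not run.

```shell
#!/bin/sh
# Convert a position in milliseconds to a whole number of samples.
# $1 = position in ms, $2 = sample rate in Hz.
ms_to_samples() {
    echo $(( $1 * $2 / 1000 ))
}

# Cut point at 15 s in a 44100 Hz file.
cut=$(ms_to_samples 15000 44100)
echo "cut point: ${cut}s"

# Print the two-part trim/concatenate command with the "s" (samples) suffix,
# so both halves share exactly the same cut position.
echo "sox \"|sox audio.wav -p trim 0s =${cut}s\" \"|sox audio.wav -p trim ${cut}s\" concatenated.wav"
```

Comparing soxi -s audio.wav with soxi -s concatenated.wav afterwards shows whether any samples were lost at the cut.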